# Large Scale Embedding Generation with Qwen3 0.6B

> Easily (and inexpensively) create a semantic search index of over 4M document chunks from Apple's patent literature, using Sutro

<div class="article-badges">
  <span class="badge badge-time"><Icon icon="clock" /> 10 min read</span>
  <span class="badge badge-implementation"><Icon icon="wrench" /> \~1-2 hour project</span>
  <span class="badge badge-cost"><Icon icon="dollar-sign" /> \~15</span>
  <span class="badge badge-level"><Icon icon="graduation-cap" /> Beginner</span>
</div>

## Overview

In this example, we'll demonstrate how to embed over 4M document chunks to create a searchable index. The documents we'll be embedding make up the entire corpus of Apple's patent literature. By the end of this guide, we'll be able to search for things like "wireless charging technology", "biometric authentication methods", or even complex queries like "patents related to reducing battery consumption in mobile devices" - and get relevant patent results in milliseconds.

### Why Embeddings and Vector Search Matter

Traditional keyword search breaks down when users don't know the exact terminology - searching for "making phones last longer" won't find documents about "battery optimization." Embeddings solve this by converting text into vectors that better capture semantic meaning, enabling search that matches on intent rather than on exact strings. What once required teams of search quality experts can now be implemented in an afternoon for around \$15, as we'll demonstrate by making 30,000 Apple patents (split into over 4M document chunks) semantically searchable.
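
To make "vectors that capture semantic meaning" concrete, here is a minimal sketch of cosine similarity, the score that vector search typically ranks by. The three-dimensional vectors are hand-picked toy values for illustration only, not output from any embedding model; real embeddings have far more dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical values chosen by hand, not real embeddings)
query = [0.9, 0.1, 0.0]        # stands in for "making phones last longer"
battery_doc = [0.8, 0.2, 0.1]  # stands in for "battery optimization"
display_doc = [0.0, 0.2, 0.9]  # stands in for an unrelated display patent

print(cosine_similarity(query, battery_doc) > cosine_similarity(query, display_doc))
```

The two texts share no keywords, yet their vectors point in nearly the same direction, which is what lets a semantic index rank the battery document first.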

## Data Source

The source for the corpus of patent documents will be Google BigQuery, where we can fetch the full text and all relevant metadata with the query below. The results of this query (the base for our embeddings) can be found in this [HuggingFace Dataset](https://huggingface.co/datasets/sutro/apple-patents-bigquery).

<Accordion title="View Full BigQuery Query">
  ```sql theme={null}
  SELECT DISTINCT
      p.publication_number,
      p.application_number,
      p.country_code,
      p.kind_code,
      title.text as patent_title,
      abstract.text as patent_abstract,
      claims.text as patent_claims,
      description.text as patent_description,
      p.filing_date,
      p.publication_date,
      p.grant_date,
      p.priority_date,
      p.inventor,
      assignee.name as assignee_name,
      p.cpc,
      p.family_id,
  FROM
      `patents-public-data.patents.publications` AS p,
      UNNEST(title_localized) AS title,
      UNNEST(abstract_localized) AS abstract,
      UNNEST(claims_localized) AS claims,
      UNNEST(description_localized) AS description,
      UNNEST(assignee_harmonized) as assignee
  WHERE
      LOWER(assignee.name) LIKE '%apple inc%'
      AND p.kind_code IN ('B1', 'B2')
      AND p.country_code = 'US'
      AND p.grant_date IS NOT NULL
      AND title.language = 'en'
      AND abstract.language = 'en'
      AND claims.language = 'en'
      AND description.language = 'en'
  ORDER BY
      p.grant_date DESC
  ```
</Accordion>

We'll store the results from that query into a Polars `DataFrame` using the below snippet. Polars is great for efficiently manipulating large datasets.

```python theme={null}
import json
import os

import polars as pl
from google.cloud import bigquery
from google.oauth2 import service_account

# Authenticate
credentials = service_account.Credentials.from_service_account_info(
    json.loads(os.environ['SERVICE_ACCOUNT_JSON'])
)
project_id = 'xxxxx'
client = bigquery.Client(project=project_id, credentials=credentials)

# Run query (query_full_text holds the SQL shown above as a string)
query_job = client.query(query_full_text)
results = query_job.result()
patents_df = pl.from_arrow(results.to_arrow())
```

## Document Chunking

After retrieving and storing the patent results, we'll need to chunk all the relevant sections into smaller sections of text.

### Why Chunking is Essential

Chunking matters for embeddings mainly because of **semantic precision**: each retrieved result should carry enough context for the reader to interpret it correctly, while staying focused enough to match the query.

Smaller chunks allow more precise retrieval of relevant information, but there is a critical balance to strike:

**Too small chunks (e.g., 50-100 tokens):**

* ✅ High precision - returns exactly what matches
* ❌ Loss of contextual meaning - "battery life" might miss "wasn't that good" in the next sentence
* ❌ Fragmented results - you might need to retrieve multiple chunks to get a complete idea
* ❌ More vectors to store and search (higher costs)

**Too large chunks (e.g., 2000+ tokens):**

* ✅ Rich context preserved - full patent claims or entire technical descriptions
* ✅ Fewer vectors to manage (lower costs)
* ❌ Diluted relevance - a chunk about "display technology" might rank poorly for "OLED" even if it contains relevant OLED information buried within
* ❌ Multiple concepts per chunk - retrieval becomes less precise

**The sweet spot (typically 200-800 tokens) depends on:**

* Your content type (patent claims are self-contained; descriptions are narrative)
* Search intent (looking for specific facts vs. understanding concepts)
* Embedding model characteristics (some models better preserve semantics in longer sequences)
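
The sweet spot above is quoted in tokens, while the fixed-size chunker used later in this guide counts characters (2,000 per chunk with a 400-character overlap). A rough conversion, assuming about 4 characters per token for English text (a common rule of thumb, not a measured property of the Qwen3 tokenizer), looks like this:

```python
CHARS_PER_TOKEN = 4  # assumption: rough average for English prose

def approx_tokens(num_chars: int, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate a token count from a character count."""
    return round(num_chars / chars_per_token)

chunk_size_chars = 2000
overlap_chars = 400

print(approx_tokens(chunk_size_chars))  # 500
print(approx_tokens(overlap_chars))     # 100
```

Under that assumption the chunks land at roughly 500 tokens, inside the 200-800 range. For an exact figure you would count tokens with the model's own tokenizer instead.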

### Chunking Strategy

For this example, we're using a simple fixed-size chunking strategy with overlap, but there are many approaches to consider:

* **Fixed-size chunking**: Simple and predictable, splits text every N characters/tokens
* **Semantic chunking**: Uses NLP to find natural boundaries (sentences, paragraphs, sections)
* **Recursive chunking**: Hierarchically splits documents while preserving structure
* **Corpus-specific chunking**: For patents, you might chunk by claims, abstract, description sections; for code, you might designate chunks by function boundaries
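
As a point of comparison with fixed-size splitting, here is a minimal sketch of a boundary-aware chunker that packs whole sentences into chunks. It is not what this guide uses. The regex sentence splitter is a deliberately naive assumption; patent text full of abbreviations such as "FIG. 2" would need a proper sentence tokenizer.

```python
import re

def chunk_by_sentences(text: str, max_chars: int = 2000) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars becomes its own oversized chunk.
    """
    # Naive boundary rule: split after '.', '!' or '?' followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)   # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

sample = "A battery is charged. The coil heats up. Cooling is applied."
print(chunk_by_sentences(sample, max_chars=45))
# ['A battery is charged. The coil heats up.', 'Cooling is applied.']
```

The trade-off is that chunk sizes become uneven, which makes storage and cost a little harder to predict than with fixed-size windows.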

You can view the implementation of our chunking strategy in the collapsible section below. We chose to write this ourselves for simplicity, but we know teams like [Unstructured](https://unstructured.io/) also do a great job here!

<Accordion title="View Chunking Helper Functions">
  ```python theme={null}
  import hashlib
  from typing import Dict, List

  def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 400) -> List[str]:
      """Split text into overlapping chunks for embedding (sizes are in characters)"""
      if not text:
          return []

      chunks = []
      start = 0
      text_length = len(text)

      while start < text_length:
          end = min(start + chunk_size, text_length)
          chunk = text[start:end]
          chunks.append(chunk)
          start += chunk_size - overlap

      return chunks

  def prepare_patent_for_embedding(row: Dict) -> List[Dict]:
      """Convert patent row to embedding-ready documents"""
      patent_id = row['publication_number']
      base_metadata = {
          'patent_id': patent_id,
          'title': row['patent_title'],
          'assignee': row['assignee_name'],
          'grant_date': str(row['grant_date']),
          'inventors': row['inventor'],
          'cpc_codes': row['cpc'],
          'family_id': row['family_id'],
      }

      documents = []

      # Title as separate document
      documents.append({
          'doc_id': hashlib.md5(f"{patent_id}_title".encode()).hexdigest(),
          'patent_id': patent_id,
          'section': 'title',
          'text': row['patent_title'],
          'metadata': base_metadata
      })

      # Abstract as separate document
      if row['patent_abstract']:
          documents.append({
              'doc_id': hashlib.md5(f"{patent_id}_abstract".encode()).hexdigest(),
              'patent_id': patent_id,
              'section': 'abstract',
              'text': row['patent_abstract'],
              'metadata': base_metadata
          })

      # Claims - chunk if long
      if row['patent_claims']:
          claims_chunks = chunk_text(row['patent_claims'])
          for i, chunk in enumerate(claims_chunks):
              documents.append({
                  'doc_id': hashlib.md5(f"{patent_id}_claims_{i}".encode()).hexdigest(),
                  'patent_id': patent_id,
                  'section': 'claims',
                  'chunk_index': i,
                  'text': chunk,
                  'metadata': base_metadata
              })

      # Description - chunk into smaller pieces
      if row['patent_description']:
          desc_chunks = chunk_text(row['patent_description'])
          for i, chunk in enumerate(desc_chunks):
              documents.append({
                  'doc_id': hashlib.md5(f"{patent_id}_description_{i}".encode()).hexdigest(),
                  'patent_id': patent_id,
                  'section': 'description',
                  'chunk_index': i,
                  'text': chunk,
                  'metadata': base_metadata
              })

      return documents
  ```
</Accordion>
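
To see what the overlap does, it helps to run the same sliding-window logic on a short string with tiny sizes. The sketch below restates `chunk_text` compactly and adds `chunk_text_no_tail`, a hypothetical variant that is not part of this guide's pipeline: it stops as soon as a window reaches the end of the text, which avoids a final chunk that is already fully covered by the previous one.

```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 400) -> list[str]:
    """Same sliding-window logic as the helper above (sizes in characters)."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def chunk_text_no_tail(text: str, chunk_size: int = 2000, overlap: int = 400) -> list[str]:
    """Hypothetical variant: stop once a window reaches the end of the text."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap
    return chunks

print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij', 'j']
print(chunk_text_no_tail("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the one before it, so a phrase that straddles a boundary still appears whole in at least one chunk. Switching to the variant would change the total chunk count, so the numbers reported later in this guide correspond to the original helper.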

```python theme={null}
# Process patents for embedding
all_documents = []
for row in patents_df.iter_rows(named=True):
    all_documents.extend(prepare_patent_for_embedding(row))
```
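
Before uploading millions of rows, a quick sanity check on the chunk list can catch surprises such as empty texts or a missing section. This is a small sketch over a toy stand-in for `all_documents`; the sample rows are hypothetical and only mirror the keys produced above.

```python
from collections import Counter

# Toy stand-in for `all_documents` (hypothetical rows with the same keys)
sample_documents = [
    {"patent_id": "US-0000001-B2", "section": "title", "text": "Example title"},
    {"patent_id": "US-0000001-B2", "section": "abstract", "text": "Example abstract"},
    {"patent_id": "US-0000001-B2", "section": "description", "text": "Part one"},
    {"patent_id": "US-0000001-B2", "section": "description", "text": "Part two"},
]

# How many chunks came from each section, and how many are blank
section_counts = Counter(doc["section"] for doc in sample_documents)
empty_texts = sum(1 for doc in sample_documents if not doc["text"].strip())

print(section_counts.most_common())
print(f"empty chunks: {empty_texts}")
```

On the real data you would pass `all_documents` instead of the sample list; descriptions should dominate the counts because they are by far the longest section.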

## Preparing Data for Sutro

Once we have our patent documents split into chunked sections, we can then pass them to Sutro to transform them into embeddings!

Since we're working with such a large amount of data, we're going to take advantage of Sutro's Datasets feature. This allows us to upload our dataset once (split across multiple files if necessary) and then reference it when running jobs.

We are also going to do some minor data partitioning and split our single dataframe into multiple \~500MB Parquet files, which will make the uploading process more robust. Let's do that first:

```python theme={null}
docs_df = pl.DataFrame({
    'doc_id': [doc['doc_id'] for doc in all_documents],
    'patent_id': [doc['patent_id'] for doc in all_documents],
    'section': [doc['section'] for doc in all_documents],
    'text': [doc['text'] for doc in all_documents],
    'metadata': [json.dumps(doc['metadata']) for doc in all_documents]
})

# Create output directory
output_dir = "apple_patents_documents_parquet"
os.makedirs(output_dir, exist_ok=True)

# Calculate rows per chunk for ~500MB files
total_size_mb = docs_df.estimated_size() / (1024 * 1024)
num_files = max(1, int(total_size_mb / 500) + 1)
rows_per_chunk = len(docs_df) // num_files + 1

# Split and save
for i in range(0, len(docs_df), rows_per_chunk):
    chunk = docs_df[i:i+rows_per_chunk]
    filename = f"{output_dir}/documents_part_{i//rows_per_chunk:03d}.parquet"
    chunk.write_parquet(filename, compression='snappy')

print(f"Saved {len(docs_df)} documents across {num_files} files to {output_dir}/")
```
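
The partition arithmetic is easier to follow with concrete numbers. The sketch below mirrors the calculation above in a hypothetical helper, `plan_partitions`, and also reports how many files the loop actually writes. The 4,200 MB in-memory size is an assumed figure for illustration; the row count is the one reported later in this guide.

```python
import math

def plan_partitions(total_size_mb: float, num_rows: int, target_mb: int = 500):
    """Mirror the arithmetic above and also report how many files get written."""
    num_files = max(1, int(total_size_mb / target_mb) + 1)
    rows_per_chunk = num_rows // num_files + 1
    files_written = math.ceil(num_rows / rows_per_chunk)
    return num_files, rows_per_chunk, files_written

# Hypothetical in-memory size of 4,200 MB for 4,039,988 rows
print(plan_partitions(4200, 4_039_988))     # (9, 448888, 9)
# Tiny edge case: fewer files are written than planned
print(plan_partitions(10, 4, target_mb=3))  # (4, 2, 2)
```

Two things are worth keeping in mind. `estimated_size()` measures the in-memory footprint, so the compressed Parquet files on disk will usually come out smaller than the 500MB target. And because `rows_per_chunk` is rounded up, the number of files written can be lower than `num_files` on very small inputs, as the second call shows.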

Now we'll create a Dataset and upload our files to it. These two lines of code upload all the data necessary to run our embedding job.

```python theme={null}
import sutro

dataset_id = sutro.create_dataset()
sutro.upload_to_dataset(dataset_id, output_dir)
```

## Running the Embedding Job

Once our dataset is fully uploaded, we can use it to run our embedding job. For this example, we chose **Qwen3-Embedding-0.6B**: with only 595M parameters it is inexpensive to run, yet it still performs very well on relevant tasks like retrieval and re-ranking. See the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for a more in-depth numerical comparison.

Now we'll kick off our embedding job with the code below:

```python theme={null}
job_id = sutro.infer(
    dataset_id,
    column="text",
    model="qwen-3-embedding-0.6b",
    job_priority=1,
)

sutro.await_job_completion(job_id, obtain_results=False)
```

> **Note:** We're using `job_priority=1` here. Sutro currently has a notion of job priorities, which is essentially how we designate job SLAs. Currently, we have two priorities:
>
> * **Priority 0**: Prototyping jobs (for small scale testing, targeted at \<= 10m completion time)
> * **Priority 1**: Production oriented jobs (large scale jobs, targeted at 1hr completion time)
>
> More details about job priorities can be [found here](https://docs.sutro.sh/concepts/job-priority).

Once our job has started, we wait for it to complete using `await_job_completion(...)`. This will periodically poll the job's status until it's complete; alternatively, we can monitor it via [Sutro's web UI](https://app.sutro.sh).
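
For readers curious what "periodically poll" means in practice, here is a generic sketch of a polling loop with a timeout. It is not Sutro's implementation: `wait_until_done`, the `get_status` callable, and the status names are all hypothetical, and the job is simulated with a plain iterator.

```python
import time
from typing import Callable

def wait_until_done(get_status: Callable[[], str], poll_seconds: float = 5.0,
                    timeout_seconds: float = 3600.0) -> str:
    """Poll get_status() until it reports a terminal state or the timeout passes."""
    terminal_states = {"SUCCEEDED", "FAILED", "CANCELLED"}  # hypothetical names
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds}s")
        time.sleep(poll_seconds)

# Simulated job that finishes on the third poll
fake_statuses = iter(["QUEUED", "RUNNING", "SUCCEEDED"])
print(wait_until_done(lambda: next(fake_statuses), poll_seconds=0.01))
```

A fixed polling interval is the simplest choice; for long-running jobs, a longer interval keeps the number of status requests low.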

We also disabled automatic fetching of the results (`obtain_results=False`); instead, we can download our original Dataset with the job results column appended.

```python theme={null}
results_dir = '/mnt/cooper-notebook-volume/apple_patents_embeddings'
os.makedirs(results_dir, exist_ok=True)
sutro.download_from_dataset(dataset_id, output_path=results_dir)
job_results_df = pl.read_parquet(results_dir)

# Each row is one document chunk, not one patent
print(f"Total chunks: {len(job_results_df)}")
print(f"Columns: {job_results_df.columns}")
# Total chunks: 4039988
# Columns: ['text', 'job-76844041-b2bf-4248-9603-b7f750231b34']
```

If you want to play around with the embeddings yourself, the entire set can be found [here on HuggingFace](https://huggingface.co/datasets/sutro/apple-patents-embeddings)!

## Loading into the Vector Database

Now we get to upload the 4M embeddings we just pulled down from Sutro into a vector database, so that we can search over the entire corpus. When uploading, we want to preserve all the metadata associated with each chunk, so that we can correctly attribute a retrieved vector to the right patent and section within the patent.

```python theme={null}
vector_col_name = 'job-76844041-b2bf-4248-9603-b7f750231b34'

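# Combine the embeddings with docs_df, which holds the attributing IDs and metadata.
# This is a positional join: it assumes job_results_df has the same rows, in the
# same order, as docs_df.
combined_df = docs_df.with_columns(
    job_results_df[vector_col_name]
)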
```
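
Because the embeddings are attached by row position, it is worth confirming that both frames line up before trusting the result. The sketch below uses a hypothetical helper, `assert_aligned`, on plain lists that stand in for the two `text` columns.

```python
def assert_aligned(source_texts: list[str], result_texts: list[str]) -> None:
    """Raise if two text columns differ in length or in any row."""
    if len(source_texts) != len(result_texts):
        raise ValueError(
            f"row count mismatch: {len(source_texts)} vs {len(result_texts)}"
        )
    for i, (a, b) in enumerate(zip(source_texts, result_texts)):
        if a != b:
            raise ValueError(f"row {i} differs; join on a key instead of by position")

# Identical columns pass silently
assert_aligned(["chunk a", "chunk b"], ["chunk a", "chunk b"])
print("aligned")
```

With the real data you could pass `docs_df["text"].to_list()` and `job_results_df["text"].to_list()`. If the check fails, joining the two frames on a shared key is safer than relying on order.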

We chose **Qdrant** as our vector DB for this example; it's performant and easy to get started with. However, there are many options out there that are well adapted to different use cases, notable ones being:

* **TurboPuffer** - Great for multi-tenant architectures with many tenants
* **Chroma** - Simple and developer-friendly
* **pgvector** - If you're already using PostgreSQL

Since we have such a large dataset, we want to upload using batches:

```python theme={null}
from qdrant_client import QdrantClient, models
from tqdm import tqdm

vector_col_name = 'job-76844041-b2bf-4248-9603-b7f750231b34'
# We're using an in-memory DB here for convenience, but Qdrant Cloud
# can be faster to upload to and will persist the embeddings as well
client = QdrantClient(":memory:")
collection_name = "apple_patents_collection"

# Create the collection, inferring vector size from the first
# row of the DataFrame
client.recreate_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=len(combined_df[vector_col_name][0]),
        distance=models.Distance.COSINE
    ),
)

BATCH_SIZE = 8192

# Loop through the DataFrame in batches
for i in tqdm(range(0, len(combined_df), BATCH_SIZE), desc="Uploading to Qdrant"):
    batch_df = combined_df.slice(i, BATCH_SIZE)

    points = [
        models.PointStruct(
            id=row['doc_id'],
            vector=row[vector_col_name],
            payload={
                'patent_id': row['patent_id'],
                'section': row['section'],
                'text': row['text'],
                **json.loads(row['metadata'])
            }
        ) for row in batch_df.to_dicts()
    ]

    client.upload_points(
        collection_name=collection_name,
        points=points,
        wait=True,
        parallel=4
    )

print(f"Finished uploading all {len(combined_df)} points.")
```
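
One detail that is easy to miss: as far as we know from Qdrant's documentation, point IDs must be either unsigned integers or UUIDs. The `doc_id` values used above are MD5 hex digests, which are 32 hexadecimal characters and therefore parse as UUIDs. A quick way to confirm this with the standard library (the sample ID string is illustrative):

```python
import hashlib
import uuid

# Build a doc_id the same way the chunking helper does
doc_id = hashlib.md5("US-11994681-B2_title".encode()).hexdigest()

# uuid.UUID raises ValueError if the string is not 32 hex characters
as_uuid = uuid.UUID(hex=doc_id)

print(len(doc_id))            # 32
print(as_uuid.hex == doc_id)  # True
```

If you switch to a longer hash such as SHA-256, the digest no longer fits this format, so you would need to derive the ID differently, for example with `uuid.uuid5`.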

## Searching Your Patents

Now that we have our embeddings loaded, we can search over them!

```python theme={null}
def search_patents(query_text, top_k=5):
    # Pick your real-time provider of choice. We have heard great
    # things (latency- and consistency-wise) about Vertex, but Baseten
    # is the easier choice:
    # https://www.baseten.co/library/qwen3-06b-embedding/
    # `real_time_api` below is a placeholder for that provider's embedding call.
    query_embedding = real_time_api(
        text=query_text,
        model="qwen-3-embedding-0.6b"
    )

    # Search in Qdrant
    results = client.search(
        collection_name=collection_name,
        query_vector=query_embedding,
        limit=top_k
    )

    for result in results:
        print(f"Patent: {result.payload['patent_id']}")
        print(f"Section: {result.payload['section']}")
        print(f"Score: {result.score:.3f}")
        print(f"Text: {result.payload['text'][:200]}...")
        print("-" * 80)

# Let's test it out!
search_patents("wireless charging efficiency improvements")
```
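
Since each patent is split into many chunks, the top results can contain several chunks from the same patent. One optional post-processing step is to keep only the best-scoring chunk per patent. The helper `best_hit_per_patent` and the sample hits below are hypothetical; the dictionaries only mimic the payload fields printed above.

```python
def best_hit_per_patent(hits: list[dict], top_k: int = 5) -> list[dict]:
    """Keep only the highest-scoring chunk for each patent_id."""
    best: dict[str, dict] = {}
    for hit in hits:
        current = best.get(hit["patent_id"])
        if current is None or hit["score"] > current["score"]:
            best[hit["patent_id"]] = hit
    # Highest score first, truncated to top_k patents
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:top_k]

# Hypothetical hits shaped like the payload fields printed above
hits = [
    {"patent_id": "US-A", "section": "claims", "score": 0.91},
    {"patent_id": "US-A", "section": "description", "score": 0.88},
    {"patent_id": "US-B", "section": "abstract", "score": 0.90},
]
print([h["patent_id"] for h in best_hit_per_patent(hits)])  # ['US-A', 'US-B']
```

To use this with a vector database, request more results than you need (say, several times `top_k`) so that enough distinct patents remain after collapsing.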

## Example Queries We Tried

### `wireless charging efficiency improvements`

<Accordion title="View 'wireless charging efficiency improvements' Results">
  ### 1. Patent US-11994681-B2 - description

  **Score:** 0.946

  into waveguide 26 exhibits a relatively wide effective field of view 70 . Switchable reflective layer 56 may be
  switched between the first and second states at a speed greater than the response speed of the human eye (e.g.,
  greater than 60 Hz, greater than 120 Hz, greater than 240 Hz, greater than 1 kHz, greater than 10 kHz, etc.) so that
  a user at eye box 24 ( FIG. 2 ) is unable to separately perceive each state and instead perceives a single effective
  field of view 70 . In this way, image light 22 may be coupled into waveguide 26 and provided to the eye box with a
  wider effective field of view than would otherwise be provided to the eye box. As an example, fields of view 62 and
  66 may each be 30 degrees, 25 degrees, between 25 and 35 degrees, less than 45 degrees, etc., whereas field of view
  70 is 60 degrees, between 55 and 65 degrees, greater than 45 degrees, or any other desired angle greater than field
  of view 62 or field of view 66 .
  FIG. 6 sh

  ***

  ### 2. Patent US-11054882-B2 - description

  **Score:** 0.874

  ments thereof are shown by way of example in the drawings and will herein be described in detail. It should be
  understood, however, that the drawings and detailed description thereto are not intended to limit the embodiments to
  the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and
  alternatives falling within the spirit and scope of the appended claims. The headings used herein are for
  organizational purposes only and are not meant to be used to limit the scope of the description. As used throughout
  this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than
  the mandatory sense (i.e., meaning must). Similarly, the words “include”, “including”, and “includes” mean
  “including, but not limited to.” As used herein, the terms “first,” “second,” etc. are used as labels for nouns that
  they precede, and do not imply any type of ordering (e.g., spatial, temporal, logical, etc.) unle

  ***

  ### 3. Patent US-11380077-B2 - description

  **Score:** 0.874

  ed in units of pressure). Using the intensity of a contact as an attribute of a user input allows for user access to
  additional device functionality that may otherwise not be accessible by the user on a reduced-size device with
  limited real estate for displaying affordances (e.g., on a touch-sensitive display) and/or receiving user input
  (e.g., via a touch-sensitive display, a touch-sensitive surface, or a physical/mechanical control such as a knob or
  a button).
  As used in the specification and claims, the term “tactile output” refers to physical displacement of a device
  relative to a previous position of the device, physical displacement of a component (e.g., a touch-sensitive
  surface) of a device relative to another component (e.g., housing) of the device, or displacement of the component
  relative to a center of mass of the device that will be detected by a user with the user's sense of touch. For
  example, in situations where the device or the component of the device is in

  ***

  ### 4. Patent US-11128157-B2 - description

  **Score:** 0.869

  scan for communications (e.g., Bluetooth communications from secondary power receiving device 24 B) at a first rate.
  In response to the notification, the primary power receiving device 24 A may use the antenna to scan for
  communications at a second rate that is faster than the first rate. By increasing the rate of scanning for
  communications, the primary power receiving device 24 A may receive any communications from secondary power
  receiving device 24 B at an earlier time than if the rate was not increased. In the event that the newly added
  object is not a supported wireless power receiving device, primary power receiving device 24 A will not actually
  receive the expected wireless communication. However, in this case, the faster scan rate may time-out after a
  predetermined length of time (e.g., after the predetermined length of time the scan rate will revert back to the
  first scan rate) without any adverse effects.
  Additional action may be taken by primary power receiving d

  ***

  ### 5. Patent US-10742297-B2 - description

  **Score:** 0.869

  , where for a period larger than 1.60 ms, 240 occasions can be configured. Note that, as above, 60 subframes may be
  used, thus I CQI/PMI has incrementations of 60 in Tables 4 and 5.
  TABLE 4

  mframe

  Value of
  Value of
  offset

  I CQI/PMI
  N pd,frame
  N OFFSET,CQI
  N OFFSET,mframe
</Accordion>

### `biometric authentication using facial recognition`

<Accordion title="View 'biometric authentication using facial recognition' Results">
  ### 1. Patent US-10928697-B1 - description

  **Score:** 0.907

  tween the first transparent layer and the second transparent layer, the transparent light-producing layer having
  light-emitting diodes that are arranged in an array, and a controller for causing the transparent light-producing
  layer to display information using the light-emitting diodes.

  BRIEF DESCRIPTION OF THE DRAWINGS
  FIG. 1 is a side-view representation of an example configurable transparent structure for lighting and/or display.
  FIG. 2A is a side-view representation of an example configurable transparent structure for lighting and/or display
  that employs an edge-lit light guide plate.
  FIG. 2B is a side-view representation of an example configurable transparent structure for lighting and/or display
  that employs an organic light-emitting diode (OLED) display layer.
  FIG. 2C is a side-view representation of an example configurable transparent structure for lighting and/or display
  that employs a micro-light-emitting di

  ***

  ### 2. Patent US-11150734-B2 - description

  **Score:** 0.907

  ion mechanism 210 and may couple the deflection mechanism 210 to a surface. The surface may be a cover glass of an
  electronic device, a housing of the electronic device, and so on. Because the surface is coupled to the deflection
  mechanism 210 , as the deflection mechanism 210 deflects, the surface may also deflect and provide a haptic output.
  Although the haptic structure 200 is specifically discussed with respect to an electronic device, the haptic
  structure 200 may be used with other devices including mechanical devices and electrical devices, as well as
  non-mechanical and non-electrical devices such as described herein.
  FIG. 3A illustrates another example haptic structure 300 for an electronic device. The haptic structure 300 may be
  referred to as a cantilevered beam structure as one end of the deflection mechanism 310 is coupled to, machined
  from, or otherwise integrated with a substrate of the haptic structure 300 while the other end of the defle

  ***

  ### 3. Patent US-11868258-B2 - description

  **Score:** 0.907

  he bytes in a cache block. Thus, the coherency controller 24 may cause other agents to invalidate the cache block.
  If an agent has the cache block modified, the agent may supply the modified cache block to the request agent.
  Otherwise, the agents may not supply the cache block.
  The coherency controller 24 may be configured to read the directory entry for the address of the request (block 220
  ). Based on the cache states in the directory entry, the coherency controller 24 may be configured to generate
  snoops. More particularly, if a given agent may have a modified copy of the cache block (e.g., the given agent has
  the cache block in exclusive or primary state) (block 222 , “yes” leg), the coherency controller 24 may generate a
  snoop forward-Dirty only (SnpFwdDonly) to the agent to transmit the cache block to the request agent (block 224 ).
  As mentioned above, the SnpFwdDonly request may cause the receiving agent to transmit the cache block if the data is
  modified, but o

  ***

  ### 4. Patent US-11393258-B2 - description

  **Score:** 0.906

  more activations required to initiate biometric authentication such that the electronic device is enabled to
  implement the respective function.
  In some examples, the electronic device (e.g., 2300 , 2400 ) displays, on the display, the prompt to provide the one
  or more activations of the button (e.g., 2304 , 2404 ) at a first position in the biometric authentication interface
  (e.g., 2322 , 2420 ). Outputting a prompt requesting that one or more activations of the button be provided provides
  the user with feedback about the current state of the device and provides visual feedback to the user indicating
  what steps the user must take in order to proceed with a particular function using the device. Providing improved
  visual feedback to the user enhances the operability of the device and makes the user-device interface more
  efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting
  with the device) which, additionally, red

  ***

  ### 5. Patent US-11764907-B2 - description

  **Score:** 0.906

  henever the product of residual SINRs is below 1, the error probability for both use cases is reduced and
  consequently, achieving a certain target rate requires less power.
  FIG. 8 , which is motivated by the finding above, shows a graph 800 of the achievable outage probability as a
  function of the transmit power when a single (so-called one-shot) transmission is performed with low rate.
  Similarly, as before, we are interested in the performance of the single device using a dedicated resource, and of
  two devices sharing the slot. To achieve fairness, in the case of two users, their transmit powers are adjusted so
  that the sum matches that of a single user. The particular values are found by solving a min-max problem. For
  example, the maximum of the tuple (P er 1 , er 2 ) is minimized according to the following set of equations:
</Accordion>

### `battery thermal management`

<Accordion title="View 'battery thermal management' Results">
  ### 1. Patent US-11700035-B2 - description

  **Score:** 0.838

  nating elements 68 (e.g., as shown by arrow 162 of FIG. 16 ). Manufacturing equipment 148 may, for example, use
  lasers to activate or create a seed layer on dielectric resonating elements 68 . Manufacturing equipment 148 may
  then deposit conductive material over the activated portions of dielectric resonating elements 68 . The conductive
  material may form conductive structures 86 V and 86 H (e.g., for feed probes 100 V and 100 H of FIG. 6 ) and/or
  parasitic elements for the antennas.
  At step 178 , manufacturing equipment 148 may surface-mount connectors 123 onto the connector contact pads 168 of
  substrate 72 (e.g., as shown by arrow 166 of FIG. 17 ).
  At step 180 , manufacturing equipment 148 may dice substrate 180 into individual antenna modules 120 and may add
  corresponding shielding structures to the antenna modules (e.g., as shown by arrow 170 of FIG. 17 ). The shielding
  may serve to isolate electronic components 150 fr

  ***

  ### 2. Patent US-11297732-B2 - description

  **Score:** 0.838

  illustrate in top plan views various stages of a partially assembled exemplary outer housing foot with integrated
  fan assembly according to various embodiments of the present disclosure.
  FIG. 6 illustrates in side cross-sectional view an exemplary electronic device having a low profile thermal flow
  assembly according to various embodiments of the present disclosure.
  FIG. 7 illustrates in top perspective view an exemplary impeller and fin stack arrangement for an integrated fan
  assembly according to various embodiments of the present disclosure.
  FIGS. 8A and 8B illustrate in bottom plan views exemplary foot and scroll geometries for an integrated fan assembly
  according to various embodiments of the present disclosure.
  FIG. 9 illustrates a flowchart of an exemplary method of cooling an electronic device according to various
  embodiments of the present disclosure.
  FIG. 10 illustrates in block diagram format an exemplary computing devi

  ***

  ### 3. Patent US-11605274-B2 - description

  **Score:** 0.838

  ement 212 , but has to still select “yes” to end the call. In this way, a user cannot accidentally end the call with
  the emergency service. In some cases, block 214 is an alternate option for block 206 , where the layout of the text
  is different and the call time is shown.
  In some examples, the audio messages can be configured to continue looping as long as the call with the emergency is
  active or until the user selects the “Stop Recorded Message” UI element. If the emergency service responder hangs
  up, the call will end. In some instances, the workflow may begin to reestablish the call (or make another call) if
  the user remains nonresponsive. Alternatively, if the call ends, and the emergency service responder calls back, the
  device 102 may answer the call and begin playing the already generated audio message. This audio message could also
  be looped until the user selects the “Stop Recorded Message” UI element. Further, in some cases, the device 102 may
  detect an indicat

  ***

  ### 4. Patent US-11994681-B2 - title

  **Score:** 0.838

  Optical systems with reflective prism input couplers

  ***

  ### 5. Patent US-10719225-B2 - description

  **Score:** 0.838

  ronic device (e.g., the first software application), such as a background application, a suspended application, or a
  hibernated application. Thus, the user can perform operations that are not provided by the application currently
  displayed on the display of the electronic device (e.g., the second software application) but are provided by one of
  the currently open applications (e.g., displaying a home screen or switching to a next software application using
  gestures for a hidden application launcher software application).
  In some embodiments, the first software application is ( 804 ) an application launcher (e.g., a springboard). For
  example, as shown in FIG. 7A , the application launcher displays a plurality of application icons 5002 that
  correspond to a plurality of applications. The application launcher receives a user-selection of an application icon
  5002 (e.g., based on a finger gesture on touch screen 156 ), and in response to receiving the user-selection,
  launches an
</Accordion>

### `haptic feedback for touch interfaces`

<Accordion title="View 'haptic feedback for touch interfaces' Results">
  ### 1. Patent US-12193062-B2 - abstract

  **Score:** 0.897

  Disclosed are techniques for reducing likelihood of Random Access Channel (RACH) transmission blockages and thereby
  facilitate an initial access procedure for new radio (NR) unlicensed spectrum (NR-U) operation in a fifth generation
  (5G) wireless communication system including an NR node. In some embodiments, a parameter generated by a gNB and
  received by a UE indicates that, from among a set of consecutive RACH Occasions (ROs), a gap is available for
  performing a listen-before-talk (LBT) procedure before commencing a RACH transmission.

  ***

  ### 2. Patent US-11468890-B2 - description

  **Score:** 0.897

  mediums), memory controller 122 , one or more processing units (CPUs) 120 , peripherals interface 118 , RF circuitry
  108 , audio circuitry 110 , speaker 111 , microphone 113 , input/output (I/O) subsystem 106 , other input control
  devices 116 , and external port 124 . Device 100 optionally includes one or more optical sensors 164 . Device 100
  optionally includes one or more contact intensity sensors 165 for detecting intensity of contacts on device 100
  (e.g., a touch-sensitive surface such as touch-sensitive display system 112 of device 100 ). Device 100 optionally
  includes one or more tactile output generators 167 for generating tactile outputs on device 100 (e.g., generating
  tactile outputs on a touch-sensitive surface such as touch-sensitive display system 112 of device 100 or touchpad
  355 of device 300 ). These components optionally communicate over one or more communication buses or signal lines
  103 .
  As used in the specification and claim

  ***

  ### 3. Patent US-11733656-B2 - description

  **Score:** 0.897

  rt of the second user interface screen; detect (e.g., with detecting unit 2516 ) a contact on the touch-sensitive
  surface unit (e.g., touch-sensitive surface unit 2510 ) at the affordance for revealing an edit option; and in
  response to detecting the contact at the affordance for revealing an edit option, enable display (e.g., with display
  enabling unit 2508 ), on the display unit (e.g., display unit 2502 ), of a delete affordance in association with the
  first user interface preview image as part of the second user interface screen.
  In some embodiments, displaying the delete affordance comprises translating the first user interface preview image
  on-screen.
  In some embodiments, the processing unit 2506 is further configured to: after displaying the delete affordance as
  part of the second user interface screen, detect (e.g., with detecting unit 2516 ) a contact on the touch-sensitive
  surface unit (e.g., touch-sensitive surface unit 2510 ) at the delete affordance disp

  ***

  ### 4. Patent US-11379113-B2 - title

  **Score:** 0.897

  Techniques for selecting text

  ***

  ### 5. Patent US-11039417-B2 - description

  **Score:** 0.897

  at least SIB2 from the system information.
  At 1832 , the UE may compute a paging frame identifier I PF and a paging occasion identifier I PO based on the DRX
  cycle T, the parameter nB and the Range\_UE\_ID.
  At 1835 , the base station transmits a paging message 1840 for the link-budget-limited UE. The paging message is
  included in a paging frame and paging occasion consistent with the previously transmitted values of DRX cycle T,
  parameter nB and Range\_UE\_ID.
  At 1845 , the UE wakes up for every subframe consistent with the computed paging frame identifier and computed
  paging occasion identifier, and checks the PDCCH of the subframe for the presence of P-RNTI.
  At 1850 , if the UE determines that P-RNTI is present in the PDCCH, the UE decodes resource allocation information
  from the PDCCH, and checks PDSCH resource block(s) identified by the allocation information, e.g., PDSCH resource
  blocks in the same subframe as the PDCCH.
  At 1855 , if the paging
</Accordion>

Interestingly, none of the results for our queries seem very good! We suspect this is mainly due to three things:

1. Vocabulary mismatch: the language used in our queries is *very* different from the language used in the patent documents, so the similarity between query-document pairs is generally low. There are well-known fixes for this problem. A common one is [HyDE](https://arxiv.org/abs/2212.10496), which has an LLM generate a hypothetical document for the query and embeds that instead, so the search vector sits closer to the language of the real documents and retrieves better results for the same source query.
2. Under-retrieving: we're currently only retrieving the top 5 chunks, which is not very many; if we retrieved more, it's likely we'd have more relevant snippets in our results.
3. Not reranking: combining a higher top\_k with a re-ranking step can surface the most relevant set of documents. Used together, these two techniques can be very powerful, and the combination is common among the folks we talk to who use vector search in production.
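
To make points 2 and 3 concrete, here is a minimal sketch of over-retrieval followed by re-ranking. It is an illustration under stated assumptions, not the pipeline used in this guide: the vectors are toy data, the brute-force cosine scan stands in for a real vector index, and `toy_rerank` is a hypothetical placeholder for a real re-ranking model such as a cross-encoder.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve_then_rerank(query_vec, doc_vecs, rerank_score, fetch_k=50, final_k=5):
    """Over-retrieve fetch_k candidates by cosine similarity, then keep the
    final_k candidates that rerank_score rates highest."""
    by_similarity = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_similarity[:fetch_k]
    return sorted(candidates, key=rerank_score, reverse=True)[:final_k]

# Toy data (assumption): four 2-d "chunk embeddings" and a query vector.
doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
query_vec = [1.0, 0.0]

# Hypothetical re-ranker scores keyed by chunk index (stand-in for a real model).
toy_rerank = {0: 0.1, 1: 0.9, 2: 0.5, 3: 0.3}

top = retrieve_then_rerank(query_vec, doc_vecs, toy_rerank.get, fetch_k=3, final_k=2)
print(top)  # prints [1, 3]
```

Chunk 0 is the nearest neighbor by cosine similarity, yet the re-ranker demotes it below chunks 1 and 3. That reordering is the point of fetching more candidates than you intend to show.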

## Scale, Cost & Speed Breakdown

### Scale

* **Chunk count**: 4.04M
* **Input token count**: 879.5M

### Cost Breakdown

* **BigQuery query**: \~\$6
* **Sutro embedding generation**: \$8.80
* **Total: \~\$14.80**

### Time

* **Job completion time**: 44 minutes
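
The figures above imply a few derived numbers that are handy for sizing your own job. The sketch below is plain arithmetic on the reported values; the throughput is an average over the whole job (assuming the 44 minutes includes any queueing or overhead), not a measured peak rate.

```python
# Reported figures from this guide.
chunks = 4.04e6            # chunk count
input_tokens = 879.5e6     # input token count
embedding_cost_usd = 8.80  # Sutro embedding generation
bigquery_cost_usd = 6.00   # approximate BigQuery query cost
job_minutes = 44           # job completion time

# Derived figures.
total_cost_usd = embedding_cost_usd + bigquery_cost_usd
cost_per_million_tokens = embedding_cost_usd / (input_tokens / 1e6)
avg_tokens_per_chunk = input_tokens / chunks
chunks_per_second = chunks / (job_minutes * 60)

print(f"total cost:        ${total_cost_usd:.2f}")
print(f"cost per 1M tokens: ${cost_per_million_tokens:.4f}")
print(f"avg tokens/chunk:   {avg_tokens_per_chunk:.0f}")
print(f"avg chunks/second:  {chunks_per_second:.0f}")
```

This works out to roughly \$0.01 per million input tokens, about 218 tokens per chunk, and on the order of 1,500 chunks embedded per second on average.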

## Conclusion

In this guide, we've demonstrated how Sutro makes it trivial to:

* Go from source documents to a searchable index in under 2 hours
* Generate high-quality embeddings using state-of-the-art models
* Build a semantic search system that can easily be productionized

The entire pipeline - from data extraction to searchable index - can be run from a Jupyter notebook and costs under \$20. Sutro handled all the worker fan-out, inference, and fault tolerance automatically.

### Next Steps

* Try different embedding models for your use case
* Experiment with different techniques to improve retrieval quality
  * Hybrid search (combining embeddings with keyword search)
  * HyDE
  * Over-retrieval and re-ranking
* Productionize this workflow as part of an event-driven pipeline that creates new indices for every X event (say, a new user signing up)
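
As a starting point for the hybrid search idea above, one common way to combine a vector-search ranking with a keyword-search ranking is reciprocal rank fusion (RRF). The sketch below is an assumption about how you might wire it up, not something this guide implements: the chunk IDs are made-up placeholders, and `k=60` is the conventional default for the RRF constant rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list.

    Each ID scores 1 / (k + rank) for every list it appears in (rank starts
    at 1); IDs are returned by descending total score, ties broken by ID.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Placeholder chunk IDs (hypothetical), best match first in each list.
vector_hits = ["chunk-a", "chunk-b", "chunk-c"]   # from embedding search
keyword_hits = ["chunk-c", "chunk-a", "chunk-d"]  # from keyword search

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # prints ['chunk-a', 'chunk-c', 'chunk-b', 'chunk-d']
```

RRF only needs rank positions, so it sidesteps the problem that cosine similarities and keyword scores live on different scales.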

### Resources

* [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard)
* [Apple Patents Dataset](https://huggingface.co/datasets/sutro/apple-patents-bigquery)
* [Embeddings Dataset](https://huggingface.co/datasets/sutro/apple-patents-embeddings)
* [Sutro Documentation](https://docs.sutro.sh)
