10 min read · ~1-2 hour project · ~$15 · Beginner

Overview

In this example, we’re going to demonstrate how to easily embed over 4M document chunks to create a searchable index. The documents we’ll be embedding are the entire corpus of Apple’s patent literature. By the end of this guide, we’ll be able to search for things like “wireless charging technology”, “biometric authentication methods”, or even complex queries like “patents related to reducing battery consumption in mobile devices” - and get relevant patent results in milliseconds.

Why Embeddings and Vector Search Matter

Traditional keyword search breaks down when users don’t know the exact terminology - searching for “making phones last longer” won’t find documents about “battery optimization.” Embeddings solve this by converting text into vectors that better capture semantic meaning, enabling search that actually understands intent rather than just matching strings. What once required teams of search quality experts can now be implemented in an afternoon for around $15, as we’ll demonstrate by making 30,000 Apple patents (split into over 4M document chunks) semantically searchable.
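The intuition can be sketched with toy vectors: similarity between a query and a document is measured geometrically (here with cosine similarity), so texts with different wording but related meaning can still land close together. The three-dimensional vectors below are made up purely for illustration; a real embedding model produces vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: values near 1.0 mean the
    vectors point in nearly the same direction (i.e., similar meaning)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical embeddings: the query shares no keywords with either
# document, but the "battery" document points in nearly the same direction.
query       = [0.9, 0.1, 0.2]   # "making phones last longer"
battery_doc = [0.8, 0.2, 0.1]   # "battery optimization"
display_doc = [0.1, 0.9, 0.3]   # "display technology"

print(cosine_similarity(query, battery_doc))  # high
print(cosine_similarity(query, display_doc))  # low
```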

Data Source

The source for the corpus of patent documents will be Google BigQuery, where we can query for the full text and all relevant metadata with the query below. The results of this query (the base for our embeddings) can be found at this HuggingFace Dataset.
SELECT DISTINCT
    p.publication_number,
    p.application_number,
    p.country_code,
    p.kind_code,
    title.text as patent_title,
    abstract.text as patent_abstract,
    claims.text as patent_claims,
    description.text as patent_description,
    p.filing_date,
    p.publication_date,
    p.grant_date,
    p.priority_date,
    p.inventor,
    assignee.name as assignee_name,
    p.cpc,
    p.family_id
FROM
    `patents-public-data.patents.publications` AS p,
    UNNEST(title_localized) AS title,
    UNNEST(abstract_localized) AS abstract,
    UNNEST(claims_localized) AS claims,
    UNNEST(description_localized) AS description,
    UNNEST(assignee_harmonized) as assignee
WHERE
    LOWER(assignee.name) LIKE '%apple inc%'
    AND p.kind_code IN ('B1', 'B2')
    AND p.country_code = 'US'
    AND p.grant_date IS NOT NULL
    AND title.language = 'en'
    AND abstract.language = 'en'
    AND claims.language = 'en'
    AND description.language = 'en'
ORDER BY
    p.grant_date DESC
We’ll store the results from that query in a Polars DataFrame using the snippet below. Polars is great for efficiently manipulating large datasets.
import json
import os

import polars as pl
from google.cloud import bigquery
from google.oauth2 import service_account

# Authenticate
credentials = service_account.Credentials.from_service_account_info(
    json.loads(os.environ['SERVICE_ACCOUNT_JSON'])
)
project_id = 'xxxxx'
client = bigquery.Client(project=project_id, credentials=credentials)

# Run query
query_job = client.query(query_full_text)
results = query_job.result()
patents_df = pl.from_arrow(results.to_arrow())

Document Chunking

After retrieving and storing the patent results, we’ll need to split the relevant sections into smaller chunks of text.

Why Chunking is Essential

Chunking is essential when working with embeddings mainly because of semantic precision: whoever reads the results needs to be able to interpret them with the correct contextual meaning. Smaller chunks allow for more precise retrieval of relevant information, but there is a critical balance to strike.
Too small chunks (e.g., 50-100 tokens):
  • ✅ High precision - returns exactly what matches
  • ❌ Loss of contextual meaning - “battery life” might miss “wasn’t that good” in the next sentence
  • ❌ Fragmented results - you might need to retrieve multiple chunks to get a complete idea
  • ❌ More vectors to store and search (higher costs)
Too large chunks (e.g., 2000+ tokens):
  • ✅ Rich context preserved - full patent claims or entire technical descriptions
  • ✅ Fewer vectors to manage (lower costs)
  • ❌ Diluted relevance - a chunk about “display technology” might rank poorly for “OLED” even if it contains relevant OLED information buried within
  • ❌ Multiple concepts per chunk - retrieval becomes less precise
The sweet spot (typically 200-800 tokens) depends on:
  • Your content type (patent claims are self-contained; descriptions are narrative)
  • Search intent (looking for specific facts vs. understanding concepts)
  • Embedding model characteristics (some models better preserve semantics in longer sequences)
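To make the cost side of that tradeoff concrete, here is a rough back-of-the-envelope calculation of how chunk size and overlap drive the number of vectors you end up storing. The stride arithmetic mirrors a fixed-size chunker with overlap; the 12,000-token document length is a hypothetical stand-in for a long patent description.

```python
import math

def estimated_chunks(doc_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of chunks a fixed-size chunker with overlap produces:
    each new chunk starts (chunk_size - overlap) tokens after the last."""
    stride = chunk_size - overlap
    return math.ceil(doc_tokens / stride)

doc_tokens = 12_000  # hypothetical long patent description
for chunk_size, overlap in [(100, 20), (400, 80), (2000, 400)]:
    print(f"{chunk_size=} -> {estimated_chunks(doc_tokens, chunk_size, overlap)} chunks")
# 100-token chunks yield roughly 19x as many vectors as 2000-token chunks,
# which multiplies storage and search cost accordingly.
```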

Chunking Strategy

For this example, we’re using a simple fixed-size chunking strategy with overlap, but there are many approaches to consider:
  • Fixed-size chunking: Simple and predictable, splits text every N characters/tokens
  • Semantic chunking: Uses NLP to find natural boundaries (sentences, paragraphs, sections)
  • Recursive chunking: Hierarchically splits documents while preserving structure
  • Corpus-specific chunking: For patents, you might chunk by claims, abstract, description sections; for code, you might designate chunks by function boundaries
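As a flavor of semantic chunking, a minimal sentence-boundary chunker might look like the sketch below. This is a greedy toy we did not use in this project; production libraries handle abbreviations, lists, and other edge cases far better.

```python
import re
from typing import List

def sentence_chunks(text: str, max_chars: int = 800) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars,
    so no chunk ends mid-sentence. (A single sentence longer than
    max_chars still becomes its own oversized chunk.)"""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks
```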
You can view the implementation of our chunking strategy in the collapsible below. We chose to write this ourselves for simplicity, but we know teams like Unstructured also do a great job here!
import hashlib
from typing import Dict, List

def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 400) -> List[str]:
    """Split text into overlapping chunks for embedding"""
    if not text:
        return []

    chunks = []
    start = 0
    text_length = len(text)

    while start < text_length:
        end = min(start + chunk_size, text_length)
        chunk = text[start:end]
        chunks.append(chunk)
        start += chunk_size - overlap

    return chunks

def prepare_patent_for_embedding(row: Dict) -> List[Dict]:
    """Convert patent row to embedding-ready documents"""
    patent_id = row['publication_number']
    base_metadata = {
        'patent_id': patent_id,
        'title': row['patent_title'],
        'assignee': row['assignee_name'],
        'grant_date': str(row['grant_date']),
        'inventors': row['inventor'],
        'cpc_codes': row['cpc'],
        'family_id': row['family_id'],
    }

    documents = []

    # Title as separate document
    documents.append({
        'doc_id': hashlib.md5(f"{patent_id}_title".encode()).hexdigest(),
        'patent_id': patent_id,
        'section': 'title',
        'text': row['patent_title'],
        'metadata': base_metadata
    })

    # Abstract as separate document
    if row['patent_abstract']:
        documents.append({
            'doc_id': hashlib.md5(f"{patent_id}_abstract".encode()).hexdigest(),
            'patent_id': patent_id,
            'section': 'abstract',
            'text': row['patent_abstract'],
            'metadata': base_metadata
        })

    # Claims - chunk if long
    if row['patent_claims']:
        claims_chunks = chunk_text(row['patent_claims'])
        for i, chunk in enumerate(claims_chunks):
            documents.append({
                'doc_id': hashlib.md5(f"{patent_id}_claims_{i}".encode()).hexdigest(),
                'patent_id': patent_id,
                'section': 'claims',
                'chunk_index': i,
                'text': chunk,
                'metadata': base_metadata
            })

    # Description - chunk into smaller pieces
    if row['patent_description']:
        desc_chunks = chunk_text(row['patent_description'])
        for i, chunk in enumerate(desc_chunks):
            documents.append({
                'doc_id': hashlib.md5(f"{patent_id}_description_{i}".encode()).hexdigest(),
                'patent_id': patent_id,
                'section': 'description',
                'chunk_index': i,
                'text': chunk,
                'metadata': base_metadata
            })

    return documents
# Process patents for embedding
all_documents = []
for row in patents_df.iter_rows(named=True):
    all_documents.extend(prepare_patent_for_embedding(row))
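Before running this over 30,000 patents, it’s worth sanity-checking the overlap behavior on a toy string. The snippet below is a self-contained copy of the chunk_text logic above, with a small chunk size so the overlap is easy to see.

```python
def chunk_text(text, chunk_size=2000, overlap=400):
    """Same fixed-size-with-overlap logic as above, copied for a standalone demo."""
    if not text:
        return []
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

text = "abcdefghij" * 100  # 1,000 characters
chunks = chunk_text(text, chunk_size=300, overlap=60)

# Consecutive chunks start 240 characters apart, so the last 60 characters
# of each chunk repeat at the start of the next. That overlap is what keeps
# ideas from being cut in half at a chunk boundary.
print(len(chunks))                        # 5
print(chunks[0][-60:] == chunks[1][:60])  # True
```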

Preparing Data for Sutro

Once we have our patent documents split into chunked sections, we can pass them to Sutro to transform them into embeddings! Since we’re working with such a large amount of data, we’re going to take advantage of Sutro’s Datasets feature. This allows us to upload our dataset once (split across multiple files if necessary). We’re also going to do some minor data partitioning and split our single dataframe into multiple ~500MB Parquet files, which will make the uploading process more robust. Let’s do that first:
docs_df = pl.DataFrame({
    'doc_id': [doc['doc_id'] for doc in all_documents],
    'patent_id': [doc['patent_id'] for doc in all_documents],
    'section': [doc['section'] for doc in all_documents],
    'text': [doc['text'] for doc in all_documents],
    'metadata': [json.dumps(doc['metadata']) for doc in all_documents]
})

# Create output directory
output_dir = "apple_patents_documents_parquet"
os.makedirs(output_dir, exist_ok=True)

# Calculate rows per chunk for ~500MB files
total_size_mb = docs_df.estimated_size() / (1024 * 1024)
num_files = max(1, int(total_size_mb / 500) + 1)
rows_per_chunk = len(docs_df) // num_files + 1

# Split and save
for i in range(0, len(docs_df), rows_per_chunk):
    chunk = docs_df[i:i+rows_per_chunk]
    filename = f"{output_dir}/documents_part_{i//rows_per_chunk:03d}.parquet"
    chunk.write_parquet(filename, compression='snappy')

print(f"Saved {len(docs_df)} documents across {num_files} files to {output_dir}/")
Now we’ll create and upload to a Dataset. These two lines of code upload all the data necessary to run our embedding job.
dataset_id = sutro.create_dataset()
sutro.upload_to_dataset(dataset_id, output_dir)

Running the Embedding Job

Once we have our dataset fully loaded, we can use it to run our embedding job. For this example, we chose to use Qwen3-Embedding-0.6B; at only 595M parameters it is fast and cheap to run, yet it still performs very well on relevant tasks like retrieval and re-ranking. See the MTEB leaderboard for a more in-depth and numerical comparison. Now we’ll kick off our embedding job with the code below:
job_id = sutro.infer(
    dataset_id,
    column="text",
    model="qwen-3-embedding-0.6b",
    job_priority=1,
)

sutro.await_job_completion(job_id, obtain_results=False)
Note: We’re using job_priority=1 here. Sutro currently has a notion of job priorities, which is essentially how we designate job SLAs. Currently, we have two priorities:
  • Priority 0: Prototyping jobs (for small scale testing, targeted at <= 10m completion time)
  • Priority 1: Production oriented jobs (large scale jobs, targeted at 1hr completion time)
More details about job priorities can be found here.
Once we have our job started, we wait for it to complete using await_job_completion(...). This will periodically poll the job’s status until it’s complete; alternatively, we can monitor via Sutro’s web UI. We also disabled automatically fetching the results; instead, we can just download our original Dataset with the job results column appended.
results_dir = '/mnt/cooper-notebook-volume/apple_patents_embeddings'
os.makedirs(results_dir, exist_ok=True)
sutro.download_from_dataset(dataset_id, output_path=results_dir)
job_results_df = pl.read_parquet(results_dir)

print(f"Total documents: {len(job_results_df)}")
print(f"Columns: {job_results_df.columns}")
# Total documents: 4039988
# Columns: ['text', 'job-76844041-b2bf-4248-9603-b7f750231b34']
If you want to play around with the embeddings yourself, the entire set can be found here on HuggingFace!

Loading into the Vector Database

Now we get to upload the 4M embeddings we just pulled down from Sutro into a vector database, so that we can search over the entire corpus. When uploading, we want to preserve all the metadata associated with each chunk, so that we can correctly attribute a retrieved vector to the right patent and section within the patent.
vector_col_name = 'job-76844041-b2bf-4248-9603-b7f750231b34'

# Combine the embeddings with docs_df containing attributing IDs and metadata
combined_df = docs_df.with_columns(
    job_results_df[vector_col_name]
)
We chose Qdrant as our vector DB of choice for this example; it’s performant and easy enough to get started with. However, there are many options out there that are well adapted to different use cases, notable ones being:
  • TurboPuffer - Great for multi-tenant architectures with many tenants
  • Chroma - Simple and developer-friendly
  • pgvector - If you’re already using PostgreSQL
Since we have such a large dataset, we want to upload using batches:
from qdrant_client import QdrantClient, models
from tqdm import tqdm

vector_col_name = 'job-76844041-b2bf-4248-9603-b7f750231b34'
# We're using an in-memory DB here for convenience, but Qdrant Cloud
# can be faster to upload to and will persist the embeddings as well
client = QdrantClient(":memory:")
collection_name = "apple_patents_collection"

# Create the collection, inferring vector size from the first
# row of the DataFrame
client.recreate_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=len(combined_df[vector_col_name][0]),
        distance=models.Distance.COSINE
    ),
)

BATCH_SIZE = 8192

# Loop through the DataFrame in batches
for i in tqdm(range(0, len(combined_df), BATCH_SIZE), desc="Uploading to Qdrant"):
    batch_df = combined_df.slice(i, BATCH_SIZE)

    points = [
        models.PointStruct(
            id=row['doc_id'],
            vector=row[vector_col_name],
            payload={
                'patent_id': row['patent_id'],
                'section': row['section'],
                'text': row['text'],
                **json.loads(row['metadata'])
            }
        ) for row in batch_df.to_dicts()
    ]

    client.upload_points(
        collection_name=collection_name,
        points=points,
        wait=True,
        parallel=4
    )

print(f"Finished uploading all {len(combined_df)} points.")

Searching Your Patents

Now that we have our embeddings loaded, we can search over them!
def search_patents(query_text, top_k=5):
    # Pick your real time provider of choice, we have heard great
    # things (latency & consistency wise) about Vertex, but Baseten
    # is the easier choice
    # https://www.baseten.co/library/qwen3-06b-embedding/
    query_embedding = real_time_api(
        text=query_text,
        model="qwen-3-embedding-0.6b"
    )

    # Search in Qdrant
    results = client.search(
        collection_name=collection_name,
        query_vector=query_embedding,
        limit=top_k
    )

    for result in results:
        print(f"Patent: {result.payload['patent_id']}")
        print(f"Section: {result.payload['section']}")
        print(f"Score: {result.score:.3f}")
        print(f"Text: {result.payload['text'][:200]}...")
        print("-" * 80)

# Let's test it out!
search_patents("wireless charging efficiency improvements")

Example Queries We Tried

wireless charging efficiency improvements

1. Patent US-11994681-B2 - description

Score: 0.946

into waveguide 26 exhibits a relatively wide effective field of view 70 . Switchable reflective layer 56 may be switched between the first and second states at a speed greater than the response speed of the human eye (e.g., greater than 60 Hz, greater than 120 Hz, greater than 240 Hz, greater than 1 kHz, greater than 10 kHz, etc.) so that a user at eye box 24 ( FIG. 2 ) is unable to separately perceive each state and instead perceives a single effective field of view 70 . In this way, image light 22 may be coupled into waveguide 26 and provided to the eye box with a wider effective field of view than would otherwise be provided to the eye box. As an example, fields of view 62 and 66 may each be 30 degrees, 25 degrees, between 25 and 35 degrees, less than 45 degrees, etc., whereas field of view 70 is 60 degrees, between 55 and 65 degrees, greater than 45 degrees, or any other desired angle greater than field of view 62 or field of view 66 . FIG. 6 sh

2. Patent US-11054882-B2 - description

Score: 0.874

ments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include”, “including”, and “includes” mean “including, but not limited to.” As used herein, the terms “first,” “second,” etc. are used as labels for nouns that they precede, and do not imply any type of ordering (e.g., spatial, temporal, logical, etc.) unle

3. Patent US-11380077-B2 - description

Score: 0.874

ed in units of pressure). Using the intensity of a contact as an attribute of a user input allows for user access to additional device functionality that may otherwise not be accessible by the user on a reduced-size device with limited real estate for displaying affordances (e.g., on a touch-sensitive display) and/or receiving user input (e.g., via a touch-sensitive display, a touch-sensitive surface, or a physical/mechanical control such as a knob or a button). As used in the specification and claims, the term “tactile output” refers to physical displacement of a device relative to a previous position of the device, physical displacement of a component (e.g., a touch-sensitive surface) of a device relative to another component (e.g., housing) of the device, or displacement of the component relative to a center of mass of the device that will be detected by a user with the user’s sense of touch. For example, in situations where the device or the component of the device is in

4. Patent US-11128157-B2 - description

Score: 0.869

scan for communications (e.g., Bluetooth communications from secondary power receiving device 24 B) at a first rate. In response to the notification, the primary power receiving device 24 A may use the antenna to scan for communications at a second rate that is faster than the first rate. By increasing the rate of scanning for communications, the primary power receiving device 24 A may receive any communications from secondary power receiving device 24 B at an earlier time than if the rate was not increased. In the event that the newly added object is not a supported wireless power receiving device, primary power receiving device 24 A will not actually receive the expected wireless communication. However, in this case, the faster scan rate may time-out after a predetermined length of time (e.g., after the predetermined length of time the scan rate will revert back to the first scan rate) without any adverse effects. Additional action may be taken by primary power receiving d

5. Patent US-10742297-B2 - description

Score: 0.869

, where for a period larger than 1.60 ms, 240 occasions can be configured. Note that, as above, 60 subframes may be used, thus I CQI/PMI has incrementations of 60 in Tables 4 and 5. TABLE 4mframeValue of Value of offsetI CQI/PMI N pd,frame N OFFSET,CQI N OFFSET,mframe

biometric authentication using facial recognition

1. Patent US-10928697-B1 - description

Score: 0.907

tween the first transparent layer and the second transparent layer, the transparent light-producing layer having light-emitting diodes that are arranged in an array, and a controller for causing the transparent light-producing layer to display information using the light-emitting diodes.BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a side-view representation of an example configurable transparent structure for lighting and/or display. FIG. 2A is a side-view representation of an example configurable transparent structure for lighting and/or display that employs an edge-lit light guide plate. FIG. 2B is a side-view representation of an example configurable transparent structure for lighting and/or display that employs an organic light-emitting diode (OLED) display layer. FIG. 2C is a side-view representation of an example configurable transparent structure for lighting and/or display that employs a micro-light-emitting di

2. Patent US-11150734-B2 - description

Score: 0.907

ion mechanism 210 and may couple the deflection mechanism 210 to a surface. The surface may be a cover glass of an electronic device, a housing of the electronic device, and so on. Because the surface is coupled to the deflection mechanism 210 , as the deflection mechanism 210 deflects, the surface may also deflect and provide a haptic output. Although the haptic structure 200 is specifically discussed with respect to an electronic device, the haptic structure 200 may be used with other devices including mechanical devices and electrical devices, as well as non-mechanical and non-electrical devices such as described herein. FIG. 3A illustrates another example haptic structure 300 for an electronic device. The haptic structure 300 may be referred to as a cantilevered beam structure as one end of the deflection mechanism 310 is coupled to, machined from, or otherwise integrated with a substrate of the haptic structure 300 while the other end of the defle

3. Patent US-11868258-B2 - description

Score: 0.907

he bytes in a cache block. Thus, the coherency controller 24 may cause other agents to invalidate the cache block. If an agent has the cache block modified, the agent may supply the modified cache block to the request agent. Otherwise, the agents may not supply the cache block. The coherency controller 24 may be configured to read the directory entry for the address of the request (block 220 ). Based on the cache states in the directory entry, the coherency controller 24 may be configured to generate snoops. More particularly, if a given agent may have a modified copy of the cache block (e.g., the given agent has the cache block in exclusive or primary state) (block 222 , “yes” leg), the coherency controller 24 may generate a snoop forward-Dirty only (SnpFwdDonly) to the agent to transmit the cache block to the request agent (block 224 ). As mentioned above, the SnpFwdDonly request may cause the receiving agent to transmit the cache block if the data is modified, but o

4. Patent US-11393258-B2 - description

Score: 0.906

more activations required to initiate biometric authentication such that the electronic device is enabled to implement the respective function. In some examples, the electronic device (e.g., 2300 , 2400 ) displays, on the display, the prompt to provide the one or more activations of the button (e.g., 2304 , 2404 ) at a first position in the biometric authentication interface (e.g., 2322 , 2420 ). Outputting a prompt requesting that one or more activations of the button be provided provides the user with feedback about the current state of the device and provides visual feedback to the user indicating what steps the user must take in order to proceed with a particular function using the device. Providing improved visual feedback to the user enhances the operability of the device and makes the user-device interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the device) which, additionally, red

5. Patent US-11764907-B2 - description

Score: 0.906

henever the product of residual SINRs is below 1, the error probability for both use cases is reduced and consequently, achieving a certain target rate requires less power. FIG. 8 , which is motivated by the finding above, shows a graph 800 of the achievable outage probability as a function of the transmit power when a single (so-called one-shot) transmission is performed with low rate. Similarly, as before, we are interested in the performance of the single device using a dedicated resource, and of two devices sharing the slot. To achieve fairness, in the case of two users, their transmit powers are adjusted so that the sum matches that of a single user. The particular values are found by solving a min-max problem. For example, the maximum of the tuple (P er 1 , er 2 ) is minimized according to the following set of equations:

battery thermal management

1. Patent US-11700035-B2 - description

Score: 0.838

nating elements 68 (e.g., as shown by arrow 162 of FIG. 16 ). Manufacturing equipment 148 may, for example, use lasers to activate or create a seed layer on dielectric resonating elements 68 . Manufacturing equipment 148 may then deposit conductive material over the activated portions of dielectric resonating elements 68 . The conductive material may form conductive structures 86 V and 86 H (e.g., for feed probes 100 V and 100 H of FIG. 6 ) and/or parasitic elements for the antennas. At step 178 , manufacturing equipment 148 may surface-mount connectors 123 onto the connector contact pads 168 of substrate 72 (e.g., as shown by arrow 166 of FIG. 17 ). At step 180 , manufacturing equipment 148 may dice substrate 180 into individual antenna modules 120 and may add corresponding shielding structures to the antenna modules (e.g., as shown by arrow 170 of FIG. 17 ). The shielding may serve to isolate electronic components 150 fr

2. Patent US-11297732-B2 - description

Score: 0.838

illustrate in top plan views various stages of a partially assembled exemplary outer housing foot with integrated fan assembly according to various embodiments of the present disclosure. FIG. 6 illustrates in side cross-sectional view an exemplary electronic device having a low profile thermal flow assembly according to various embodiments of the present disclosure. FIG. 7 illustrates in top perspective view an exemplary impeller and fin stack arrangement for an integrated fan assembly according to various embodiments of the present disclosure. FIGS. 8A and 8B illustrate in bottom plan views exemplary foot and scroll geometries for an integrated fan assembly according to various embodiments of the present disclosure. FIG. 9 illustrates a flowchart of an exemplary method of cooling an electronic device according to various embodiments of the present disclosure. FIG. 10 illustrates in block diagram format an exemplary computing devi

3. Patent US-11605274-B2 - description

Score: 0.838

ement 212 , but has to still select “yes” to end the call. In this way, a user cannot accidentally end the call with the emergency service. In some cases, block 214 is an alternate option for block 206 , where the layout of the text is different and the call time is shown. In some examples, the audio messages can be configured to continue looping as long as the call with the emergency is active or until the user selects the “Stop Recorded Message” UI element. If the emergency service responder hangs up, the call will end. In some instances, the workflow may begin to reestablish the call (or make another call) if the user remains nonresponsive. Alternatively, if the call ends, and the emergency service responder calls back, the device 102 may answer the call and begin playing the already generated audio message. This audio message could also be looped until the user selects the “Stop Recorded Message” UI element. Further, in some cases, the device 102 may detect an indicat

4. Patent US-11994681-B2 - title

Score: 0.838

Optical systems with reflective prism input couplers

5. Patent US-10719225-B2 - description

Score: 0.838

ronic device (e.g., the first software application), such as a background application, a suspended application, or a hibernated application. Thus, the user can perform operations that are not provided by the application currently displayed on the display of the electronic device (e.g., the second software application) but are provided by one of the currently open applications (e.g., displaying a home screen or switching to a next software application using gestures for a hidden application launcher software application). In some embodiments, the first software application is ( 804 ) an application launcher (e.g., a springboard). For example, as shown in FIG. 7A , the application launcher displays a plurality of application icons 5002 that correspond to a plurality of applications. The application launcher receives a user-selection of an application icon 5002 (e.g., based on a finger gesture on touch screen 156 ), and in response to receiving the user-selection, launches an

haptic feedback for touch interfaces

1. Patent US-12193062-B2 - abstract

Score: 0.897

Disclosed are techniques for reducing likelihood of Random Access Channel (RACH) transmission blockages and thereby facilitate an initial access procedure for new radio (NR) unlicensed spectrum (NR-U) operation in a fifth generation (5G) wireless communication system including an NR node. In some embodiments, a parameter generated by a gNB and received by a UE indicates that, from among a set of consecutive RACH Occasions (ROs), a gap is available for performing a listen-before-talk (LBT) procedure before commencing a RACH transmission.

2. Patent US-11468890-B2 - description

Score: 0.897

mediums), memory controller 122 , one or more processing units (CPUs) 120 , peripherals interface 118 , RF circuitry 108 , audio circuitry 110 , speaker 111 , microphone 113 , input/output (I/O) subsystem 106 , other input control devices 116 , and external port 124 . Device 100 optionally includes one or more optical sensors 164 . Device 100 optionally includes one or more contact intensity sensors 165 for detecting intensity of contacts on device 100 (e.g., a touch-sensitive surface such as touch-sensitive display system 112 of device 100 ). Device 100 optionally includes one or more tactile output generators 167 for generating tactile outputs on device 100 (e.g., generating tactile outputs on a touch-sensitive surface such as touch-sensitive display system 112 of device 100 or touchpad 355 of device 300 ). These components optionally communicate over one or more communication buses or signal lines 103 . As used in the specification and claim

3. Patent US-11733656-B2 - description

Score: 0.897

rt of the second user interface screen; detect (e.g., with detecting unit 2516 ) a contact on the touch-sensitive surface unit (e.g., touch-sensitive surface unit 2510 ) at the affordance for revealing an edit option; and in response to detecting the contact at the affordance for revealing an edit option, enable display (e.g., with display enabling unit 2508 ), on the display unit (e.g., display unit 2502 ), of a delete affordance in association with the first user interface preview image as part of the second user interface screen. In some embodiments, displaying the delete affordance comprises translating the first user interface preview image on-screen. In some embodiments, the processing unit 2506 is further configured to: after displaying the delete affordance as part of the second user interface screen, detect (e.g., with detecting unit 2516 ) a contact on the touch-sensitive surface unit (e.g., touch-sensitive surface unit 2510 ) at the delete affordance disp

4. Patent US-11379113-B2 - title

Score: 0.897

Techniques for selecting text

5. Patent US-11039417-B2 - description

Score: 0.897

at least SIB2 from the system information. At 1832 , the UE may compute a paging frame identifier I PF and a paging occasion identifier I PO based on the DRX cycle T, the parameter nB and the Range_UE_ID. At 1835 , the base station transmits a paging message 1840 for the link-budget-limited UE. The paging message is included in a paging frame and paging occasion consistent with the previously transmitted values of DRX cycle T, parameter nB and Range_UE_ID. At 1845 , the UE wakes up for every subframe consistent with the computed paging frame identifier and computed paging occasion identifier, and checks the PDCCH of the subframe for the presence of P-RNTI. At 1850 , if the UE determines that P-RNTI is present in the PDCCH, the UE decodes resource allocation information from the PDCCH, and checks PDSCH resource block(s) identified by the allocation information, e.g., PDSCH resource blocks in the same subframe as the PDCCH. At 1855 , if the paging
Interestingly, none of the results for our queries seem very good! We imagine that this is mainly due to a few things.
  1. The language used in our queries is very different from the language used in the patent documents, so the similarity between the query-document pairs is generally not great. There are well-known fixes to this problem; commonly, HyDE is used to generate queries that are closer to the language in the real documents, and thus retrieve better results for the same source query.
  2. Under-retrieving: we’re currently only retrieving the first 5 documents, which is not very many; if we retrieved more documents, it’s likely we’d have more relevant snippets in our results.
  3. Not re-ranking: combining a higher top_k with a re-ranking step can lead to finding the most relevant set of documents. These two techniques used together can be very powerful, and the combination is common among the teams we talk to who use vector search in production.
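The over-retrieval plus re-ranking pattern from point 3 can be sketched end to end. Everything here is a toy: first_stage_score stands in for the vector search and rerank_score stands in for a cross-encoder or hosted re-ranking API; in production, the first stage would be the Qdrant query with a large limit.

```python
def search_with_rerank(query, docs, first_stage_score, rerank_score,
                       over_retrieve_k=50, final_k=5):
    # Stage 1: cheap similarity over the whole corpus; keep a generous
    # candidate set rather than just the final top-k.
    candidates = sorted(docs, key=lambda d: first_stage_score(query, d),
                        reverse=True)[:over_retrieve_k]
    # Stage 2: expensive, more accurate scoring over the candidates only.
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d),
                      reverse=True)
    return reranked[:final_k]

# Toy demo: the query shares no keywords with any document, mimicking the
# vocabulary-mismatch problem above; both scorers are made-up stand-ins.
docs = ["battery optimization", "display panel", "wireless charging coil"]
hits = search_with_rerank(
    "making phones last longer", docs,
    first_stage_score=lambda q, d: len(set(q.split()) & set(d.split())),
    rerank_score=lambda q, d: int("battery" in d) + int("charg" in d),
)
print(hits[0])  # "battery optimization"
```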

Scale, Cost & Speed Breakdown

Scale

  • Chunk count: 4.04M
  • Input token count: 879.5M

Cost Breakdown

  • BigQuery query: ~$6
  • Sutro embedding generation: $8.80
  • Total: ~$14.80

Time

  • Job completion time: 44 minutes

Conclusion

In this guide, we’ve demonstrated how Sutro makes it trivial to:
  • Go from source documents to a searchable index in under 2 hours
  • Generate high-quality embeddings using state-of-the-art models
  • Build a semantic search system that can easily be productionized
The entire pipeline - from data extraction to searchable index - can be run from a Jupyter notebook and costs under $20. Sutro handled all the worker fan-out, inference, and fault tolerance automatically.

Next Steps

  • Try different embedding models for your case
  • Experiment with different techniques to improve retrieval quality
    • Hybrid search (combining embeddings with keyword search)
    • HyDE
    • Over retrieval and re-ranking
  • Productionize this workflow as part of an event driven pipeline that creates new indices for every X event (say a new user signing up)

Resources