10 min read | ~1-2 hour project | ~$15 | Beginner
Overview
In this example, we’re going to demonstrate how to easily embed over 4M document chunks to create a searchable index. The documents we’ll be embedding are the entire corpus of Apple’s patent literature. By the end of this guide we’ll be able to search for things like “wireless charging technology”, “biometric authentication methods”, or even complex queries like “patents related to reducing battery consumption in mobile devices” - and get relevant patent results in milliseconds.
Why Embeddings and Vector Search Matter
Traditional keyword search breaks down when users don’t know the exact terminology - searching for “making phones last longer” won’t find documents about “battery optimization.” Embeddings solve this by converting text into vectors that capture semantic meaning, enabling search that matches intent rather than exact strings. What once required teams of search quality experts can now be implemented in an afternoon for around $15, as we’ll demonstrate by making 30,000 Apple patents (split into over 4M document chunks) semantically searchable.
Data Source
The source for the corpus of patent documents will be Google BigQuery, where we can query for the full text and all relevant metadata. The results of this query (the base for our embeddings) can be found in this HuggingFace Dataset.
We then load the results into a Polars DataFrame. Polars is great for efficiently manipulating large datasets.
Document Chunking
After retrieving and storing the patent results, we’ll need to chunk all the relevant sections into smaller sections of text.
Why Chunking is Essential
Chunking is essential when working with embeddings mainly because of semantic precision: whoever reads the results needs to be able to interpret them with the correct contextual meaning. Smaller chunks allow for more precise retrieval of relevant information, but there is a critical balance to strike.
Too small chunks (e.g., 50-100 tokens):
- ✅ High precision - returns exactly what matches
- ❌ Loss of contextual meaning - “battery life” might miss “wasn’t that good” in the next sentence
- ❌ Fragmented results - you might need to retrieve multiple chunks to get a complete idea
- ❌ More vectors to store and search (higher costs)
Too large chunks:
- ✅ Rich context preserved - full patent claims or entire technical descriptions
- ✅ Fewer vectors to manage (lower costs)
- ❌ Diluted relevance - a chunk about “display technology” might rank poorly for “OLED” even if it contains relevant OLED information buried within
- ❌ Multiple concepts per chunk - retrieval becomes less precise
The right chunk size depends on:
- Your content type (patent claims are self-contained; descriptions are narrative)
- Search intent (looking for specific facts vs. understanding concepts)
- Embedding model characteristics (some models better preserve semantics in longer sequences)
Chunking Strategy
For this example, we’re using a simple fixed-size chunking strategy with overlap, but there are many approaches to consider:
- Fixed-size chunking: Simple and predictable, splits text every N characters/tokens
- Semantic chunking: Uses NLP to find natural boundaries (sentences, paragraphs, sections)
- Recursive chunking: Hierarchically splits documents while preserving structure
- Corpus-specific chunking: For patents, you might chunk by claims, abstract, description sections; for code, you might designate chunks by function boundaries
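As a rough illustration of the fixed-size strategy with overlap, here is a minimal sketch. It is not the guide's own helper code: the function names, the 1000-character chunk size, the 100-character overlap, and the record field names (publication_number, title, abstract, description) are all assumptions made for this example.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def chunk_record(record: dict, sections=("title", "abstract", "description")) -> list[dict]:
    """Chunk each section of one patent record, keeping attribution metadata per chunk.

    The field names used here are hypothetical; adapt them to your own schema.
    """
    rows = []
    for section in sections:
        text = record.get(section) or ""
        for i, chunk in enumerate(chunk_text(text)):
            rows.append({
                "publication_number": record["publication_number"],
                "section": section,
                "chunk_index": i,
                "text": chunk,
            })
    return rows
```

Carrying the patent identifier and section name on every chunk is what later lets a retrieved vector be traced back to the right patent and section.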
Preparing Data for Sutro
Once we have our patent documents split into chunked sections, we can pass them to Sutro to transform them into embeddings! Since we’re working with such a large amount of data, we’re going to take advantage of Sutro’s Datasets feature. This allows us to upload our dataset once (split across multiple files if necessary). We are also going to do some minor data partitioning and split our single dataframe into multiple ~500MB Parquet files, which will make the uploading process more robust. Let’s do that first.
Running the Embedding Job
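Before the job itself, the partitioning step from the data preparation section (one dataframe split into ~500MB Parquet files) could be planned roughly as follows. This is a sketch under assumptions: plan_partitions is a hypothetical helper, and the size estimate you pass in (for example an in-memory estimate of the dataframe) will only approximate the final on-disk Parquet size.

```python
import math


def plan_partitions(n_rows: int, total_bytes: int,
                    target_bytes: int = 500 * 1024**2) -> list[tuple[int, int]]:
    """Return (offset, length) row ranges so each partition is roughly target_bytes.

    Assumes rows are roughly uniform in size, which is reasonable for
    fixed-size chunks.
    """
    if n_rows <= 0:
        return []
    n_parts = max(1, math.ceil(total_bytes / target_bytes))
    rows_per_part = math.ceil(n_rows / n_parts)
    return [
        (offset, min(rows_per_part, n_rows - offset))
        for offset in range(0, n_rows, rows_per_part)
    ]


# With a Polars DataFrame `df`, each range could then be written out like this
# (file naming is an assumption):
#
#   for i, (offset, length) in enumerate(plan_partitions(df.height, size_in_bytes)):
#       df.slice(offset, length).write_parquet(f"chunks-{i:05d}.parquet")
```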
Once we have our dataset fully loaded, we can use it to run our embedding job. For this example, we chose to use Qwen3-Embedding-0.6B; it strikes a good balance between efficiency and quality: it has only 595M parameters, but still performs very well on relevant tasks like retrieval and re-ranking. See the MTEB leaderboard for a more in-depth, numerical comparison. Now we’ll kick off our embedding job.
Note: We’re using job_priority=1 here. Sutro currently has a notion of job priorities, which is essentially how we designate job SLAs. Currently, we have two priorities:
- Priority 0: Prototyping jobs (for small scale testing, targeted at <= 10 minute completion time)
- Priority 1: Production oriented jobs (large scale jobs, targeted at 1 hour completion time)
More details about job priorities can be found in Sutro’s documentation.
Once we have our job started, we wait for it to complete using await_job_completion(...). This will periodically poll for the job’s status until it’s complete; alternatively, we can monitor via Sutro’s web UI.
We also disabled automatic fetching of the results. Instead, we can just download our original Dataset with the job results column appended.
Loading into the Vector Database
Now we get to upload the 4M embeddings we just pulled down from Sutro into a vector database, so that we can search over the entire corpus. When uploading, we want to preserve all the metadata associated with each chunk, so that we can correctly attribute a retrieved vector to the right patent and section within the patent. Some vector database options:
- TurboPuffer - Great for multi-tenant architectures with many tenants
- Chroma - Simple and developer-friendly
- pgvector - If you’re already using PostgreSQL
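To make the metadata requirement concrete, here is a minimal in-memory stand-in for a vector database. It is not the API of any of the databases listed above; the InMemoryIndex class and the Record fields (patent_id, section, text) are hypothetical names chosen for illustration, and the brute-force cosine search is only practical for small experiments, not for 4M vectors.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    vector: list[float]  # unit-normalized embedding
    patent_id: str       # which patent the chunk came from
    section: str         # e.g. "title", "abstract", "description"
    text: str            # the chunk text itself


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


class InMemoryIndex:
    def __init__(self) -> None:
        self.records: list[Record] = []

    def upsert(self, vector: list[float], patent_id: str, section: str, text: str) -> None:
        self.records.append(Record(normalize(vector), patent_id, section, text))

    def search(self, query_vector: list[float], top_k: int = 5) -> list[tuple[float, Record]]:
        """Return the top_k (score, record) pairs by cosine similarity."""
        q = normalize(query_vector)
        scored = [(sum(a * b for a, b in zip(q, r.vector)), r) for r in self.records]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]
```

Because each hit carries its patent_id and section, results can be displayed in the "Patent ... - description" form without a second lookup.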
Searching Your Patents
Now that we have our embeddings loaded, we can search over them!
Example Queries We Tried
wireless charging efficiency improvements
1. Patent US-11994681-B2 - description
Score: 0.946
into waveguide 26 exhibits a relatively wide effective field of view 70 . Switchable reflective layer 56 may be switched between the first and second states at a speed greater than the response speed of the human eye (e.g., greater than 60 Hz, greater than 120 Hz, greater than 240 Hz, greater than 1 kHz, greater than 10 kHz, etc.) so that a user at eye box 24 ( FIG. 2 ) is unable to separately perceive each state and instead perceives a single effective field of view 70 . In this way, image light 22 may be coupled into waveguide 26 and provided to the eye box with a wider effective field of view than would otherwise be provided to the eye box. As an example, fields of view 62 and 66 may each be 30 degrees, 25 degrees, between 25 and 35 degrees, less than 45 degrees, etc., whereas field of view 70 is 60 degrees, between 55 and 65 degrees, greater than 45 degrees, or any other desired angle greater than field of view 62 or field of view 66 . FIG. 6 sh
2. Patent US-11054882-B2 - description
Score: 0.874
ments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include”, “including”, and “includes” mean “including, but not limited to.” As used herein, the terms “first,” “second,” etc. are used as labels for nouns that they precede, and do not imply any type of ordering (e.g., spatial, temporal, logical, etc.) unle
3. Patent US-11380077-B2 - description
Score: 0.874
ed in units of pressure). Using the intensity of a contact as an attribute of a user input allows for user access to additional device functionality that may otherwise not be accessible by the user on a reduced-size device with limited real estate for displaying affordances (e.g., on a touch-sensitive display) and/or receiving user input (e.g., via a touch-sensitive display, a touch-sensitive surface, or a physical/mechanical control such as a knob or a button). As used in the specification and claims, the term “tactile output” refers to physical displacement of a device relative to a previous position of the device, physical displacement of a component (e.g., a touch-sensitive surface) of a device relative to another component (e.g., housing) of the device, or displacement of the component relative to a center of mass of the device that will be detected by a user with the user’s sense of touch. For example, in situations where the device or the component of the device is in
4. Patent US-11128157-B2 - description
Score: 0.869
scan for communications (e.g., Bluetooth communications from secondary power receiving device 24 B) at a first rate. In response to the notification, the primary power receiving device 24 A may use the antenna to scan for communications at a second rate that is faster than the first rate. By increasing the rate of scanning for communications, the primary power receiving device 24 A may receive any communications from secondary power receiving device 24 B at an earlier time than if the rate was not increased. In the event that the newly added object is not a supported wireless power receiving device, primary power receiving device 24 A will not actually receive the expected wireless communication. However, in this case, the faster scan rate may time-out after a predetermined length of time (e.g., after the predetermined length of time the scan rate will revert back to the first scan rate) without any adverse effects. Additional action may be taken by primary power receiving d
5. Patent US-10742297-B2 - description
Score: 0.869
, where for a period larger than 1.60 ms, 240 occasions can be configured. Note that, as above, 60 subframes may be used, thus I CQI/PMI has incrementations of 60 in Tables 4 and 5. TABLE 4mframeValue of Value of offsetI CQI/PMI N pd,frame N OFFSET,CQI N OFFSET,mframe
biometric authentication using facial recognition
1. Patent US-10928697-B1 - description
Score: 0.907
tween the first transparent layer and the second transparent layer, the transparent light-producing layer having light-emitting diodes that are arranged in an array, and a controller for causing the transparent light-producing layer to display information using the light-emitting diodes.BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a side-view representation of an example configurable transparent structure for lighting and/or display. FIG. 2A is a side-view representation of an example configurable transparent structure for lighting and/or display that employs an edge-lit light guide plate. FIG. 2B is a side-view representation of an example configurable transparent structure for lighting and/or display that employs an organic light-emitting diode (OLED) display layer. FIG. 2C is a side-view representation of an example configurable transparent structure for lighting and/or display that employs a micro-light-emitting di
2. Patent US-11150734-B2 - description
Score: 0.907
ion mechanism 210 and may couple the deflection mechanism 210 to a surface. The surface may be a cover glass of an electronic device, a housing of the electronic device, and so on. Because the surface is coupled to the deflection mechanism 210 , as the deflection mechanism 210 deflects, the surface may also deflect and provide a haptic output. Although the haptic structure 200 is specifically discussed with respect to an electronic device, the haptic structure 200 may be used with other devices including mechanical devices and electrical devices, as well as non-mechanical and non-electrical devices such as described herein. FIG. 3A illustrates another example haptic structure 300 for an electronic device. The haptic structure 300 may be referred to as a cantilevered beam structure as one end of the deflection mechanism 310 is coupled to, machined from, or otherwise integrated with a substrate of the haptic structure 300 while the other end of the defle
3. Patent US-11868258-B2 - description
Score: 0.907
he bytes in a cache block. Thus, the coherency controller 24 may cause other agents to invalidate the cache block. If an agent has the cache block modified, the agent may supply the modified cache block to the request agent. Otherwise, the agents may not supply the cache block. The coherency controller 24 may be configured to read the directory entry for the address of the request (block 220 ). Based on the cache states in the directory entry, the coherency controller 24 may be configured to generate snoops. More particularly, if a given agent may have a modified copy of the cache block (e.g., the given agent has the cache block in exclusive or primary state) (block 222 , “yes” leg), the coherency controller 24 may generate a snoop forward-Dirty only (SnpFwdDonly) to the agent to transmit the cache block to the request agent (block 224 ). As mentioned above, the SnpFwdDonly request may cause the receiving agent to transmit the cache block if the data is modified, but o
4. Patent US-11393258-B2 - description
Score: 0.906
more activations required to initiate biometric authentication such that the electronic device is enabled to implement the respective function. In some examples, the electronic device (e.g., 2300 , 2400 ) displays, on the display, the prompt to provide the one or more activations of the button (e.g., 2304 , 2404 ) at a first position in the biometric authentication interface (e.g., 2322 , 2420 ). Outputting a prompt requesting that one or more activations of the button be provided provides the user with feedback about the current state of the device and provides visual feedback to the user indicating what steps the user must take in order to proceed with a particular function using the device. Providing improved visual feedback to the user enhances the operability of the device and makes the user-device interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the device) which, additionally, red
5. Patent US-11764907-B2 - description
Score: 0.906
henever the product of residual SINRs is below 1, the error probability for both use cases is reduced and consequently, achieving a certain target rate requires less power. FIG. 8 , which is motivated by the finding above, shows a graph 800 of the achievable outage probability as a function of the transmit power when a single (so-called one-shot) transmission is performed with low rate. Similarly, as before, we are interested in the performance of the single device using a dedicated resource, and of two devices sharing the slot. To achieve fairness, in the case of two users, their transmit powers are adjusted so that the sum matches that of a single user. The particular values are found by solving a min-max problem. For example, the maximum of the tuple (P er 1 , er 2 ) is minimized according to the following set of equations:
battery thermal management
1. Patent US-11700035-B2 - description
Score: 0.838
nating elements 68 (e.g., as shown by arrow 162 of FIG. 16 ). Manufacturing equipment 148 may, for example, use lasers to activate or create a seed layer on dielectric resonating elements 68 . Manufacturing equipment 148 may then deposit conductive material over the activated portions of dielectric resonating elements 68 . The conductive material may form conductive structures 86 V and 86 H (e.g., for feed probes 100 V and 100 H of FIG. 6 ) and/or parasitic elements for the antennas. At step 178 , manufacturing equipment 148 may surface-mount connectors 123 onto the connector contact pads 168 of substrate 72 (e.g., as shown by arrow 166 of FIG. 17 ). At step 180 , manufacturing equipment 148 may dice substrate 180 into individual antenna modules 120 and may add corresponding shielding structures to the antenna modules (e.g., as shown by arrow 170 of FIG. 17 ). The shielding may serve to isolate electronic components 150 fr
2. Patent US-11297732-B2 - description
Score: 0.838
illustrate in top plan views various stages of a partially assembled exemplary outer housing foot with integrated fan assembly according to various embodiments of the present disclosure. FIG. 6 illustrates in side cross-sectional view an exemplary electronic device having a low profile thermal flow assembly according to various embodiments of the present disclosure. FIG. 7 illustrates in top perspective view an exemplary impeller and fin stack arrangement for an integrated fan assembly according to various embodiments of the present disclosure. FIGS. 8A and 8B illustrate in bottom plan views exemplary foot and scroll geometries for an integrated fan assembly according to various embodiments of the present disclosure. FIG. 9 illustrates a flowchart of an exemplary method of cooling an electronic device according to various embodiments of the present disclosure. FIG. 10 illustrates in block diagram format an exemplary computing devi
3. Patent US-11605274-B2 - description
Score: 0.838
ement 212 , but has to still select “yes” to end the call. In this way, a user cannot accidentally end the call with the emergency service. In some cases, block 214 is an alternate option for block 206 , where the layout of the text is different and the call time is shown. In some examples, the audio messages can be configured to continue looping as long as the call with the emergency is active or until the user selects the “Stop Recorded Message” UI element. If the emergency service responder hangs up, the call will end. In some instances, the workflow may begin to reestablish the call (or make another call) if the user remains nonresponsive. Alternatively, if the call ends, and the emergency service responder calls back, the device 102 may answer the call and begin playing the already generated audio message. This audio message could also be looped until the user selects the “Stop Recorded Message” UI element. Further, in some cases, the device 102 may detect an indicat
4. Patent US-11994681-B2 - title
Score: 0.838
Optical systems with reflective prism input couplers
5. Patent US-10719225-B2 - description
Score: 0.838
ronic device (e.g., the first software application), such as a background application, a suspended application, or a hibernated application. Thus, the user can perform operations that are not provided by the application currently displayed on the display of the electronic device (e.g., the second software application) but are provided by one of the currently open applications (e.g., displaying a home screen or switching to a next software application using gestures for a hidden application launcher software application). In some embodiments, the first software application is ( 804 ) an application launcher (e.g., a springboard). For example, as shown in FIG. 7A , the application launcher displays a plurality of application icons 5002 that correspond to a plurality of applications. The application launcher receives a user-selection of an application icon 5002 (e.g., based on a finger gesture on touch screen 156 ), and in response to receiving the user-selection, launches an
haptic feedback for touch interfaces
1. Patent US-12193062-B2 - abstract
Score: 0.897
Disclosed are techniques for reducing likelihood of Random Access Channel (RACH) transmission blockages and thereby facilitate an initial access procedure for new radio (NR) unlicensed spectrum (NR-U) operation in a fifth generation (5G) wireless communication system including an NR node. In some embodiments, a parameter generated by a gNB and received by a UE indicates that, from among a set of consecutive RACH Occasions (ROs), a gap is available for performing a listen-before-talk (LBT) procedure before commencing a RACH transmission.
2. Patent US-11468890-B2 - description
Score: 0.897
mediums), memory controller 122 , one or more processing units (CPUs) 120 , peripherals interface 118 , RF circuitry 108 , audio circuitry 110 , speaker 111 , microphone 113 , input/output (I/O) subsystem 106 , other input control devices 116 , and external port 124 . Device 100 optionally includes one or more optical sensors 164 . Device 100 optionally includes one or more contact intensity sensors 165 for detecting intensity of contacts on device 100 (e.g., a touch-sensitive surface such as touch-sensitive display system 112 of device 100 ). Device 100 optionally includes one or more tactile output generators 167 for generating tactile outputs on device 100 (e.g., generating tactile outputs on a touch-sensitive surface such as touch-sensitive display system 112 of device 100 or touchpad 355 of device 300 ). These components optionally communicate over one or more communication buses or signal lines 103 . As used in the specification and claim
3. Patent US-11733656-B2 - description
Score: 0.897
rt of the second user interface screen; detect (e.g., with detecting unit 2516 ) a contact on the touch-sensitive surface unit (e.g., touch-sensitive surface unit 2510 ) at the affordance for revealing an edit option; and in response to detecting the contact at the affordance for revealing an edit option, enable display (e.g., with display enabling unit 2508 ), on the display unit (e.g., display unit 2502 ), of a delete affordance in association with the first user interface preview image as part of the second user interface screen. In some embodiments, displaying the delete affordance comprises translating the first user interface preview image on-screen. In some embodiments, the processing unit 2506 is further configured to: after displaying the delete affordance as part of the second user interface screen, detect (e.g., with detecting unit 2516 ) a contact on the touch-sensitive surface unit (e.g., touch-sensitive surface unit 2510 ) at the delete affordance disp
4. Patent US-11379113-B2 - title
Score: 0.897
Techniques for selecting text
5. Patent US-11039417-B2 - description
Score: 0.897
at least SIB2 from the system information. At 1832 , the UE may compute a paging frame identifier I PF and a paging occasion identifier I PO based on the DRX cycle T, the parameter nB and the Range_UE_ID. At 1835 , the base station transmits a paging message 1840 for the link-budget-limited UE. The paging message is included in a paging frame and paging occasion consistent with the previously transmitted values of DRX cycle T, parameter nB and Range_UE_ID. At 1845 , the UE wakes up for every subframe consistent with the computed paging frame identifier and computed paging occasion identifier, and checks the PDCCH of the subframe for the presence of P-RNTI. At 1850 , if the UE determines that P-RNTI is present in the PDCCH, the UE decodes resource allocation information from the PDCCH, and checks PDSCH resource block(s) identified by the allocation information, e.g., PDSCH resource blocks in the same subframe as the PDCCH. At 1855 , if the paging
A few things limit the quality of these results:
- Query/document language mismatch: The language used in our queries is very different from the language used in the patent documents, so the similarity between query-document pairs is generally not great. There are well-known fixes for this problem; a common one is HyDE (Hypothetical Document Embeddings), which uses an LLM to generate a hypothetical document for the query, written in language closer to the real documents, and thus retrieves better results for the same source query.
- Under retrieving: we’re currently only retrieving the first 5 documents, which is not very many; if we retrieved more documents, it’s likely we’d have more relevant snippets in our results.
- Not reranking: Combining a higher top_k with a re-ranking step can help find the most relevant set of documents. These two techniques used together can prove to be very powerful and are common among the many folks we talk to who use vector search in production.
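The last two points (a higher top_k followed by re-ranking) can be sketched as follows. This is an illustration under assumptions, not something we ran for this guide: retrieve_then_rerank is a hypothetical helper, and token_overlap is a deliberately crude stand-in for a real re-ranking model, used only so the example stays self-contained.

```python
def token_overlap(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that also appear in the text.

    A production system would call a proper re-ranking model here instead.
    """
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def retrieve_then_rerank(query, scored_chunks, rerank_fn=token_overlap,
                         retrieve_k=50, final_k=5):
    """Over-retrieve by vector score, then reorder the candidates with rerank_fn.

    scored_chunks is a list of (vector_score, chunk_text) pairs, e.g. the
    output of a vector database query.
    """
    # Step 1: keep the retrieve_k best candidates by vector similarity.
    candidates = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)[:retrieve_k]
    # Step 2: reorder those candidates with the (slower, more precise) re-ranker.
    reranked = sorted(candidates, key=lambda pair: rerank_fn(query, pair[1]), reverse=True)
    return [text for _, text in reranked[:final_k]]
```

With retrieve_k equal to final_k the re-ranker can only reorder what the vector search already returned; the benefit comes from giving it a larger candidate pool to choose from.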
Scale, Cost & Speed Breakdown
Scale
- Chunk count: 4.04M
- Input token count: 879.5M
Cost Breakdown
- BigQuery query: ~$6
- Sutro embedding generation: $8.80
- Total: ~$14.80
Time
- Job completion time: 44 minutes
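For a sense of unit economics, the figures above can be combined with some quick arithmetic. The inputs are taken directly from this breakdown (the BigQuery cost is approximate); the derived rates are only as precise as those inputs.

```python
chunks = 4.04e6          # chunk count
tokens = 879.5e6         # input token count
embedding_cost = 8.80    # USD, Sutro embedding generation
bigquery_cost = 6.00     # USD, approximate
job_minutes = 44         # job completion time

total_cost = bigquery_cost + embedding_cost
cost_per_million_tokens = embedding_cost / (tokens / 1e6)
tokens_per_chunk = tokens / chunks
tokens_per_second = tokens / (job_minutes * 60)
chunks_per_second = chunks / (job_minutes * 60)

print(f"Total cost:            ${total_cost:.2f}")
print(f"Embedding cost per 1M: ${cost_per_million_tokens:.4f}")
print(f"Avg tokens per chunk:  {tokens_per_chunk:.1f}")
print(f"Throughput:            {tokens_per_second:,.0f} tokens/s, {chunks_per_second:,.0f} chunks/s")
```

That works out to roughly one cent per million tokens embedded, at an average of about 218 tokens per chunk.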
Conclusion
In this guide, we’ve demonstrated how Sutro makes it trivial to:
- Go from source documents to a searchable index in under 2 hours
- Generate high-quality embeddings using state-of-the-art models
- Build a semantic search system that can easily be productionized
Next Steps
- Try different embedding models for your use case
- Experiment with different techniques to improve retrieval quality:
  - Hybrid search (combining embeddings with keyword search)
  - HyDE
  - Over-retrieval and re-ranking
- Productionize this workflow as part of an event-driven pipeline that creates new indices for every X event (say, a new user signing up)