Overview
To create an embedding, you will start with a piece of source material (unknown length), break it down into smaller chunks, embed each chunk, and then save the chunk to the database.
Write the function to chunk content
1
Setting Up the AI Directory
Create a file embedding.ts in the lib/ai directory.
2
Generate Chunks
Let’s start by creating a function to break the source material into small chunks. Add this function to lib/ai/embedding.ts. It takes an input string, splits it by periods, filters out any empty items, and returns an array of strings. It is worth experimenting with different chunking techniques in your projects, as the best technique will vary.
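As a minimal sketch of the chunking step described above, the function below splits on periods and drops empty items. Trimming the whitespace around each chunk is an added assumption, not something the description requires. The second function, chunkByWords, is a hypothetical alternative (fixed-size word windows with overlap) included only to illustrate what experimenting with a different chunking technique might look like; its name and default sizes are illustrative.

```typescript
// Split source text on periods and drop empty pieces.
// Assumption: each chunk is also trimmed of surrounding whitespace.
const generateChunks = (input: string): string[] =>
  input
    .trim()
    .split('.')
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk !== '');

// Hypothetical alternative: overlapping windows of `size` words,
// where consecutive windows share `overlap` words.
const chunkByWords = (input: string, size = 50, overlap = 10): string[] => {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error('size must be positive and overlap must be in [0, size)');
  }
  const words = input
    .trim()
    .split(/\s+/)
    .filter((word) => word !== '');
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(' '));
    // Stop once the current window has reached the end of the text.
    if (start + size >= words.length) break;
  }
  return chunks;
};
```

Period splitting keeps chunks aligned with sentences but breaks on abbreviations and decimals, while word windows give predictable chunk sizes at the cost of cutting across sentences. Which trade-off is better depends on the source material.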
3
Install AI SDK
You will use the AI SDK to create embeddings. This requires a few more dependencies, which you can install with your package manager: the AI SDK (ai), the AI SDK’s React hooks (@ai-sdk/react), and the AI SDK’s OpenAI provider (@ai-sdk/openai).
The AI SDK is designed to be a unified interface for interacting with any large language model. This means that you can change models and providers with just one line of code!
4
Generate Embeddings
Let’s add a function to generate embeddings. Add it to your lib/ai/embedding.ts file.
Understanding the Code
The code first defines the model you want to use for the embeddings. In this example, you are using OpenAI’s text-embedding-ada-002 embedding model.
Next, you create an asynchronous function called generateEmbeddings. This function takes the source material (value) as input and returns a promise of an array of objects, each containing an embedding and content. Within the function, you first generate chunks for the input. Then, you pass those chunks to the embedMany function imported from the AI SDK, which returns embeddings of the chunks you passed in. Finally, you map over and return the embeddings in a format that is ready to save in the database.
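The flow described above (chunk, embed in one batch, pair each chunk with its embedding) can be sketched as follows. To keep the sketch runnable without network access or API keys, the batch embedding call is passed in as a parameter rather than imported. The EmbedBatch type, the embedBatch parameter, and fakeEmbedBatch are hypothetical names introduced for illustration; in the tutorial's setup, that role is played by the AI SDK's embedMany with the OpenAI embedding model.

```typescript
// Shape of each record that is ready to be saved to the database.
type EmbeddedChunk = { content: string; embedding: number[] };

// Hypothetical injection point: any function that embeds a batch of strings.
// With the AI SDK this could wrap embedMany, for example:
//   async (values) => (await embedMany({ model: embeddingModel, values })).embeddings
// where `embeddingModel` is assumed to be the OpenAI embedding model you defined.
type EmbedBatch = (values: string[]) => Promise<number[][]>;

// Split on periods and drop empty pieces (per-chunk trimming is an assumption).
const generateChunks = (input: string): string[] =>
  input
    .trim()
    .split('.')
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk !== '');

const generateEmbeddings = async (
  value: string,
  embedBatch: EmbedBatch,
): Promise<EmbeddedChunk[]> => {
  const chunks = generateChunks(value);
  // One batched call for all chunks instead of one call per chunk.
  const embeddings = await embedBatch(chunks);
  if (embeddings.length !== chunks.length) {
    throw new Error('embedder returned a different number of embeddings than chunks');
  }
  // Pair each chunk with the embedding at the same index.
  return chunks.map((content, i) => ({ content, embedding: embeddings[i]! }));
};

// Stand-in embedder for local experimentation only; NOT a real model.
// It maps each string to a one-element vector holding its length.
const fakeEmbedBatch: EmbedBatch = async (values) =>
  values.map((value) => [value.length]);
```

Calling generateEmbeddings('Hello world. Foo.', fakeEmbedBatch) resolves to two records, one per sentence, each carrying its content and the stand-in vector. Swapping in a real embedder changes only the second argument.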
Key Components
Chunking Function
generateChunks: Splits input text by periods and filters out empty strings
Embedding Model
text-embedding-ada-002: OpenAI’s embedding model with 1536 dimensions
Batch Processing
embedMany: Processes multiple chunks at once for efficiency
Output Format
Structured Result: Returns array of objects with content and embedding