3.2 Add Embedding Logic - Cordoba Institute of Knowledge

Overview

To create an embedding, you will start with a piece of source material of unknown length, break it down into smaller chunks, embed each chunk, and then save each chunk together with its embedding to the database.

Write the function to chunk content

Setting Up the AI Directory

Create a file named embedding.ts in the lib/ai directory.

Generate Chunks

Let’s start by creating a function to break the source material into small chunks. Add the following code to lib/ai/embedding.ts:

lib/ai/embedding.ts

const generateChunks = (input: string): string[] => {
  return input
    .trim()
    .split(".")
    .filter((i) => i !== "");
};

This function takes an input string, splits it on periods, filters out any empty items, and returns an array of strings. It is worth experimenting with different chunking techniques in your projects, because the best technique depends on the content.
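To make the behavior concrete, here is a minimal sketch that runs the function on a sample string and compares it with one possible variation. generateChunksTrimmed is a hypothetical name and is not part of this guide; it assumes you want every chunk trimmed individually, since trim() in the original applies only to the input as a whole.

```typescript
// Same logic as generateChunks above, repeated so the sketch is self-contained.
const generateChunks = (input: string): string[] => {
  return input
    .trim()
    .split(".")
    .filter((i) => i !== "");
};

// Hypothetical variant (an assumption, not from the guide): trim every chunk
// and drop chunks that are empty after trimming.
const generateChunksTrimmed = (input: string): string[] => {
  return input
    .split(".")
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk !== "");
};

const sample = "Cats sleep a lot. Dogs bark. ";

const original = generateChunks(sample);
// original is ["Cats sleep a lot", " Dogs bark"] (note the leading space)

const trimmed = generateChunksTrimmed(sample);
// trimmed is ["Cats sleep a lot", "Dogs bark"]
```

Splitting on every period also breaks decimals such as 3.14 and abbreviations into separate chunks, which is one reason to try other chunking strategies on your own content.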

Install AI SDK

You will use the AI SDK to create embeddings. This requires three more packages, which you can install by running the following command:

pnpm add ai @ai-sdk/react @ai-sdk/openai

This will install the AI SDK, the AI SDK’s React hooks, and the AI SDK’s OpenAI provider. The AI SDK is designed to be a unified interface for interacting with any large language model. This means that you can change models and providers with just one line of code.

Generate Embeddings

Let’s add a function to generate embeddings. Copy the following code into your lib/ai/embedding.ts file:

lib/ai/embedding.ts

import { embedMany } from "ai";
import { openai } from "@ai-sdk/openai";

const embeddingModel = openai.embedding("text-embedding-ada-002");

const generateChunks = (input: string): string[] => {
  return input
    .trim()
    .split(".")
    .filter((i) => i !== "");
};

export const generateEmbeddings = async (
  value: string
): Promise<Array<{ embedding: number[]; content: string }>> => {
  const chunks = generateChunks(value);
  const { embeddings } = await embedMany({
    model: embeddingModel,
    values: chunks,
  });
  return embeddings.map((e, i) => ({ content: chunks[i], embedding: e }));
};

Understanding the Code

In this code, you first define the model you want to use for the embeddings. In this example, you are using OpenAI’s text-embedding-ada-002 embedding model. Next, you create an asynchronous function called generateEmbeddings. This function will take in the source material (value) as an input and return a promise of an array of objects, each containing an embedding and content. Within the function, you first generate chunks for the input. Then, you pass those chunks to the embedMany function imported from the AI SDK which will return embeddings of the chunks you passed in. Finally, you map over and return the embeddings in a format that is ready to save in the database.
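The final mapping step can be tried without calling any API. The sketch below isolates it in a hypothetical helper, pairChunksWithEmbeddings, which is not part of this guide. It relies on the embeddings being returned in the same order as the input values, and it uses tiny made-up vectors in place of real ones (real text-embedding-ada-002 vectors have 1536 dimensions). The length check is an added assumption for safety and does not appear in the original function.

```typescript
type EmbeddedChunk = { content: string; embedding: number[] };

// Hypothetical helper (an assumption for illustration): joins chunks and
// embeddings by index, mirroring the last line of generateEmbeddings.
const pairChunksWithEmbeddings = (
  chunks: string[],
  embeddings: number[][]
): EmbeddedChunk[] => {
  // Extra guard not present in the guide: both arrays must line up one to one.
  if (chunks.length !== embeddings.length) {
    throw new Error("chunks and embeddings must have the same length");
  }
  // embeddings[i] belongs to chunks[i], so the index is the join key.
  return embeddings.map((e, i) => ({ content: chunks[i], embedding: e }));
};

// Made-up two-dimensional vectors standing in for real embeddings.
const rows = pairChunksWithEmbeddings(
  ["Cats sleep a lot", "Dogs bark"],
  [
    [0.1, 0.2],
    [0.3, 0.4],
  ]
);
// rows[1] is { content: "Dogs bark", embedding: [0.3, 0.4] }
```

Each element of rows has the content and embedding fields described above, which is the shape that gets saved to the database.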

Key Components

Chunking Function

generateChunks: Splits input text by periods and filters out empty strings

Embedding Model

text-embedding-ada-002: OpenAI’s embedding model with 1536 dimensions

Batch Processing

embedMany: Processes multiple chunks at once for efficiency

Output Format

Structured Result: Returns array of objects with content and embedding
