The Missing Piece: Information Retrieval

The model can now add and embed arbitrary information to your knowledge base. However, it still isn’t able to query it. Let’s create a new tool to allow the model to answer questions by finding relevant information in your knowledge base.
To find similar content, you will need to embed the user’s query, search the database for semantic similarities, then pass those items to the model as context alongside the query.

Updating Embedding Logic

First, let’s update your embedding logic file (lib/ai/embedding.ts) to add functions for finding relevant content:
lib/ai/embedding.ts
import { embed, embedMany } from "ai";
import { openai } from "@ai-sdk/openai";
import { db } from "../db";
import { cosineDistance, desc, gt, sql } from "drizzle-orm";
import { embeddings } from "../db/schema/embeddings";

const embeddingModel = openai.embedding("text-embedding-ada-002");

const generateChunks = (input: string): string[] => {
  return input
    .trim()
    .split(".")
    .filter((i) => i !== "");
};

export const generateEmbeddings = async (
  value: string
): Promise<Array<{ embedding: number[]; content: string }>> => {
  const chunks = generateChunks(value);
  const { embeddings } = await embedMany({
    model: embeddingModel,
    values: chunks,
  });
  return embeddings.map((e, i) => ({ content: chunks[i], embedding: e }));
};

export const generateEmbedding = async (value: string): Promise<number[]> => {
  const input = value.replaceAll("\\n", " ");
  const { embedding } = await embed({
    model: embeddingModel,
    value: input,
  });
  return embedding;
};

export const findRelevantContent = async (userQuery: string) => {
  // Embed the query so it can be compared against the stored chunk embeddings
  const userQueryEmbedded = await generateEmbedding(userQuery);
  // pgvector returns a cosine distance; subtracting from 1 gives a similarity score (1 = identical)
  const similarity = sql<number>`1 - (${cosineDistance(
    embeddings.embedding,
    userQueryEmbedded
  )})`;
  const similarGuides = await db
    .select({ name: embeddings.content, similarity })
    .from(embeddings)
    .where(gt(similarity, 0.5)) // drop weak matches
    .orderBy((t) => desc(t.similarity))
    .limit(4); // cap the context passed back to the model
  return similarGuides;
};

New Functions Explained

  1. generateEmbedding: Generates a single embedding from an input string; used to embed the user's query.
  2. findRelevantContent: Embeds the user's query, searches the database for similar items using cosine similarity, and returns the matching items.
  3. Similarity Threshold: Only results with a similarity greater than 0.5 are returned, to keep matches relevant.
  4. Result Limiting: Results are capped at 4 items to avoid overwhelming the model with context.
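
To sanity-check these functions outside the chat route, you could call findRelevantContent directly from a scratch script. This is a minimal sketch: the file path is hypothetical, the sample output is purely illustrative, and it assumes your database already contains embedded resources.

scripts/query-test.ts
import { findRelevantContent } from "@/lib/ai/embedding";

// Embeds the question, runs the cosine-similarity search, and prints the matches.
const results = await findRelevantContent("What is my favorite food?");

// Each row holds the matched chunk and its similarity score, e.g.
// [{ name: "my favorite food is pizza", similarity: 0.82 }]
console.log(results);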

Adding the Information Retrieval Tool

Now, go back to your route handler (app/api/chat/route.ts) and add a new tool called getInformation:
app/api/chat/route.ts
import { createResource } from "@/lib/actions/resources";
import { openai } from "@ai-sdk/openai";
import {
  convertToModelMessages,
  streamText,
  tool,
  UIMessage,
  stepCountIs,
} from "ai";
import { z } from "zod";
import { findRelevantContent } from "@/lib/ai/embedding";

// Allow streaming responses up to 30 seconds
export const maxDuration = 30;

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  const result = streamText({
    model: openai("gpt-4o"),
    messages: convertToModelMessages(messages),
    stopWhen: stepCountIs(5),
    system: `You are a helpful assistant. Check your knowledge base before answering any questions.
    Only respond to questions using information from tool calls.
    If no relevant information is found in the tool calls, respond, "Sorry, I don't know."`,
    tools: {
      addResource: tool({
        description: `add a resource to your knowledge base.
          If the user provides a random piece of knowledge unprompted, use this tool without asking for confirmation.`,
        inputSchema: z.object({
          content: z
            .string()
            .describe("the content or resource to add to your knowledge base"),
        }),
        execute: async ({ content }) => createResource({ content }),
      }),
      getInformation: tool({
        description: `get information from your knowledge base to answer questions.`,
        inputSchema: z.object({
          question: z.string().describe("the user's question"),
        }),
        execute: async ({ question }) => findRelevantContent(question),
      }),
    },
  });

  return result.toUIMessageStreamResponse();
}

The findRelevantContent function uses vector similarity to find the most relevant information:

  1. Query Embedding: The user's question is converted to a vector representation.
  2. Similarity Calculation: Cosine distance measures how similar the query is to each stored embedding.
  3. Threshold Filtering: Only results above the 0.5 similarity threshold are returned.
  4. Ranked Results: Matches are ordered by similarity score, highest first.
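
For intuition, the similarity expression above (1 minus the cosine distance) is exactly cosine similarity, which pgvector computes inside the database. A standalone TypeScript sketch of the same calculation, for illustration only:

// Cosine similarity between two equal-length vectors: 1 means same direction, 0 means unrelated.
const cosineSimilarity = (a: number[], b: number[]): number => {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
};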

Testing the Complete RAG Flow

  1. Refresh the Page: Head back to the browser and refresh the page to ensure the new tool is loaded.
  2. Ask a Question: Ask for your favorite food (or any other information you previously added to the knowledge base).
  3. Observe Tool Calls: You should see the model call the getInformation tool, then use the relevant information to formulate a response.
  4. Verify Response Quality: The model should now provide informative answers based on your stored knowledge.

With both tools implemented, you now have a complete RAG system! The AI can both store new information and retrieve relevant content to answer questions.

How the Complete Flow Works

Here’s the complete RAG flow in action:
  1. User Input: User asks a question or provides information
  2. AI Analysis: Model determines whether to add information or retrieve it
  3. Tool Selection: Model chooses between addResource or getInformation
  4. Tool Execution: Selected tool performs its function
  5. Result Processing: Tool results are sent back to the model
  6. Response Generation: Model generates a response using the tool results
  7. User Output: Final response is streamed to the user
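
If you want to watch steps 3-6 from the server, streamText accepts an onStepFinish callback you can add to the existing call in your route handler. A minimal logging sketch (the other options stay exactly as shown earlier; check your SDK version for the exact shape of the step object):

const result = streamText({
  // ...model, messages, system, stopWhen, and tools as in app/api/chat/route.ts...
  onStepFinish: (step) => {
    // Log which tools the model invoked in this step and what they returned.
    console.log("tool calls:", step.toolCalls.map((c) => c.toolName));
    console.log("tool results:", step.toolResults);
  },
});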

Information Addition

  • When: User provides new information
  • Tool: addResource
  • Result: Information stored and embedded

Information Retrieval

  • When: User asks a question
  • Tool: getInformation
  • Result: Relevant content found and used

Testing Different Scenarios

  1. Test Information Addition: Tell the model various pieces of information to build up your knowledge base.
  2. Test Information Retrieval: Ask questions about the information you've added to see how well retrieval works.
  3. Test Edge Cases: Ask about topics you haven't covered to see the "Sorry, I don't know" response.
  4. Test Similarity Matching: Phrase the same question in different ways to see how well semantic search handles it.

Understanding Similarity Scores

The similarity threshold of 0.5 means:
  • 0.7-1.0: Excellent matches, highly relevant content
  • 0.5-0.7: Good matches, relevant content
  • Below 0.5: Too dissimilar, filtered out
You can adjust the similarity threshold based on your needs. Lower values return more results but may be less relevant, while higher values ensure only the most relevant content is returned.
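
If you want to experiment, one option is a variant of findRelevantContent that exposes the threshold and result limit as parameters. This is a hypothetical sketch, not part of the guide's code; it lives in lib/ai/embedding.ts and reuses the same imports:

export const findRelevantContent = async (
  userQuery: string,
  minSimilarity = 0.5, // raise for stricter matches, lower for broader recall
  maxResults = 4
) => {
  const userQueryEmbedded = await generateEmbedding(userQuery);
  const similarity = sql<number>`1 - (${cosineDistance(
    embeddings.embedding,
    userQueryEmbedded
  )})`;
  return db
    .select({ name: embeddings.content, similarity })
    .from(embeddings)
    .where(gt(similarity, minSimilarity))
    .orderBy((t) => desc(t.similarity))
    .limit(maxResults);
};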

Extension tasks