RAG workflows

Not Diamond is well-suited to RAG workflows, particularly for handling diverse queries (question-answering, summarization, analysis) and for reducing the high inference costs that RAG can incur. You can try our Not Diamond-powered RAG chatbot to see how it works, dig into the full code for that app, or explore the simpler example below.

In a RAG application, we retrieve relevant information from a set of documents and inject it into our prompt. In this example we'll use LlamaIndex for document indexing and retrieval, but you can use any library you like.

First, we'll install notdiamond and llama-index. In Python:

pip install notdiamond[create] llama-index

Or in TypeScript (which also uses dotenv for loading API keys):

npm install notdiamond llamaindex dotenv
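Before running the code, you'll also need API keys in your environment. NOTDIAMOND_API_KEY is the variable the SDKs read (the TypeScript example below loads it via dotenv); OPENAI_API_KEY and ANTHROPIC_API_KEY are the standard variables the underlying OpenAI and Anthropic clients expect for the models we route between. For example, in a .env file:

NOTDIAMOND_API_KEY='your-not-diamond-api-key'
OPENAI_API_KEY='your-openai-api-key'
ANTHROPIC_API_KEY='your-anthropic-api-key'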

Next, we'll download the LlamaIndex terms of service as an example document into the project directory:

curl -L "https://www.llamaindex.ai/files/terms-of-service.pdf" -o llama-index-tos.pdf

Then, we'll create a file with the following Python code:

from notdiamond import NotDiamond
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load the terms of service documents and pass them into a vector store for retrieval
documents = SimpleDirectoryReader(input_files=["llama-index-tos.pdf"]).load_data()
index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever()

# Define the user query
query = "Will my data be used for training?"

# Retrieve the relevant context from the terms of service
nodes = retriever.retrieve(query)
context = ""
for node in nodes:
    context += f"{node.get_text()}\n"

# Define your message template
message = f"""
  I am a customer who needs your help with the terms of service. 
  The following document is the relevant part of the terms of service to my query.
  Document: {context}
  My query: {query}
"""
  
# Define your NotDiamond routing client
client = NotDiamond()  

# Define the LLMs you'd like to route between
llm_providers = ['openai/gpt-3.5-turbo', 'openai/gpt-4-turbo-2024-04-09', 'openai/gpt-4o-2024-05-13', 
                 'anthropic/claude-3-haiku-20240307', 'anthropic/claude-3-opus-20240229']

# The best LLM is determined by the ND API and the LLM is called client-side
result, session_id, provider = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful customer service agent."},
        {"role": "user", "content": message},
    ],
    model=llm_providers
)

print("ND session ID: ", session_id)  # A unique ID of the model call. Important for personalizing ND to your use-case
print("LLM called: ", provider.model)  # The LLM routed to
print("LLM output: ", result.content)  # The LLM response

And the same workflow in TypeScript:

import 'dotenv/config';
import { NotDiamond } from 'notdiamond';
import { VectorStoreIndex, SimpleDirectoryReader } from 'llamaindex';

// Load the terms of service documents and pass them into a vector store for retrieval
const documents = await new SimpleDirectoryReader({ inputFiles: ["llama-index-tos.pdf"] }).loadData();
const index = await VectorStoreIndex.fromDocuments(documents);
const retriever = index.asRetriever();

// Define the user query
const query = "Will my data be used for training?";

// Retrieve the relevant context from the terms of service
const nodes = await retriever.retrieve(query);
let context = "";
for (const node of nodes) {
  context += `${node.getText()}\n`;
}

// Define your message template
const message = `
  I am a customer who needs your help with the terms of service. 
  The following document is the relevant part of the terms of service to my query.
  Document: ${context}
  My query: ${query}
`;

// Initialize the Not Diamond client with an API key
const notDiamond = new NotDiamond({
  apiKey: process.env.NOTDIAMOND_API_KEY,
});

// Define the LLMs you'd like to route between
const llmProviders = [
  { provider: 'openai', model: 'gpt-3.5-turbo' },
  { provider: 'openai', model: 'gpt-4-turbo-2024-04-09' },
  { provider: 'openai', model: 'gpt-4o-2024-05-13' },
  { provider: 'anthropic', model: 'claude-3-haiku-20240307' },
  { provider: 'anthropic', model: 'claude-3-opus-20240229' },
];

// The best LLM is determined by Not Diamond and the LLM request is made client-side
const result = await notDiamond.modelSelect({
  messages: [
    { role: 'system', content: 'You are a helpful customer service agent.' },
    { role: 'user', content: message },
  ],
  llmProviders: llmProviders
});

if ('detail' in result) {
  console.error('Error:', result.detail);
  process.exit(1); // Stop here so we don't read fields off an error response
}

console.log('ND session ID:', result.session_id); // A unique ID of the call for personalizing routing to your use-case
console.log('LLM called:', result.provider.model); // The LLM routed to

In the example above, we've built a RAG application that answers questions about a terms of service document, using LlamaIndex as the retrieval library. The rest of the code follows the same structure as the Not Diamond quickstart example.
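
Because routing happens before the LLM call, you can also bias routing toward cheaper models when predicted quality is comparable, which helps with the long, costly prompts RAG tends to produce. As a minimal sketch in Python, assuming the tradeoff routing parameter described in Not Diamond's docs (verify the exact parameter name and accepted values against the current SDK reference):

# Assumption: the tradeoff argument below follows Not Diamond's documented
# routing options; check the current SDK reference before relying on it.
result, session_id, provider = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful customer service agent."},
        {"role": "user", "content": message},
    ],
    model=llm_providers,
    tradeoff="cost",  # prefer cheaper models when quality is comparable
)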