LLM models

Not Diamond supports the following LLMs with the specified names:

Provider  Model name  Function calling  Structured outputs  Alias
OpenAI  openai/gpt-4o-2024-08-06  ✔️  ✔️  openai/gpt-4o
openai/gpt-4o-2024-05-13  ✔️  ✔️
openai/gpt-4-turbo-2024-04-09  ✔️  ✔️  openai/gpt-4-turbo
openai/gpt-4-0125-preview  ✔️  ✔️  openai/gpt-4-turbo-2024-04-09
openai/gpt-4-1106-preview  ✔️  ✔️
openai/gpt-4-0613  ✔️  ✔️  openai/gpt-4
openai/gpt-3.5-turbo-0125  ✔️  ✔️  openai/gpt-3.5-turbo
openai/gpt-4o-mini-2024-07-18  ✔️  ✔️  openai/gpt-4o-mini
openai/chatgpt-4o-latest  ✔️
openai/o1-preview-2024-09-12  openai/o1-preview
openai/o1-mini-2024-09-12  openai/o1-mini
Anthropic  anthropic/claude-3-5-sonnet-latest  ✔️
anthropic/claude-3-5-sonnet-20241022  ✔️
anthropic/claude-3-5-haiku-20241022  ✔️
anthropic/claude-3-5-sonnet-20240620  ✔️
anthropic/claude-3-opus-20240229  ✔️  ✔️
anthropic/claude-3-sonnet-20240229  ✔️  ✔️
anthropic/claude-3-haiku-20240307  ✔️
anthropic/claude-2.1
Google  google/gemini-1.5-pro-latest  ✔️  ✔️
google/gemini-1.5-flash-latest  ✔️  ✔️
Mistral  mistral/open-mixtral-8x22b  ✔️
mistral/codestral-latest
mistral/open-mixtral-8x7b  ✔️
mistral/mistral-large-2407  ✔️  ✔️  mistral/mistral-large-latest
mistral/mistral-large-2402  ✔️  ✔️
mistral/mistral-medium-latest  ✔️
mistral/mistral-small-latest  ✔️  ✔️
mistral/open-mistral-7b  ✔️
mistral/open-mistral-nemo  ✔️  ✔️
Replicate  replicate/meta-llama-3-70b-instruct
replicate/meta-llama-3-8b-instruct
replicate/mixtral-8x7b-instruct-v0.1
replicate/mistral-7b-instruct-v0.2
replicate/meta-llama-3.1-405b-instruct
TogetherAI  togetherai/Llama-3-70b-chat-hf
togetherai/Llama-3-8b-chat-hf
togetherai/Meta-Llama-3.1-8B-Instruct-Turbo
togetherai/Meta-Llama-3.1-70B-Instruct-Turbo
togetherai/Meta-Llama-3.1-405B-Instruct-Turbo
togetherai/Qwen2-72B-Instruct
togetherai/Mixtral-8x22B-Instruct-v0.1
togetherai/Mixtral-8x7B-Instruct-v0.1
togetherai/Mistral-7B-Instruct-v0.2
Perplexity  perplexity/llama-3.1-sonar-large-128k-online
Cohere  cohere/command-r-plus  ✔️  ✔️
cohere/command-r  ✔️  ✔️
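
The names in the table follow a provider/model pattern, while the LLMConfig examples later on this page pass the provider and model as separate arguments. As a minimal sketch of converting between the two forms (the helper name split_model_name is hypothetical and not part of the Not Diamond SDK), something like the following could be used:

```python
from typing import Tuple


def split_model_name(name: str) -> Tuple[str, str]:
    """Split a 'provider/model' name into (provider, model).

    Hypothetical helper for illustration; not part of the notdiamond package.
    """
    # Split on the first slash only, so the model part is kept intact.
    provider, sep, model = name.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {name!r}")
    return provider, model


provider, model = split_model_name("openai/gpt-4o-2024-08-06")
print(provider, model)
```

The two parts can then be passed as the provider and model arguments when building a configuration by hand.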

We are continuously expanding our list of supported models. Send us a note if you have a specific model requirement and we will onboard it for you.

Defining additional configurations

If you'd like to have more control over each LLM you're routing between, you can use the LLMConfig class. This is especially useful when you want to set API keys explicitly or define additional LLM parameters such as temperature. You can also define custom cost and latency attributes to inform cost and latency tradeoffs:

from notdiamond.llms.config import LLMConfig
from notdiamond import NotDiamond

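client = NotDiamond()

# Placeholder system prompts; replace with prompts tuned for each model.
gpt_3_5_turbo_prompt = "You are a helpful assistant."
claude_3_opus_prompt = "You are a helpful assistant."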

llms = [
    LLMConfig(
        provider="openai",
        model="gpt-3.5-turbo",
        api_key="YOUR_OPENAI_API_KEY",
        temperature=0.5,
        max_tokens=256,
        # Optional: prices default to the provider's public pricing if omitted
        input_price=1,  # USD per million input tokens
        output_price=0.5,  # USD per million output tokens
        latency=0.86,  # Time to first token in seconds
        system_prompt=gpt_3_5_turbo_prompt,
    ),
    LLMConfig(
        provider="anthropic",
        model="claude-3-opus-20240229",
        api_key="YOUR_ANTHROPIC_API_KEY",
        temperature=0.8,
        max_tokens=256,
        # Optional: prices default to the provider's public pricing if omitted
        input_price=3,  # USD per million input tokens
        output_price=2,  # USD per million output tokens
        latency=1.24,  # Time to first token in seconds
        system_prompt=claude_3_opus_prompt,
    ),
]

result, session_id, provider = client.chat.completions.create(
    messages=[ 
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Concisely explain merge sort."}  # Adjust as desired
    ],
    model=llms,
)

print("Not Diamond session ID: ", session_id)
print("LLM called: ", provider.model)
print("LLM output: ", result.content)

The equivalent configuration in TypeScript:

import { NotDiamond } from 'notdiamond';

const client = new NotDiamond({
  apiKey: process.env.NOTDIAMOND_API_KEY,
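});

// Placeholder system prompts; replace with prompts tuned for each model.
const gpt_3_5_turbo_prompt = 'You are a helpful assistant.';
const claude_3_opus_prompt = 'You are a helpful assistant.';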

const llms = [
  {
    provider: 'openai',
    model: 'gpt-3.5-turbo',
    contextLength: 4096,
    // Optional: prices default to the provider's public pricing if omitted
    inputPrice: 1,  // USD per million input tokens
    outputPrice: 0.5,  // USD per million output tokens
    latency: 0.86,  // Time to first token in seconds
    systemPrompt: gpt_3_5_turbo_prompt,
  },
  {
    provider: 'anthropic',
    model: 'claude-3-opus-20240229',
    contextLength: 100000,
    // Optional: prices default to the provider's public pricing if omitted
    inputPrice: 3,  // USD per million input tokens
    outputPrice: 2,  // USD per million output tokens
    latency: 1.24,  // Time to first token in seconds
    systemPrompt: claude_3_opus_prompt,
  },
];

const messages = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Concisely explain merge sort.' },
];

const options = {
  messages,
  llmProviders: llms,
  tradeoff: 'cost',
};

async function main() {
  try {
    const createResult = await client.create(options);

    if ('detail' in createResult) {
      console.error('Error:', createResult.detail);
      return;
    }

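    // Assumes the result object exposes `session_id` and `providers`
    // alongside `content`; check the fields your SDK version returns.
    console.log('Not Diamond session ID:', createResult.session_id);
    console.log('LLM called:', createResult.providers[0].model);
    console.log('LLM output:', createResult.content);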
  } catch (error) {
    console.error('An unexpected error occurred:', error);
  }
}

void main();
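
To make the price units in the configurations above concrete: both prices are quoted in USD per million tokens, so the cost of a single request scales with its token counts. The following is a rough sketch only; the function name and the token counts are hypothetical, and it does not reflect how Not Diamond computes cost internally.

```python
def estimate_cost_usd(input_tokens, output_tokens, input_price, output_price):
    """Estimate the cost of one request.

    input_price and output_price are in USD per million tokens,
    matching the units used in the configuration examples.
    """
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical request: 1,200 prompt tokens and 256 completion tokens,
# priced with the gpt-3.5-turbo values from the example configuration.
cost = estimate_cost_usd(1200, 256, input_price=1, output_price=0.5)
print(f"Estimated cost: ${cost:.6f}")
```

Running the same calculation for each configured model gives a quick way to check that custom prices produce the cost ordering you expect.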

You can also configure the URL endpoint for all client requests, if necessary:

from notdiamond import NotDiamond

client = NotDiamond(nd_api_url="https://my-api-endpoint.org")

In TypeScript:

import { NotDiamond } from 'notdiamond';

const client = new NotDiamond({ apiUrl: "https://my-api-endpoint.org" });
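
If the endpoint differs between environments, one option is to read it from an environment variable and validate it before constructing the client. This is a sketch under assumptions: the variable name MY_ND_API_URL and the helper resolve_api_url are hypothetical, and only the nd_api_url argument comes from the example above.

```python
import os
from urllib.parse import urlparse

# Fallback endpoint; this is the example URL used above.
DEFAULT_API_URL = "https://my-api-endpoint.org"


def resolve_api_url(env_var: str = "MY_ND_API_URL") -> str:
    """Return the endpoint from the environment, or the default if unset.

    Hypothetical helper: the environment variable name is an assumption,
    not something the notdiamond package reads on its own.
    """
    url = os.environ.get(env_var, DEFAULT_API_URL)
    parsed = urlparse(url)
    # Reject values that are not absolute http(s) URLs.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"invalid API URL: {url!r}")
    return url


api_url = resolve_api_url()
print(api_url)
# The result can then be passed to the client, for example:
# client = NotDiamond(nd_api_url=api_url)
```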

Custom models

You can route to your own custom models—whether a fine-tuned model, an agentic workflow, or any other custom inference endpoint—by training your own custom router and including your custom model in the evaluation dataset.