LLM models
Not Diamond supports the following LLMs with the specified names:
| Provider | Model name | Function calling | Structured outputs | Alias |
|---|---|---|---|---|
| OpenAI | openai/gpt-4o-2024-08-06 | ✔️ | ✔️ | openai/gpt-4o |
| | openai/gpt-4o-2024-05-13 | ✔️ | ✔️ | |
| | openai/gpt-4-turbo-2024-04-09 | ✔️ | ✔️ | openai/gpt-4-turbo |
| | openai/gpt-4-0125-preview | ✔️ | ✔️ | |
| | openai/gpt-4-1106-preview | ✔️ | ✔️ | |
| | openai/gpt-4-0613 | ✔️ | ✔️ | openai/gpt-4 |
| | openai/gpt-3.5-turbo-0125 | ✔️ | ✔️ | openai/gpt-3.5-turbo |
| | openai/gpt-4o-mini-2024-07-18 | ✔️ | ✔️ | openai/gpt-4o-mini |
| | openai/chatgpt-4o-latest | ✔️ | | |
| | openai/o1-preview-2024-09-12 | | | openai/o1-preview |
| | openai/o1-mini-2024-09-12 | | | openai/o1-mini |
| Anthropic | anthropic/claude-3-5-sonnet-latest | ✔️ | | |
| | anthropic/claude-3-5-sonnet-20241022 | ✔️ | | |
| | anthropic/claude-3-5-haiku-20241022 | ✔️ | | |
| | anthropic/claude-3-5-sonnet-20240620 | ✔️ | | |
| | anthropic/claude-3-opus-20240229 | ✔️ | ✔️ | |
| | anthropic/claude-3-sonnet-20240229 | ✔️ | ✔️ | |
| | anthropic/claude-3-haiku-20240307 | ✔️ | | |
| | anthropic/claude-2.1 | | | |
| Google | google/gemini-1.5-pro-latest | ✔️ | ✔️ | |
| | google/gemini-1.5-flash-latest | ✔️ | ✔️ | |
| | google/gemini-1.0-pro-latest | ✔️ | ✔️ | google/gemini-pro |
| Mistral | mistral/open-mixtral-8x22b | ✔️ | | |
| | mistral/codestral-latest | | | |
| | mistral/open-mixtral-8x7b | ✔️ | | |
| | mistral/mistral-large-2407 | ✔️ | ✔️ | mistral/mistral-large-latest |
| | mistral/mistral-large-2402 | ✔️ | ✔️ | |
| | mistral/mistral-medium-latest | ✔️ | | |
| | mistral/mistral-small-latest | ✔️ | ✔️ | |
| | mistral/open-mistral-7b | ✔️ | | |
| | mistral/open-mistral-nemo | ✔️ | ✔️ | |
| Replicate | replicate/meta-llama-3-70b-instruct | | | |
| | replicate/meta-llama-3-8b-instruct | | | |
| | replicate/mixtral-8x7b-instruct-v0.1 | | | |
| | replicate/mistral-7b-instruct-v0.2 | | | |
| | replicate/meta-llama-3.1-405b-instruct | | | |
| TogetherAI | togetherai/Llama-3-70b-chat-hf | | | |
| | togetherai/Llama-3-8b-chat-hf | | | |
| | togetherai/Meta-Llama-3.1-8B-Instruct-Turbo | | | |
| | togetherai/Meta-Llama-3.1-70B-Instruct-Turbo | | | |
| | togetherai/Meta-Llama-3.1-405B-Instruct-Turbo | | | |
| | togetherai/Qwen2-72B-Instruct | | | |
| | togetherai/Mixtral-8x22B-Instruct-v0.1 | | | |
| | togetherai/Mixtral-8x7B-Instruct-v0.1 | | | |
| | togetherai/Mistral-7B-Instruct-v0.2 | | | |
| Perplexity | perplexity/llama-3.1-sonar-large-128k-online | | | |
| Cohere | cohere/command-r-plus | ✔️ | ✔️ | |
| | cohere/command-r | ✔️ | ✔️ | |
We are continuously expanding our list of supported models. Send us a note if you have a specific model requirement and we will onboard it for you.
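Any of these model strings, or their aliases, can be passed directly to the router. Here is a minimal sketch using the Python client covered below; it assumes your NOTDIAMOND_API_KEY and the relevant provider API keys are set as environment variables:

```python
from notdiamond import NotDiamond

client = NotDiamond()  # assumes NOTDIAMOND_API_KEY is set in the environment

# Route between supported models by their string names or aliases
result, session_id, provider = client.chat.completions.create(
    messages=[{"role": "user", "content": "Concisely explain merge sort."}],
    model=[
        "openai/gpt-4o",  # alias for openai/gpt-4o-2024-08-06
        "anthropic/claude-3-5-sonnet-20241022",
        "google/gemini-1.5-pro-latest",
    ],
)

print("LLM called:", provider.model)
```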
Defining additional configurations
If you'd like more control over each LLM you're routing between, you can use the `LLMConfig` class. This is especially useful when you want to set API keys explicitly or define additional LLM parameters such as temperature. You can also define custom cost and latency attributes to inform cost and latency tradeoffs:
```python
from notdiamond.llms.config import LLMConfig
from notdiamond import NotDiamond

client = NotDiamond()

# Optional per-model system prompts, referenced in the configs below
gpt_3_5_turbo_prompt = "You are a helpful assistant."
claude_3_opus_prompt = "You are a helpful assistant."

llms = [
    LLMConfig(
        provider="openai",
        model="gpt-3.5-turbo",
        api_key="YOUR_OPENAI_API_KEY",
        temperature=0.5,
        max_tokens=256,
        # Pricing will default to the public price if omitted
        input_price=1,     # USD cost per million input tokens
        output_price=0.5,  # USD cost per million output tokens
        latency=0.86,      # Time to first token, in seconds
        system_prompt=gpt_3_5_turbo_prompt,
    ),
    LLMConfig(
        provider="anthropic",
        model="claude-3-opus-20240229",
        api_key="YOUR_ANTHROPIC_API_KEY",
        temperature=0.8,
        max_tokens=256,
        # Pricing will default to the public price if omitted
        input_price=3,   # USD cost per million input tokens
        output_price=2,  # USD cost per million output tokens
        latency=1.24,    # Time to first token, in seconds
        system_prompt=claude_3_opus_prompt,
    ),
]

result, session_id, provider = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Concisely explain merge sort."},  # Adjust as desired
    ],
    model=llms,
)

print("Not Diamond session ID: ", session_id)
print("LLM called: ", provider.model)
print("LLM output: ", result.content)
```
The same configuration in TypeScript:

```typescript
import { NotDiamond } from 'notdiamond';

const client = new NotDiamond({
  apiKey: process.env.NOTDIAMOND_API_KEY,
});

// Optional per-model system prompts, referenced in the configs below
const gpt35TurboPrompt = 'You are a helpful assistant.';
const claude3OpusPrompt = 'You are a helpful assistant.';

const llms = [
  {
    provider: 'openai',
    model: 'gpt-3.5-turbo',
    contextLength: 4096,
    // Pricing will default to the public price if omitted
    inputPrice: 1, // USD cost per million input tokens
    outputPrice: 0.5, // USD cost per million output tokens
    latency: 0.86, // Time to first token, in seconds
    systemPrompt: gpt35TurboPrompt,
  },
  {
    provider: 'anthropic',
    model: 'claude-3-opus-20240229',
    contextLength: 100000,
    // Pricing will default to the public price if omitted
    inputPrice: 3, // USD cost per million input tokens
    outputPrice: 2, // USD cost per million output tokens
    latency: 1.24, // Time to first token, in seconds
    systemPrompt: claude3OpusPrompt,
  },
];

const messages = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Concisely explain merge sort.' },
];

const options = {
  messages,
  llmProviders: llms,
  tradeoff: 'cost',
};

async function main() {
  try {
    const createResult = await client.create(options);
    if ('detail' in createResult) {
      console.error('Error:', createResult.detail);
      return;
    }
    console.log('Not Diamond session ID:', createResult.session_id);
    console.log('LLM called:', createResult.providers[0].model);
    console.log('LLM output:', createResult.content);
  } catch (error) {
    console.error('An unexpected error occurred:', error);
  }
}

void main();
```
You can also configure the URL endpoint for all client requests, if necessary:

```python
from notdiamond import NotDiamond

client = NotDiamond(nd_api_url="https://my-api-endpoint.org")
```

And in TypeScript:

```typescript
import { NotDiamond } from 'notdiamond';

const client = new NotDiamond({ apiUrl: 'https://my-api-endpoint.org' });
```
Custom models
You can route to your own custom models—whether a fine-tuned model, an agentic workflow, or any other custom inference endpoint—by training your own custom router and including your custom model in the evaluation dataset.
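At inference time, a trained custom router is referenced by ID when calling the API. The sketch below is a hedged illustration: it assumes a `preference_id` parameter on the create call, and `"YOUR_PREFERENCE_ID"` is a hypothetical placeholder for the ID produced by the custom router training flow, so confirm the exact workflow against the custom router guide:

```python
from notdiamond import NotDiamond

client = NotDiamond()

# "YOUR_PREFERENCE_ID" is a hypothetical placeholder for the ID produced
# when you train a custom router; the `preference_id` parameter is assumed
# here, so check the custom router guide for the exact name.
result, session_id, provider = client.chat.completions.create(
    messages=[{"role": "user", "content": "Concisely explain merge sort."}],
    model=["openai/gpt-4o", "anthropic/claude-3-5-sonnet-20241022"],
    preference_id="YOUR_PREFERENCE_ID",
)
```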