Training a custom router

Not Diamond is a framework for training custom routing algorithms across a range of candidate LLMs on your evaluation data.

For any given distribution of data, a single model will rarely outperform every other model on every query. By combining multiple models into a "meta-model" that learns when to use each LLM, we can beat every individual model's performance while driving down cost and latency in the process.

Not Diamond integrates with any existing evaluation pipeline and is completely agnostic to your choice of evaluation methods, metrics, frameworks, and tools. All we need is the following three things:

  1. A set of LLM inputs: Inputs must be strings and should be representative of the prompts used in your application.
  2. LLM responses: The responses from candidate LLMs for each input. Candidate LLMs can include both our supported LLMs and your own custom models.
  3. Evaluation scores for responses to the inputs from candidate LLMs: Scores are numbers and can be derived from any metric that fits your needs.
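
For a concrete sense of what this looks like, here is a minimal sketch of assembling such a dataset with pandas. The column names below are illustrative assumptions, not a required schema; the example dataset used in this guide shows the actual layout.

import pandas as pd

# Illustrative only: each row pairs one input prompt with each candidate
# model's response and a numeric score for that response.
rows = [
    {
        "Input": "Write a function that reverses a string.",
        "openai/gpt-5-2025-08-07/response": "def reverse(s):\n    return s[::-1]",
        "openai/gpt-5-2025-08-07/score": 1.0,
        "google/gemini-2.5-pro/response": "def reverse(s):\n    return ''.join(reversed(s))",
        "google/gemini-2.5-pro/score": 1.0,
    },
]

pd.DataFrame(rows).to_csv("my_eval_data.csv", index=False)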

Below, we will go through Python and TypeScript examples using evaluation results for openai/gpt-5-2025-08-07, openai/gpt-5-mini-2025-08-07, google/gemini-2.5-pro, anthropic/claude-opus-4-20250514, and anthropic/claude-sonnet-4-5-20250929 on the HumanEval dataset.

Initialization

Installation

pip install notdiamond
npm install notdiamond

Set your Not Diamond API key

export NOTDIAMOND_API_KEY=YOUR_NOTDIAMOND_API_KEY

Download the dataset

To get started, let's download the dataset that we've prepared for this example:

curl -L "https://drive.google.com/uc?export=download&id=17hJS_-ecUnMPmRYjQCQmdK5HKP974QQW" -o humaneval.csv

Next, we'll create a file and import the dependencies we'll use to train our custom router:

import pandas as pd
from notdiamond import NotDiamond

import NotDiamond from 'notdiamond';
import * as fs from 'fs';

Rename dataset fields:

sed -i '' 's/final_score/score/g' humaneval.csv  # macOS (BSD sed)
# OR on Linux / in a Colab environment
sed -i 's/final_score/score/g' humaneval.csv
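
If sed isn't convenient (for example, on Windows), the same rename can be done in pandas. This sketch assumes "final_score" only appears in column names:

import pandas as pd

# Mirror of the sed command above: replace "final_score" with "score"
# in every column name, then write the file back in place.
df = pd.read_csv("humaneval.csv")
df.columns = [c.replace("final_score", "score") for c in df.columns]
df.to_csv("humaneval.csv", index=False)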

Training quickstart

import os
import json
import pandas as pd
from notdiamond import NotDiamond

# Load dataset
df = pd.read_csv("humaneval.csv")
print("Original columns:", df.columns.tolist())

# Models whose responses and scores appear in the dataset (listed here for reference)
models_in_csv = [
    "anthropic/claude-sonnet-4-5-20250929",
    "openai/gpt-5-2025-08-07",
    "google/gemini-2.5-pro",
    "openai/gpt-5-mini-2025-08-07",
    "anthropic/claude-opus-4-20250514",
]

# Write the CSV back to disk (a no-op here; useful if you adjust columns in pandas instead of sed)
df.to_csv("humaneval.csv", index=False)

# Initialize ND client
client = NotDiamond(api_key=os.environ.get("NOTDIAMOND_API_KEY"))

# LLMs to train router on
llm_providers = [
    {"provider": "anthropic", "model": "claude-sonnet-4-5-20250929"},
    {"provider": "openai", "model": "gpt-5-2025-08-07"},
    {"provider": "google", "model": "gemini-2.5-pro"},
    {"provider": "openai", "model": "gpt-5-mini-2025-08-07"},
    {"provider": "anthropic", "model": "claude-opus-4-20250514"},
]

# Train the custom router
with open("humaneval.csv", "rb") as f:
    response = client.custom_router.train_custom_router(
        dataset_file=f,
        language="english",
        llm_providers=json.dumps(llm_providers),
        maximize=True,
        prompt_column="Input",
    )

preference_id = response.preference_id
print("Custom router preference ID:", preference_id)
import NotDiamond from 'notdiamond';
import * as fs from 'fs';

// Initialize ND client
const client = new NotDiamond({ apiKey: process.env.NOTDIAMOND_API_KEY });

// LLMs to train router on
const llmProviders = [
  { provider: 'anthropic', model: 'claude-sonnet-4-5-20250929' },
  { provider: 'openai', model: 'gpt-5-2025-08-07' },
  { provider: 'google', model: 'gemini-2.5-pro' },
  { provider: 'openai', model: 'gpt-5-mini-2025-08-07' },
  { provider: 'anthropic', model: 'claude-opus-4-20250514' },
];

// Train the custom router
const datasetFile = fs.createReadStream('humaneval.csv');

const response = await client.customRouter.trainCustomRouter({
  datasetFile: datasetFile,
  language: 'english',
  llmProviders: JSON.stringify(llmProviders),
  maximize: true,
  promptColumn: 'Input',
});

const preferenceId = response.preferenceId;
console.log('Custom router preference ID:', preferenceId);
🚧

Training data limitations

We encourage you to provide as much data as possible, covering every LLM you want to route between. The minimum number of samples required is 15. However, there are limits on how much data you can submit: you can upload up to 5 MB of data or 10,000 samples per training job. Reach out if you need support for larger file uploads.
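
A quick pre-flight check against these limits before uploading might look like the following sketch; the thresholds mirror the limits above, and the file name is the one used in this example.

import os
import pandas as pd

MIN_SAMPLES = 15              # minimum number of samples required
MAX_SAMPLES = 10_000          # maximum samples per training job
MAX_BYTES = 5 * 1024 * 1024   # 5 MB upload limit

df = pd.read_csv("humaneval.csv")
size_bytes = os.path.getsize("humaneval.csv")

assert len(df) >= MIN_SAMPLES, f"Need at least {MIN_SAMPLES} samples, got {len(df)}"
assert len(df) <= MAX_SAMPLES, f"Too many samples: {len(df)} > {MAX_SAMPLES}"
assert size_bytes <= MAX_BYTES, f"File too large: {size_bytes} bytes > {MAX_BYTES}"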

📘

Training a custom router can take some time

When you call the train_custom_router method, we process your data and train a custom router to fit your needs. This can take anywhere from a couple of minutes up to an hour, depending on the size of your dataset. If training is still in progress and you call Not Diamond with the returned preference_id, you will get an error asking you to wait until training has finished.

Once training completes, you can pass the preference_id to model_router.select_model():

from notdiamond import NotDiamond

client = NotDiamond()

llm_providers = [
    {"provider": "anthropic", "model": "claude-sonnet-4-5-20250929"},
    {"provider": "openai", "model": "gpt-5-2025-08-07"},
    {"provider": "google", "model": "gemini-2.5-pro"},
    {"provider": "openai", "model": "gpt-5-mini-2025-08-07"},
    {"provider": "anthropic", "model": "claude-opus-4-20250514"},
]

messages = [
    {"role": "user", "content": "Write merge sort in 3 lines."}
]

result = client.model_router.select_model(
    messages=messages,
    llm_providers=llm_providers,
    preference_id=preference_id,
)

print("ND session ID:", result.session_id)
print("LLM selected by router:", result)
import NotDiamond from 'notdiamond';

const client = new NotDiamond();

const llmProviders = [
  { provider: 'anthropic', model: 'claude-sonnet-4-5-20250929' },
  { provider: 'openai', model: 'gpt-5-2025-08-07' },
  { provider: 'google', model: 'gemini-2.5-pro' },
  { provider: 'openai', model: 'gpt-5-mini-2025-08-07' },
  { provider: 'anthropic', model: 'claude-opus-4-20250514' },
];

const messages = [
  { role: 'user', content: 'Write merge sort in 3 lines.' }
];

const result = await client.modelRouter.selectModel({
  messages: messages,
  llmProviders: llmProviders,
  preferenceId: preferenceId,
});

console.log('ND session ID:', result.sessionId);
console.log('LLM selected by router:', result);
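
Because training can still be in progress when you first call the router (see the note above), one simple pattern is to retry the selection call with a backoff until it succeeds. This is only a sketch: the broad exception handler is a placeholder, since the SDK's actual error for a router that is still training may be more specific.

import time

def select_when_ready(client, messages, llm_providers, preference_id,
                      retries=10, wait_seconds=60):
    """Retry select_model until the custom router has finished training."""
    for attempt in range(retries):
        try:
            return client.model_router.select_model(
                messages=messages,
                llm_providers=llm_providers,
                preference_id=preference_id,
            )
        except Exception as e:  # placeholder for the SDK's "still training" error
            print(f"Router not ready yet (attempt {attempt + 1}): {e}")
            time.sleep(wait_seconds)
    raise RuntimeError("Custom router did not become available in time")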

Updating a custom router

To update an existing custom router:

  • Append new rows to your humaneval.csv file
  • Call train_custom_router again with the same preference ID
  • Set override=True
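
For the first step, appending freshly collected evaluation rows with pandas might look like this sketch (new_results.csv is a hypothetical file whose columns match humaneval.csv):

import pandas as pd

# Hypothetical: new_results.csv holds newly collected evaluations
# with the same columns as humaneval.csv.
existing = pd.read_csv("humaneval.csv")
new_rows = pd.read_csv("new_results.csv")
pd.concat([existing, new_rows], ignore_index=True).to_csv("humaneval.csv", index=False)

Then call train_custom_router again, reusing the existing preference_id and setting override=True: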
with open("humaneval.csv", "rb") as f:
    response = client.custom_router.train_custom_router(
        dataset_file=f,
        language="english",
        llm_providers=json.dumps(llm_providers),
        maximize=True,
        prompt_column="Input",
        preference_id=preference_id,  # reuse the existing router
        override=True,                # overwrite the previous router
    )

print("Router updated:", response.preference_id)
const datasetFile = fs.createReadStream('humaneval.csv');

const response = await client.customRouter.trainCustomRouter({
  datasetFile: datasetFile,
  language: 'english',
  llmProviders: JSON.stringify(llmProviders),
  maximize: true,
  promptColumn: 'Input',
  preferenceId: preferenceId,  // reuse the existing router
  override: true,              // overwrite the previous router
});

console.log('Router updated:', response.preferenceId);
🚧

Updating a router while another job is still running cancels the previous job

If you update an existing router that's currently training, it will cancel the previous run and start a new run with the updated data you've submitted.