What is Model Routing?

Model routing selects the best LLM from your candidate models to respond to each query in your application. Instead of hardcoding a single model for all tasks, you let Not Diamond analyze each input and predict which model will give the highest-quality response to that specific query.
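
As a rough illustration of the idea (not Not Diamond's actual predictor or API), a router can be thought of as a function that scores every candidate model for a given query and returns the top-scoring one. The model names and the keyword-based `predict_quality` heuristic below are made up for this sketch; a real router would replace the heuristic with a learned predictor.

```python
# Hypothetical candidate models; the names are placeholders, not real model IDs.
CANDIDATES = ["creative-model", "reasoning-model", "code-model"]

# Toy keyword lists standing in for what a learned predictor would capture.
KEYWORDS = {
    "creative-model": ("poem", "story"),
    "reasoning-model": ("prove", "why"),
    "code-model": ("function", "bug"),
}


def predict_quality(query: str, model: str) -> float:
    """Toy stand-in for a quality predictor: count matching keywords."""
    q = query.lower()
    return float(sum(word in q for word in KEYWORDS[model]))


def route(query: str, candidates: list[str]) -> str:
    """Return the candidate with the highest predicted quality for this query."""
    # max() keeps the first candidate on ties, so list order acts as a fallback.
    return max(candidates, key=lambda model: predict_quality(query, model))


if __name__ == "__main__":
    for q in ("Write a poem about autumn", "Fix the bug in this function"):
        print(q, "->", route(q, CANDIDATES))
```

The point of the sketch is the shape of the decision: the application passes a query and a candidate list, and the model choice is made per query rather than fixed in code.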

Why use routing?

Every LLM has different strengths: some excel at creative writing, others at reasoning or code generation. Not Diamond's router automatically sends each query to the right model, helping you:

  • Maximize quality: Achieve better accuracy than any single model by routing each query to the model best suited for it
  • Reduce costs: Route simple queries to cost-effective models and reserve frontier models for complex tasks
  • Optimize latency: Balance speed and capability by directing simpler queries to faster models

Getting started: Pre-trained vs. custom routers

Not Diamond offers two router options to fit your development stage:

Pre-trained router - Use Not Diamond's general-purpose router trained on cross-domain data. Perfect for getting started quickly, prototyping, and general-purpose applications.

Custom router - Train a router on your data to optimize for your specific use case and evaluation criteria. Recommended for production applications where domain-specific performance matters.

Tradeoff modes

Control what Not Diamond optimizes for based on your application's priorities:

  • Quality (default): Routes to the model predicted to give the best response
  • Cost: Balances quality with cost efficiency, preferring cheaper models when appropriate
  • Latency: Minimizes response time while maintaining quality thresholds
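
To make the three modes concrete, here is a minimal sketch of how a tradeoff setting could change which model gets picked. It is an assumption-laden toy, not Not Diamond's algorithm: the `Candidate` fields, the example numbers, and the `tolerance` rule (treat any model within a small margin of the best predicted quality as "good enough") are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str          # placeholder model name
    quality: float     # predicted response quality in [0, 1], assumed given
    cost: float        # hypothetical cost per request, arbitrary units
    latency_s: float   # hypothetical typical response time in seconds


def select(candidates: list[Candidate], tradeoff: str = "quality",
           tolerance: float = 0.05) -> Candidate:
    """Pick a model according to the tradeoff mode (illustrative rule only)."""
    if tradeoff == "quality":
        # Default mode: highest predicted quality wins outright.
        return max(candidates, key=lambda c: c.quality)

    # For the other modes, keep only models close to the best predicted quality.
    best = max(c.quality for c in candidates)
    shortlist = [c for c in candidates if c.quality >= best - tolerance]

    if tradeoff == "cost":
        return min(shortlist, key=lambda c: c.cost)
    if tradeoff == "latency":
        return min(shortlist, key=lambda c: c.latency_s)
    raise ValueError(f"unknown tradeoff mode: {tradeoff!r}")


# Made-up predictions for a single query.
MODELS = [
    Candidate("frontier", quality=0.90, cost=10.0, latency_s=4.0),
    Candidate("mid", quality=0.87, cost=2.0, latency_s=2.5),
    Candidate("small", quality=0.86, cost=3.0, latency_s=0.8),
    Candidate("tiny", quality=0.60, cost=0.5, latency_s=0.3),
]

if __name__ == "__main__":
    for mode in ("quality", "cost", "latency"):
        print(mode, "->", select(MODELS, mode).name)
```

In this toy setup the cheapest and fastest model overall ("tiny") is never chosen, because its predicted quality falls outside the tolerance. That mirrors the descriptions above: the cost and latency modes trade away only a little quality rather than ignoring it.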

Next steps