Cost and latency tradeoffs
By default, Not Diamond maximizes quality above all else. However, we can also define explicit cost and latency tradeoffs as a parameter to optimize for speed or cost-savings:
result, session_id, provider = client.chat.completions.create(
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Concisely explain merge sort."}
],
model=['openai/gpt-4o', 'openai/gpt-3.5-turbo'],
tradeoff="cost"
)
const result = await notDiamond.create({
messages: [
{ role: 'system', content: 'You are a world class programmer.' },
{ role: 'user', content: 'Consiely explain merge sort.' },
],
llmProviders: llmProviders,
tradeoff: 'cost', // Consider cheaper models when quality loss is negligible
});
When tradeoff="cost"
, Not Diamond will automatically determine when a query is simple enough to use a cheaper model without degrading the quality of the response. Equivalently, tradeoff="latency"
will consider faster models. When no tradeoff is defined, Not Diamond will simply maximize output quality with no consideration given to cost or latency.
You can also define custom cost and latency attributes for specific models to help inform how tradeoffs are made
Updated 2 months ago