Reliability, fallbacks, and load-balancing
In this section we will learn how to use Not Diamond to improve the reliability and uptime of our application through the following methods:
- Falling back to a default model if Not Diamond fails to return a response
- Defining custom fallback logic for our router
- Leveraging Not Diamond's reliability and load-balancing toolkit for `openai` clients
Falling back to a default model if Not Diamond fails to return a response
Because Not Diamond is not a proxy, we can eliminate the risk of disruptions if Not Diamond ever fails to return a response. We can define a `timeout` specifying how many seconds to wait for a model recommendation from Not Diamond's API, and we can configure a fallback model via `default` in case of an error or timeout. The `default` parameter is a string naming the specific model from the `llm_providers` list we want to use as a fallback.
```python
from notdiamond import NotDiamond

# Assumes NOTDIAMOND_API_KEY is set in the environment
client = NotDiamond()

result, session_id, provider = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Concisely explain merge sort."}  # Adjust as desired
    ],
    model=['openai/gpt-3.5-turbo', 'openai/gpt-4o', 'anthropic/claude-3-5-sonnet-20240620'],
    timeout=5,
    default="openai/gpt-4o"
)
```
```typescript
import { NotDiamond } from 'notdiamond';

// Assumes NOTDIAMOND_API_KEY is available in the environment
const notDiamond = new NotDiamond({ apiKey: process.env.NOTDIAMOND_API_KEY });

const result = await notDiamond.create({
  messages: [
    { role: 'system', content: 'You are a world class programmer.' },
    { role: 'user', content: 'Concisely explain merge sort.' } // Adjust as desired
  ],
  llmProviders: [
    { provider: 'openai', model: 'gpt-3.5-turbo' },
    { provider: 'openai', model: 'gpt-4o' },
    { provider: 'anthropic', model: 'claude-3-5-sonnet-20240620' }
  ],
  timeout: 5,
  default: 'openai/gpt-4o'
});
```
The default value for `timeout` is 5 seconds. If no `default` LLM is defined, Not Diamond will automatically treat the first LLM specified in your list as the default model.
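For example, the following call (reusing the `client` from above) sets neither `timeout` nor `default`, so Not Diamond waits up to 5 seconds for a recommendation and falls back to `openai/gpt-3.5-turbo`, the first model in the list, on error. This is a minimal sketch of the documented defaults:

```python
# With no `timeout` or `default` set, the request waits up to 5 seconds and
# falls back to the first model in the list ('openai/gpt-3.5-turbo' here)
# if Not Diamond fails to respond.
result, session_id, provider = client.chat.completions.create(
    messages=[{"role": "user", "content": "Concisely explain merge sort."}],
    model=['openai/gpt-3.5-turbo', 'openai/gpt-4o'],
)
```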
Custom routing fallback logic
If we want to use custom logic for defining fallbacks for our requests to specific LLMs, we can use Not Diamond's `model_select` method to determine the best LLM to call, and then implement our own API call logic and fallback behavior.
```python
from openai import OpenAI

# `client` is the Not Diamond client initialized above
session_id, provider = client.chat.completions.model_select(
    messages=[
        {"role": "system", "content": "You are a world class programmer."},
        {"role": "user", "content": "Write a merge sort in Python. Be as concise as possible."},
    ],
    model=['openai/gpt-3.5-turbo', 'openai/gpt-4o', 'anthropic/claude-3-5-sonnet-20240620']
)

openai_client = OpenAI(api_key="OPENAI_API_KEY")
max_retries = 3

# Call the recommended model ourselves, retrying on transient failures
if provider.model == "gpt-3.5-turbo":
    for _ in range(max_retries):
        try:
            chat_completion = openai_client.chat.completions.create(
                messages=[
                    {
                        "role": "user",
                        "content": "Write a merge sort in Python. Be as concise as possible.",
                    }
                ],
                model="gpt-3.5-turbo",
            )
            print(chat_completion.choices[0].message.content)
            break
        except Exception:
            continue
```
```typescript
import { NotDiamond } from 'notdiamond';
import { OpenAI } from 'openai';
import dotenv from 'dotenv';

dotenv.config();

// Initialize the Not Diamond client
const notDiamond = new NotDiamond({ apiKey: process.env.NOTDIAMOND_API_KEY });

// The best LLM is determined by Not Diamond based on the messages and specified models
const result = await notDiamond.modelSelect({
  messages: [
    { role: 'system', content: 'You are a world class programmer.' },
    { role: 'user', content: 'Concisely explain merge sort.' } // Adjust as desired
  ],
  llmProviders: [
    { provider: 'openai', model: 'gpt-3.5-turbo' },
    { provider: 'openai', model: 'gpt-4o' },
    { provider: 'anthropic', model: 'claude-3-5-sonnet-20240620' }
  ],
  tradeoff: 'cost'
});

if ('detail' in result) {
  console.error('Error:', result.detail);
} else {
  console.log('Not Diamond session ID:', result.session_id); // A unique ID of Not Diamond's recommendation
  console.log('LLM called:', result.providers); // The LLM routed to

  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
  const maxRetries = 3;
  const provider = result.providers[0];
  let finalResult = null;

  // Call the recommended model ourselves, retrying on transient failures
  if (provider.model === 'gpt-3.5-turbo') {
    for (let i = 0; i < maxRetries; i++) {
      try {
        const completion = await openai.chat.completions.create({
          messages: [
            {
              role: 'user',
              content: 'Write a merge sort in Python. Be as concise as possible.',
            }
          ],
          model: 'gpt-3.5-turbo',
        });
        finalResult = completion.choices[0];
        console.log('Response:', finalResult);
        break;
      } catch {
        continue;
      }
    }
  }
}
```
Reliability toolkit with `notdiamond.init`
Model providers may experience outages, return errors, or struggle to serve requests at the throughput we require. To help avoid downtime in our applications and effectively load-balance, Not Diamond offers a reliability toolkit via `notdiamond.init`, which can be enabled with a simple one-line statement.
More providers coming soon
At this time, the reliability toolkit is only available in our Python SDK and is compatible with workflows that use `OpenAI` or `AzureOpenAI` clients (or their async versions). If you would like to request support for our TypeScript SDK or other providers, please reach out to us and we'll work to accommodate your request.
Installation
Start by installing `notdiamond` alongside the `openai` extra:
```shell
pip install 'notdiamond[openai]'
```
If you have already installed `notdiamond`, please ensure you're using 0.3.34 or greater:
```shell
pyenv activate notdiamond-python
poetry version  # should show notdiamond 0.3.34 or greater
poetry show openai  # should show openai is installed
```
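If you aren't using poetry, you can also check the installed version directly with the standard library's `importlib.metadata` (a quick alternative, not part of the notdiamond tooling):

```shell
python -c "from importlib.metadata import version; print(version('notdiamond'))"
```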
Usage
```python
from notdiamond import init
from openai import OpenAI, AzureOpenAI

openai_client = OpenAI()
azure_client = AzureOpenAI()

init(
    client=[openai_client, azure_client],
    models=["azure/gpt-4o-mini", "openai/gpt-4o-mini", "azure/gpt-4o"],
    max_retries={
        "azure/gpt-4o-mini": 3,
        "openai/gpt-4o-mini": 1,
        "azure/gpt-4o": 1
    },
    timeout={
        "azure/gpt-4o-mini": 5.0,
        "openai/gpt-4o-mini": 5.0,
        "azure/gpt-4o": 10.0
    },
    model_messages={
        "azure/gpt-4o-mini": [{"role": "user", "content": "Respond to the question."}],
        "openai/gpt-4o-mini": [{"role": "user", "content": "Respond to the question."}],
        "azure/gpt-4o": [{"role": "user", "content": "Respond to the question as concisely as possible."}]
    },
    backoff={
        "azure/gpt-4o-mini": 1.0,
        "openai/gpt-4o-mini": 2.0,
        "azure/gpt-4o": 1.5,
    },
)
```
Let's walk through the keyword arguments of `init`:
- `client` is either an `OpenAI` client or an iterable of them
- `models` defines the order in which to fall back to other models when any invocation fails
- `max_retries` can be configured per-model (as shown above) or globally, using a single `int` (see the global-configuration sketch after this list)
- `timeout` can be configured per-model (similar to `max_retries`) or globally, using a `float`
- `model_messages` accepts a map from model name to OpenAI-style messages, which will be appended to the requests of any model invoked by `notdiamond.init`
- `backoff` configures an exponential backoff for each retried request, globally or per-model
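To illustrate the global style, here is a minimal sketch reusing the clients from the example above, with a single scalar applied to every model:

```python
# A minimal sketch of global configuration: scalars instead of per-model maps.
init(
    client=[openai_client, azure_client],
    models=["azure/gpt-4o-mini", "openai/gpt-4o-mini", "azure/gpt-4o"],
    max_retries=2,   # every model gets 2 retries
    timeout=10.0,    # every model times out after 10 seconds
    backoff=2.0,     # exponential backoff factor applied to every retry
)
```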
Load balancing
We can also optionally configure `init` to load balance across various models and providers:
```python
init(
    client=[openai_client, azure_client],
    models={
        "azure/gpt-4o-mini": 0.4,
        "openai/gpt-4o-mini": 0.4,
        "azure/gpt-4o": 0.2
    },
)
```
Here, requests are distributed across the three models according to these weights. If the call to `azure/gpt-4o` fails, we will load balance fallback requests across `azure/gpt-4o-mini` and `openai/gpt-4o-mini` with equal probability. Of course, `init` will ignore the failed model (`azure/gpt-4o`) when load balancing.
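To make the fallback arithmetic concrete, here is an illustrative renormalization of the configured weights once the failed model is excluded. This mirrors the behavior described above; it is not notdiamond's internal implementation:

```python
# Illustration only: dropping the failed model and renormalizing weights.
weights = {"azure/gpt-4o-mini": 0.4, "openai/gpt-4o-mini": 0.4, "azure/gpt-4o": 0.2}
failed = "azure/gpt-4o"

remaining = {model: w for model, w in weights.items() if model != failed}
total = sum(remaining.values())
fallback_weights = {model: w / total for model, w in remaining.items()}

print(fallback_weights)  # {'azure/gpt-4o-mini': 0.5, 'openai/gpt-4o-mini': 0.5}
```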
`notdiamond.init` example
Imagine you have this simple workflow. It first prompts GPT-4o mini hosted on Microsoft Azure, then performs some other operations, and finishes by prompting GPT-4o mini hosted by OpenAI.
We'll introduce one wrinkle: our Azure client has an incorrect API key.
```python
from openai import OpenAI, AzureOpenAI

openai_client = OpenAI()
flaky_azure_client = AzureOpenAI(api_key="incorrect-api-key")

flaky_azure_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello there flaky client. Are you working?"}],
)
```
When attempting to execute this workflow, we will see a 401 authorization error:
```
openai.AuthenticationError: Error code: 401 -
{
  'statusCode': 401,
  'message': 'Unauthorized. Access token is missing, invalid, audience is incorrect (https://cognitiveservices.azure.com), or have expired.'
}
```
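One option is to wrap every invocation in hand-rolled retry and fallback logic. A minimal sketch of what that might look like (the `robust_completion` helper is hypothetical, not part of notdiamond):

```python
# Hypothetical hand-rolled fallback: try the Azure client first, then OpenAI.
# This is the per-call boilerplate that notdiamond.init is designed to replace.
def robust_completion(messages, model="gpt-4o-mini"):
    for client in (flaky_azure_client, openai_client):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except Exception:
            continue
    raise RuntimeError("All providers failed")
```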
We could add error-handling like the sketch above to each LLM invocation in our application, but that introduces a significant amount of boilerplate to an otherwise simple workflow. Instead, let's use `notdiamond.init`:
```python
from notdiamond import init
from openai import OpenAI, AzureOpenAI

openai_client = OpenAI()
flaky_azure_client = AzureOpenAI(api_key="incorrect-api-key")

init(
    client=[openai_client, flaky_azure_client],
    models=["azure/gpt-4o-mini", "openai/gpt-4o-mini"],
    max_retries={
        "azure/gpt-4o-mini": 3,
        "openai/gpt-4o-mini": 1,
    }
)

print(
    "Azure fallback response: " +
    flaky_azure_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello there flaky client. Are you working?"}],
    ).choices[0].message.content
)
```
This workflow will now recover from the 401 by invoking `openai/gpt-4o-mini`:
```
notdiamond.toolkit._retry._RetryWrapperException: Failed to invoke ['azure/gpt-4o-mini']:
openai.AuthenticationError: Error code: 401 -
{
  'statusCode': 401,
  'message': 'Unauthorized. Access token is missing, invalid, audience is incorrect (https://cognitiveservices.azure.com), or have expired.'
}

Azure fallback response: Hello! Yes, I'm here and ready to assist you. How can I help you today?
```
We've now successfully mitigated the risk of downtime in our application. For more information about `notdiamond.init`, please see the API reference.