Reliability, fallbacks, and load-balancing

In this section we will learn how to use Not Diamond to improve the reliability and uptime of our application through the following methods:

  1. Falling back to a default model if Not Diamond fails to return a response
  2. Defining custom fallback logic for our router
  3. Leveraging Not Diamond's reliability and load-balancing toolkit for OpenAI clients

Falling back to a default model if Not Diamond fails to return a response

Because Not Diamond is not a proxy, we can eliminate the risk of disruptions if Not Diamond ever fails to return a response. We can define a timeout specifying how many seconds to wait for a model recommendation from Not Diamond's API, and we can configure a fallback model to use by default in case of an error or timeout. The default parameter is a string naming the specific model from our list of candidate models that we want to use as the fallback.

result, session_id, provider = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Concisely explain merge sort."}  # Adjust as desired
    ],
    model=['openai/gpt-3.5-turbo', 'openai/gpt-4o', 'anthropic/claude-3-5-sonnet-20240620'],
    timeout=5,
    default="openai/gpt-4o"
)
const result = await notDiamond.create({
  messages: [
    { role: 'system', content: 'You are a world class programmer.' },
    { role: 'user', content: 'Concisely explain merge sort.' }  // Adjust as desired
  ],
  llmProviders: [
    { provider: 'openai', model: 'gpt-3.5-turbo' },
    { provider: 'openai', model: 'gpt-4o' },
    { provider: 'anthropic', model: 'claude-3-5-sonnet-20240620' }
  ],
  timeout: 5,
  default: 'openai/gpt-4o'
});

The default value for timeout is 5 seconds. If no default LLM is defined, Not Diamond will automatically consider the first LLM specified in your list as the default model.
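
If we omit default entirely, the first model in our list serves as the fallback. A minimal sketch, assuming the same client and messages as above:

# No `default` set: the first listed model ('openai/gpt-3.5-turbo') serves as
# the fallback if Not Diamond errors out or the 5-second timeout elapses.
result, session_id, provider = client.chat.completions.create(
    messages=[{"role": "user", "content": "Concisely explain merge sort."}],
    model=['openai/gpt-3.5-turbo', 'openai/gpt-4o', 'anthropic/claude-3-5-sonnet-20240620'],
    timeout=5,
)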

Custom routing fallback logic

If we want to use custom logic to define fallbacks for requests to specific LLMs, we can call Not Diamond's model_select method to determine the best LLM for a prompt, then implement our own API call and fallback behavior around it.

from notdiamond import NotDiamond
from openai import OpenAI

client = NotDiamond()  # reads NOTDIAMOND_API_KEY from the environment
openai_client = OpenAI(api_key="OPENAI_API_KEY")

# Ask Not Diamond which LLM to call; model_select does not call the LLM for us
session_id, provider = client.chat.completions.model_select(
    messages=[
        {"role": "system", "content": "You are a world class programmer."},
        {"role": "user", "content": "Write a merge sort in Python. Be as concise as possible."},
    ],
    model=['openai/gpt-3.5-turbo', 'openai/gpt-4o', 'anthropic/claude-3-5-sonnet-20240620']
)

max_retries = 3

# Call the recommended LLM ourselves, retrying on any failure
if provider.model == "gpt-3.5-turbo":
    for _ in range(max_retries):
        try:
            chat_completion = openai_client.chat.completions.create(
                messages=[
                    {
                        "role": "user",
                        "content": "Write a merge sort in Python. Be as concise as possible.",
                    }
                ],
                model="gpt-3.5-turbo",
            )
            print(chat_completion.choices[0].message.content)
            break
        except Exception:
            continue
import { NotDiamond } from 'notdiamond';
import { OpenAI } from 'openai';
import dotenv from 'dotenv';
dotenv.config();

// Initialize the Not Diamond client
const notDiamond = new NotDiamond({ apiKey: process.env.NOTDIAMOND_API_KEY });

// The best LLM is determined by Not Diamond based on the messages and specified models
const result = await notDiamond.modelSelect({
  messages: [
    { role: 'system', content: 'You are a world class programmer.' },
    { role: 'user', content: 'Concisely explain merge sort.' }  // Adjust as desired
  ],
  llmProviders: [
    { provider: 'openai', model: 'gpt-3.5-turbo' },
    { provider: 'openai', model: 'gpt-4o' },
    { provider: 'anthropic', model: 'claude-3-5-sonnet-20240620' }
  ],
  tradeoff: 'cost'
});

if ('detail' in result) {
  console.error('Error:', result.detail);
} else {
  console.log('Not Diamond session ID:', result.session_id);  // A unique ID of Not Diamond's recommendation
  console.log('LLM called:', result.providers);  // The LLM routed to

  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
  const maxRetries = 3;
  const provider = result.providers[0];

  let finalResult = null;

  // Call the recommended LLM ourselves, retrying on any failure
  if (provider.model === 'gpt-3.5-turbo') {
    for (let i = 0; i < maxRetries; i++) {
      try {
        const completion = await openai.chat.completions.create({
          messages: [
            { role: 'user', content: 'Write a merge sort in Python. Be as concise as possible.' }
          ],
          model: 'gpt-3.5-turbo',
        });
        finalResult = completion.choices[0];
        console.log('Response:', finalResult);
        break;
      } catch {
        continue;
      }
    }
  }
}

Reliability toolkit with notdiamond.init

Model providers may experience outages, return errors, or struggle to serve requests at the throughput we require. To help avoid downtime in our applications and effectively load-balance, Not Diamond offers a reliability toolkit, notdiamond.init, which can be enabled with a single statement.
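
In its simplest form, this can look like the following sketch (assuming an existing OpenAI client; installation and full usage are covered below):

# A sketch: one line wraps an existing client with retries and fallbacks.
init(client=openai_client, models=["openai/gpt-4o-mini", "openai/gpt-4o"], max_retries=2)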

📘 More providers coming soon

At this time, the reliability toolkit is only available in our Python SDK and is compatible with workflows that use OpenAI or AzureOpenAI clients (or their async versions).

If you would like to request support for our TypeScript SDK or other providers, please reach out to us and we'll work to accommodate your request.

Installation

Start by installing notdiamond alongside the openai extra:

pip install 'notdiamond[openai]'

If you have already installed notdiamond, please ensure you're using 0.3.34 or greater:

pyenv activate notdiamond-python
poetry show notdiamond  # should show version 0.3.34 or greater
poetry show openai  # should show openai is installed

Usage

from notdiamond import init
from openai import OpenAI, AzureOpenAI

openai_client = OpenAI()
azure_client = AzureOpenAI()

init(
  client=[openai_client, azure_client],
  models=["azure/gpt-4o-mini", "openai/gpt-4o-mini", "azure/gpt-4o"],
  max_retries={
    "azure/gpt-4o-mini": 3,
    "openai/gpt-4o-mini": 1,
    "azure/gpt-4o": 1
  },
  timeout={
    "azure/gpt-4o-mini": 5.0,
    "openai/gpt-4o-mini": 5.0,
    "azure/gpt-4o": 10.0
  },
  model_messages={
    "azure/gpt-4o-mini": [{"role": "user", "content": "Respond to the question."}],
    "openai/gpt-4o-mini": [{"role": "user", "content": "Respond to the question."}],
    "azure/gpt-4o": [{"role": "user", "content": "Respond to the question as concisely as possible."}]
  },
  backoff={
    "azure/gpt-4o-mini": 1.0,
    "openai/gpt-4o-mini": 2.0,
    "azure/gpt-4o": 1.5
  },
)

Let's walk through the keyword arguments of init:

  • client is either an OpenAI client or an iterable of clients
  • models defines the order in which to fall back to other models when any invocation fails
  • max_retries can be configured per-model (as shown above) or globally, using a single int (see the sketch after this list)
  • timeout can be configured per-model (similar to max_retries) or globally, using a float
  • model_messages accepts a map from model name to OpenAI-style messages, which will be appended to the messages of any request init routes to that model
  • backoff sets the delay for exponential backoff between retried requests, globally or per-model
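
For instance, here is a minimal sketch of the global form, reusing the clients from above; scalar values apply to every listed model:

# A sketch of global configuration: scalar values apply to every model in `models`.
init(
  client=[openai_client, azure_client],
  models=["azure/gpt-4o-mini", "openai/gpt-4o-mini", "azure/gpt-4o"],
  max_retries=2,  # up to 2 retries for every model
  timeout=10.0,   # 10-second timeout for every invocation
  backoff=2.0,    # exponential backoff delay between retries
)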

Load balancing

We can also optionally configure init to load balance across various models and providers:

init(
  client=[openai_client, azure_client],
  models={
    "azure/gpt-4o-mini": 0.4,
    "openai/gpt-4o-mini": 0.4,
    "azure/gpt-4o": 0.2
  },
)

If the call to azure/gpt-4o fails, we will load balance fallback requests across azure/gpt-4o-mini and openai/gpt-4o-mini with equal probability. Of course, init will ignore the failed model (azure/gpt-4o) when load balancing.
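
To make the arithmetic concrete, here is an illustrative plain-Python computation (not part of the SDK) showing how the remaining weights rescale once azure/gpt-4o is excluded:

# Illustration only: renormalizing load-balancing weights after azure/gpt-4o fails.
weights = {"azure/gpt-4o-mini": 0.4, "openai/gpt-4o-mini": 0.4, "azure/gpt-4o": 0.2}
remaining = {m: w for m, w in weights.items() if m != "azure/gpt-4o"}
total = sum(remaining.values())  # 0.8
print({m: w / total for m, w in remaining.items()})
# {'azure/gpt-4o-mini': 0.5, 'openai/gpt-4o-mini': 0.5}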

notdiamond.init example

Imagine you have this simple workflow. It first prompts GPT-4o mini hosted on Microsoft Azure, then performs some other operations, and finishes by prompting GPT-4o mini hosted by OpenAI.

We'll introduce one wrinkle: our Azure client has an incorrect API key.

openai_client = OpenAI()
flaky_azure_client = AzureOpenAI(api_key="incorrect-api-key")

flaky_azure_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello there flaky client. Are you working?"}],
)

When attempting to execute this workflow, we will see a 401 authorization error:

openai.AuthenticationError: Error code: 401 -
{
    'statusCode': 401,
    'message': 'Unauthorized. Access token is missing, invalid, audience is incorrect (https://cognitiveservices.azure.com), or have expired.'
}

We could add error-handling to each LLM invocation in our application, but that introduces significant amounts of boilerplate to an otherwise-simple workflow. Instead, let's use notdiamond.init:

openai_client = OpenAI()
flaky_azure_client = AzureOpenAI(api_key="incorrect-api-key")

init(
  client=[openai_client, flaky_azure_client],
  models=["azure/gpt-4o-mini", "openai/gpt-4o-mini"],
  max_retries={
    'azure/gpt-4o-mini': 3,
    'openai/gpt-4o-mini': 1,
  }
)

print(
    "Azure fallback response: " +
    flaky_azure_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello there flaky client. Are you working?"}],
    ).choices[0].message.content
)

This workflow will now recover from the 401 by invoking openai/gpt-4o-mini:

notdiamond.toolkit._retry._RetryWrapperException: Failed to invoke ['azure/gpt-4o-mini']:
openai.AuthenticationError: Error code: 401 -
{
    'statusCode': 401,
    'message': 'Unauthorized. Access token is missing, invalid, audience is incorrect (https://cognitiveservices.azure.com), or have expired.'
}

Azure fallback response: Hello! Yes, I'm here and ready to assist you. How can I help you today?

We've now successfully mitigated the risk of downtime in our application. For more information about notdiamond.init, please see the API reference.