Classification

Prompt adaptation is in private beta

Please reach out to us if you want to test it out.

In this example, we'll show you how to use the prompt adaptation API on classification tasks. Specifically, we'll use the PolyAI/banking77 dataset for this tutorial—a dataset of banking customer support queries that need to be classified into 77 different intent categories.

Setup

First, we will install dependencies and download the banking77 dataset from Hugging Face.

pip install notdiamond datasets==3.6.0

npm install notdiamond
# Note: for TypeScript you'll need to download and prepare the dataset separately

Set your Not Diamond API key

export NOTDIAMOND_API_KEY=YOUR_NOTDIAMOND_API_KEY

Download the dataset

from datasets import load_dataset

n_samples = 25

ds = load_dataset("PolyAI/banking77")["train"].select(list(range(n_samples)))

Next, we define the system prompt and prompt template for our current workflow.

system_prompt = """
You are a helpful assistant that categorizes banking-related questions provided by the user.
"""

prompt_template = """
The following text from the banking domain needs to be classified into one of the 77 categories listed below.
The text is a customer support query.

Categories to classify the data:

{categories}

Text to classify: {question}
"""

Request prompt adaptation

First, we will format the dataset for prompt adaptation. Not Diamond expects a list of samples containing the prompt template's field arguments, so ensure that prompt_template.format(**sample['fields']) returns a valid user prompt for each sample. For classification tasks, it is important that the "answer" is the ground-truth class name of the sample.

import json

categories = [  # pre-extracted for convenience
    "activate_my_card",
    "age_limit",
    "apple_pay_or_google_pay",
    "atm_support",
    "automatic_top_up",
    "balance_not_updated_after_bank_transfer",
    "balance_not_updated_after_cheque_or_cash_deposit",
    "beneficiary_not_allowed",
    "cancel_transfer",
    "card_about_to_expire",
    "card_acceptance",
    "card_arrival",
    "card_delivery_estimate",
    "card_linking",
    "card_not_working",
    "card_payment_fee_charged",
    "card_payment_not_recognised",
    "card_payment_wrong_exchange_rate",
    "card_swallowed",
    "cash_withdrawal_charge",
    "cash_withdrawal_not_recognised",
    "change_pin",
    "compromised_card",
    "contactless_not_working",
    "country_support",
    "declined_card_payment",
    "declined_cash_withdrawal",
    "declined_transfer",
    "direct_debit_payment_not_recognised",
    "disposable_card_limits",
    "edit_personal_details",
    "exchange_charge",
    "exchange_rate",
    "exchange_via_app",
    "extra_charge_on_statement",
    "failed_transfer",
    "fiat_currency_support",
    "get_disposable_virtual_card",
    "get_physical_card",
    "getting_spare_card",
    "getting_virtual_card",
    "lost_or_stolen_card",
    "lost_or_stolen_phone",
    "order_physical_card",
    "passcode_forgotten",
    "pending_card_payment",
    "pending_cash_withdrawal",
    "pending_top_up",
    "pending_transfer",
    "pin_blocked",
    "receiving_money",
    "refund_not_showing_up",
    "request_refund",
    "reverted_card_payment",
    "supported_cards_and_currencies",
    "terminate_account",
    "top_up_by_bank_transfer_charge",
    "top_up_by_card_charge",
    "top_up_by_cash_or_cheque",
    "top_up_failed",
    "top_up_limits",
    "top_up_reverted",
    "topping_up_by_card",
    "transaction_charged_twice",
    "transfer_fee_charged",
    "transfer_into_account",
    "transfer_not_received_by_recipient",
    "transfer_timing",
    "unable_to_verify_identity",
    "verify_my_identity",
    "verify_source_of_funds",
    "verify_top_up",
    "virtual_card_not_working",
    "visa_or_mastercard",
    "why_verify_identity",
    "wrong_amount_of_cash_received",
    "wrong_exchange_rate_for_cash_withdrawal"
]

fields = ["question", "categories"]

pa_ds = [
    {
        "fields": {
            "question": sample["text"],
            "categories": json.dumps(categories)
        },
        "answer": json.dumps({
            "intent": categories[sample["label"]], # The intent here should be the class name
        })
    }
    for sample in ds
]

print(prompt_template.format(**pa_ds[0]['fields']))
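Before submitting the dataset, it can help to confirm that every sample covers all of the template's placeholders. The sketch below is a hypothetical helper, not part of the Not Diamond SDK; it uses Python's string.Formatter to extract placeholder names from a template and checks each sample against them (a minimal template and sample are used here for illustration):

```python
from string import Formatter

def validate_goldens(template, goldens):
    """Check that every sample's fields cover the template's placeholders."""
    placeholders = {name for _, name, _, _ in Formatter().parse(template) if name}
    for i, sample in enumerate(goldens):
        missing = placeholders - sample["fields"].keys()
        if missing:
            raise ValueError(f"Sample {i} is missing fields: {missing}")
    return True

# Minimal template and sample for illustration
template = "Categories: {categories}\nText to classify: {question}"
sample = {"fields": {"question": "How do I activate my card?", "categories": "[...]"}}
validate_goldens(template, [sample])  # raises ValueError if any placeholder is unfilled
```

Running this over the full pa_ds list before calling the API catches malformed samples early, rather than discovering them after the job has been submitted.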

Next, specify the origin_model, which you currently query with your system prompt and prompt template, and your target_models, which you would like to query with adapted prompts. You can list multiple target models.

origin_model = {"provider": "openai", "model": "gpt-4o-2024-08-06"}
target_models = [
    {"provider": "anthropic", "model": "claude-sonnet-4-20250514"},
]

Finally, call the API to submit the adaptation request to Not Diamond's servers. You will receive an adaptation_run_id in the response.

import os

from notdiamond import NotDiamond

client = NotDiamond(api_key=os.environ.get("NOTDIAMOND_API_KEY"))

response = client.prompt_adaptation.adapt(
    system_prompt=system_prompt,
    template=prompt_template,
    fields=fields,
    goldens=pa_ds,
    origin_model=origin_model,
    target_models=target_models,
    evaluation_metric="LLMaaJ:Sem_Sim_1"
)

adaptation_run_id = response.adaptation_run_id
print(f"Adaptation job started: {adaptation_run_id}")
The equivalent request in TypeScript:

import { NotDiamond } from 'notdiamond';

const client = new NotDiamond({api_key: process.env.NOTDIAMOND_API_KEY});

// Note: Prepare your dataset similar to the Python example above
// Define systemPrompt, promptTemplate, fields, goldenData, originModel, and targetModels

const response = await client.promptAdaptation.adapt({
  systemPrompt: systemPrompt,
  template: promptTemplate,
  fields: fields,
  goldens: goldenData,  // Your prepared dataset
  originModel: originModel,
  targetModels: targetModels,
  evaluationMetric: 'LLMaaJ:Sem_Sim_1'
});

const adaptationRunId = response.adaptationRunId;
console.log(`Adaptation job started: ${adaptationRunId}`);

Retrieve results

Once the prompt adaptation job completes, you can retrieve the optimized prompts and evaluation metrics.

# Check job status
status = client.prompt_adaptation.get_adapt_status(adaptation_run_id)
print(f"Status: {status.status}")

# When status is 'completed', fetch results
results = client.prompt_adaptation.get_adapt_results(adaptation_run_id)

# View optimized prompts
for target in results.target_models:
    print(f"\nModel: {target.model_name}")
    print(f"Pre-optimization score: {target.pre_optimization_score}")
    print(f"Post-optimization score: {target.post_optimization_score}")
    print(f"\nOptimized system prompt:\n{target.system_prompt}")
    print(f"\nOptimized template:\n{target.user_message_template}")
The equivalent in TypeScript:

// Check job status
const status = await client.promptAdaptation.getAdaptStatus(adaptationRunId);
console.log(`Status: ${status.status}`);

// When status is 'completed', fetch results
const results = await client.promptAdaptation.getAdaptResults(adaptationRunId);

// View optimized prompts
for (const target of results.targetModels) {
  console.log(`\nModel: ${target.modelName}`);
  console.log(`Pre-optimization score: ${target.preOptimizationScore}`);
  console.log(`Post-optimization score: ${target.postOptimizationScore}`);
  console.log(`\nOptimized system prompt:\n${target.systemPrompt}`);
  console.log(`\nOptimized template:\n${target.userMessageTemplate}`);
}

Response format

The API response contains detailed information about the optimization results:

{
  "id": "uuid", // The prompt adaptation id
  "created_at": "datetime", // Timestamp
  "origin_model": {
    "model_name": "openai/gpt-4o-2024-08-06", // The original model the prompt was designed for
    "score": 0.8, // The original model's score on the dataset before optimization
    "evals": {"LLMaaJ:Sem_Sim_1": 0.8}, // The original model's evaluation results on the dataset
    "system_prompt": "...", // The baseline system prompt submitted
    "user_message_template": "...", // The baseline prompt template submitted
    "result_status": "completed" 
  },
  "target_models": [
    {
      "model_name": "anthropic/claude-sonnet-4-20250514", // The target model
      "pre_optimization_score": 0.64, // The target model's score on the dataset before optimization
      "pre_optimization_evals": {"LLMaaJ:Sem_Sim_1": 0.64}, // The target model's evaluation results on the dataset before optimization
      "post_optimization_score": 0.8, // The target model's score on the dataset after optimization
      "post_optimization_evals": {"LLMaaJ:Sem_Sim_1": 0.8}, // The target model's evaluation results on the dataset after optimization
      "system_prompt": "...", // The optimized system prompt
      "user_message_template": "...", // The optimized prompt template
      "user_message_template_fields": ["..."], // Field arguments in the user_message_template
      "result_status": "completed"
    }
  ]
}

result_status can take one of the following values:

  • created: the optimization job has been received.
  • queued: the optimization job is currently in queue to be processed.
  • processing: the optimization job is currently running. Evaluation scores will be null until the job is completed.
  • completed: the optimization job is finished and you will see the evaluation scores populated.
  • failed: the optimization job failed; please try again or contact support.
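These statuses lend themselves to a simple polling loop. The sketch below is a generic helper, not part of the SDK; it assumes a fetch_status callable that returns one of the status strings above:

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}

def wait_for_completion(fetch_status, interval=5.0, timeout=600.0):
    """Poll fetch_status() until a terminal status or the timeout is reached."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval)
    raise TimeoutError("adaptation job did not finish in time")

# With the Python client shown earlier, fetch_status might be:
# lambda: client.prompt_adaptation.get_adapt_status(adaptation_run_id).status
```

Choosing a generous interval keeps polling traffic low; adaptation jobs run server-side, so there is no benefit to tight loops.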

Each model in target_models will have its own results dictionary. If an adaptation failed for a specific target model, please try again or contact support.
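Because each target model reports independently, a small helper can separate the completed adaptations from the failed ones and compute how much each improved. summarize_targets below is a hypothetical convenience function, not part of the SDK; the dict it consumes mirrors the response format shown above:

```python
def summarize_targets(response):
    """Return (model_name, score delta) pairs for each completed target model."""
    summary = []
    for target in response["target_models"]:
        if target["result_status"] != "completed":
            print(f"{target['model_name']}: {target['result_status']} (retry or contact support)")
            continue
        delta = target["post_optimization_score"] - target["pre_optimization_score"]
        summary.append((target["model_name"], delta))
    return summary

# Example response matching the format above
response = {
    "target_models": [
        {
            "model_name": "anthropic/claude-sonnet-4-20250514",
            "pre_optimization_score": 0.64,
            "post_optimization_score": 0.8,
            "result_status": "completed",
        }
    ]
}
print(summarize_targets(response))
```

This makes it easy to decide, per model, whether the adapted prompt is worth deploying or whether the adaptation should be rerun.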