Question answering

Prompt adaptation is in private beta

Please reach out to us if you want to test it out.

Migrating an application from one LLM to another often requires extensive, tedious prompt engineering to avoid performance degradation. Not Diamond can automatically adapt your prompts from the original model to a new target LLM.

In order to adapt prompts from one model to another, you will need the following:

  • A Not Diamond API key
  • Your original prompt
  • An evaluation dataset and a metric for measuring the quality of LLM responses

The example below shows how to adapt a RAG workflow on the HotpotQA dataset so that it performs optimally on faster models like OpenAI's GPT-5 Mini and Google's Gemini 2.5 Flash.

Setup

First, we will download the HotpotQA dataset:

wget "https://drive.google.com/uc?export=download&id=1TeXM_Z-F3-o6ouooigaEWT3axUJA65kv" -O hotpotqa.jsonl

Then install the dependencies for whichever language you are using:

pip install notdiamond pandas
npm install notdiamond

Set your Not Diamond API key:

export NOTDIAMOND_API_KEY=YOUR_NOTDIAMOND_API_KEY

Next, we define the system prompt and prompt template for our current workflow.

system_prompt = """I'd like for you to answer questions about a context text that will be provided. I'll give you a pair
with the form:
Context: "context text"
Question: "a question about the context"
Generate an explicit answer to the question that will be output. Make sure that the answer is the only output you provide,
and the analysis of the context should be kept to yourself. Answer directly and do not prefix the answer with anything such as
"Answer:" nor "The answer is:". The answer has to be the only output you explicitly provide.

The answer has to be as short, direct, and concise as possible. If the answer to the question can not be obtained from the provided
context paragraph, output "UNANSWERABLE". Here's the context and question for you to reason about and answer.
"""

prompt_template = """
Context: {context}

Question: {question}
"""

Evaluation metrics

Our prompt adaptation tool optimizes prompts against one of several supported evaluation metrics. For a comprehensive list of available evaluation metrics and how to configure custom metrics, see Evaluation Metrics.
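
In this guide we pass the metric as a string identifier in the adapt calls below. Its name suggests an LLM-as-a-judge semantic-similarity metric, though Evaluation Metrics is the authoritative reference for what each identifier means:

# Metric identifier used in the adapt calls below. The name suggests an
# LLM-as-a-judge metric that scores semantic similarity between the model's
# response and the golden answer (see Evaluation Metrics for the full list).
evaluation_metric = "LLMaaJ:Sem_Sim_1"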

Request prompt adaptation

First, we will format the dataset for prompt adaptation. Not Diamond expects a list of samples consisting of prompt template field arguments, so ensure that prompt_template.format(**sample['fields']) returns a valid user prompt for each sample.

import pandas as pd

def load_json_dataset(dataset_path: str, n_samples: int = 200) -> tuple[list[str], list[dict]]:
    df = pd.read_json(dataset_path, lines=True)
    df = df.iloc[: min(n_samples, len(df))]

    # Field arguments expected by the prompt template
    fields: list[str] = ["question", "context"]

    golden_dataset = []
    for _, row in df.iterrows():
        sample_fields = {
            "question": row["question"],
            "context": "\n\n".join(row["documents"]),
        }
        golden_dataset.append(dict(fields=sample_fields, answer=row["response"]))
    return fields, golden_dataset

fields, pa_ds = load_json_dataset("hotpotqa.jsonl", 25)

print(prompt_template.format(**pa_ds[0]['fields']))
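
As a sanity check, each golden sample should pair the template's field values with a reference answer, which the evaluation metric scores responses against:

# Each golden sample has the shape {"fields": {...}, "answer": "..."}
sample = pa_ds[0]
print(sorted(sample["fields"].keys()))  # ['context', 'question']
print(sample["answer"])  # reference answer used for evaluation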

Next, specify the target_models you would like to optimize prompts for. You can list multiple target models.

target_models = [
    {"provider": "openai", "model": "openai/gpt-5-mini-2025-08-07"},
    {"provider": "google", "model": "google/gemini-2.5-flash"},
]

Finally, call the API to submit the adaptation request to Not Diamond's servers. You will get back an adaptation_run_id.

import os

from notdiamond import NotDiamond

client = NotDiamond(api_key=os.environ.get("NOTDIAMOND_API_KEY"))

response = client.prompt_adaptation.adapt(
    system_prompt=system_prompt,
    template=prompt_template,
    fields=fields,
    goldens=pa_ds,
    target_models=target_models,
    evaluation_metric="LLMaaJ:Sem_Sim_1"
)
print(response.adaptation_run_id)

The same request in TypeScript:

import { NotDiamond } from 'notdiamond';

const client = new NotDiamond({api_key: process.env.NOTDIAMOND_API_KEY});

// Note: Prepare your dataset similar to the Python example above
// Define systemPrompt, promptTemplate, fields, goldenData, and targetModels

const response = await client.promptAdaptation.adapt({
  systemPrompt: systemPrompt,
  template: promptTemplate,
  fields: fields,
  goldens: goldenData,  // Your prepared dataset
  targetModels: targetModels,
  evaluationMetric: 'LLMaaJ:Sem_Sim_1'
});
console.log(response.adaptationRunId);

Retrieve results

Once the prompt adaptation job completes, you can retrieve the optimized prompts and their evaluation scores.

# Check job status using the run id returned above
adaptation_run_id = response.adaptation_run_id
status = client.prompt_adaptation.get_adapt_status(adaptation_run_id)
print(f"Status: {status.status}")

# When status is 'completed', fetch results
results = client.prompt_adaptation.get_adapt_results(adaptation_run_id)

# View optimized prompts
for target in results.target_models:
    print(f"\nModel: {target.model_name}")
    print(f"Pre-optimization score: {target.pre_optimization_score}")
    print(f"Post-optimization score: {target.post_optimization_score}")
    print(f"\nOptimized system prompt:\n{target.system_prompt}")
    print(f"\nOptimized template:\n{target.user_message_template}")

The equivalent in TypeScript:

// Check job status using the run id returned above
const adaptationRunId = response.adaptationRunId;
const status = await client.promptAdaptation.getAdaptStatus(adaptationRunId);
console.log(`Status: ${status.status}`);

// When status is 'completed', fetch results
const results = await client.promptAdaptation.getAdaptResults(adaptationRunId);

// View optimized prompts
for (const target of results.targetModels) {
  console.log(`\nModel: ${target.modelName}`);
  console.log(`Pre-optimization score: ${target.preOptimizationScore}`);
  console.log(`Post-optimization score: ${target.postOptimizationScore}`);
  console.log(`\nOptimized system prompt:\n${target.systemPrompt}`);
  console.log(`\nOptimized template:\n${target.userMessageTemplate}`);
}
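
Adaptation runs asynchronously, so in practice you will typically poll until the run reaches a terminal status. A minimal Python sketch, using the status values documented under Response format below:

import time

# Poll until the adaptation run reaches a terminal status.
while True:
    status = client.prompt_adaptation.get_adapt_status(adaptation_run_id)
    if status.status in ("completed", "failed"):
        break
    time.sleep(30)  # adaptation jobs can take a while; poll sparingly

if status.status == "completed":
    results = client.prompt_adaptation.get_adapt_results(adaptation_run_id)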

Response format

The API response contains detailed information about the optimization results:

{
  "id": "uuid", // The prompt adaptation id
  "created_at": "datetime", // Timestamp
  "target_models": [
    {
      "model_name": "openai/gpt-5-mini-2025-08-07", // The target model the prompt was optimized for
      "pre_optimization_score": 0.64, // The target model's score on the dataset before optimization
      "pre_optimization_evals": {"LLMaaJ:Sem_Sim_1": 0.64}, // The target model's evaluation results on the dataset before optimization
      "post_optimization_score": 0.8, // The target model's score on the dataset after optimization
      "post_optimization_evals": {"LLMaaJ:Sem_Sim_1": 0.8}, // The target model's evaluation results on the dataset after optimization
      "system_prompt": "...", // The optimized system prompt
      "user_message_template": "...", // The optimized prompt template
      "user_message_template_fields": ["..."], // Field arguments in the user_message_template
      "result_status": "completed"
    },
    {
      "model_name": "google/gemini-2.5-flash", // The target model the prompt was optimized for
      "pre_optimization_score": 0.62, // The target model's score on the dataset before optimization
      "pre_optimization_evals": {"LLMaaJ:Sem_Sim_1": 0.62}, // The target model's evaluation results on the dataset before optimization
      "post_optimization_score": 0.78, // The target model's score on the dataset after optimization
      "post_optimization_evals": {"LLMaaJ:Sem_Sim_1": 0.78}, // The target model's evaluation results on the dataset after optimization
      "system_prompt": "...", // The optimized system prompt
      "user_message_template": "...", // The optimized prompt template
      "user_message_template_fields": ["..."], // Field arguments in the user_message_template
      "result_status": "completed"
    }
  ]
}

result_status can take one of the following values:

  • created: the optimization job has been received.
  • queued: the optimization job is waiting to be processed.
  • processing: the optimization job is running. Evaluation scores will be null until the job completes.
  • completed: the optimization job finished and the evaluation scores are populated.
  • failed: the optimization job failed; please try again or contact support.

Each model in target_models has its own results dictionary. If adaptation fails for a specific target model, please try again or contact support.
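
Once a run completes, you can drop the optimized prompts into your existing workflow. A hypothetical sketch using the OpenAI SDK (the OpenAI client setup and model id are illustrative assumptions, not part of the Not Diamond API):

from openai import OpenAI

openai_client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Apply the first target model's optimized prompts to a sample.
target = results.target_models[0]
sample = pa_ds[0]

completion = openai_client.chat.completions.create(
    model="gpt-5-mini",  # hypothetical id; use the model you optimized for
    messages=[
        {"role": "system", "content": target.system_prompt},
        {"role": "user", "content": target.user_message_template.format(**sample["fields"])},
    ],
)
print(completion.choices[0].message.content)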

View your prompt adaptation requests and results

You can also use the dashboard to view your runs and their status, and to copy the optimized prompts directly.