Key concepts

Before you start using prompt adaptation, familiarize yourself with these key concepts and requirements.

Golden examples

Golden examples (or "goldens") are reference data samples that include inputs and their expected correct outputs. These serve as ground truth for evaluating how well your prompts perform.

A golden example consists of:

  • Fields - The input variables that populate your prompt template (e.g., question, context)
  • Answer - The expected correct output for this input

Example:

{
    "fields": {
        "question": "What is the capital of France?",
        "context": "France is a country in Western Europe..."
    },
    "answer": "Paris"
}

Goldens can be provided in either of two ways:

  • All together via the goldens parameter
  • Separated into train_goldens and test_goldens (recommended)

If you submit via the goldens parameter, the algorithm uses a small subset of the goldens as the training set and the entire dataset for final test-time evaluation.
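
For example, the two submission shapes look like this in a request payload (a hedged sketch; the full request structure is shown in the quickstart):

goldens = [
    {
        "fields": {"question": "...", "context": "..."},
        "answer": "..."
    },
    # ...more examples
]

# Option 1: submit everything via the goldens parameter
payload = {"goldens": goldens}

# Option 2 (recommended): provide an explicit train/test split
payload = {
    "train_goldens": goldens[:20],
    "test_goldens": goldens[20:],
}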

Dataset requirements

Minimum samples: You must provide at least 25 golden examples for prompt adaptation to work effectively. If you have prototype_mode enabled, you can provide as few as 3 samples.

Maximum samples: You can submit up to 200 samples per request. Larger datasets result in longer processing times but may improve optimization quality.

Format: Each sample must include:

  • A fields dictionary with values for each variable in your prompt template
  • An answer string with the expected output
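
Before submitting, it can help to sanity-check your dataset against these requirements. A minimal sketch (validate_goldens is our own helper, not part of the API):

def validate_goldens(goldens, prototype_mode=False):
    """Check a list of golden examples against the documented requirements."""
    minimum = 3 if prototype_mode else 25
    if not minimum <= len(goldens) <= 200:
        raise ValueError(f"Expected {minimum}-200 samples, got {len(goldens)}")
    for i, golden in enumerate(goldens):
        if not isinstance(golden.get("fields"), dict):
            raise ValueError(f"Sample {i}: 'fields' must be a dictionary")
        if not isinstance(golden.get("answer"), str):
            raise ValueError(f"Sample {i}: 'answer' must be a string")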

Fields

Fields are the named variables in your prompt template that get replaced with actual values. Field names must be consistent across your prompt template, your fields list, and your golden data.

Example:

# Your template has two fields: {context} and {question}
prompt_template = """
Context: {context}
Question: {question}
"""

# Your fields list must match
fields = ["context", "question"]

# Each golden example must provide values for these fields
golden_data = [
    {
        "fields": {
            "context": "...",
            "question": "..."
        },
        "answer": "..."
    }
]
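
To see the correspondence, you can substitute a golden's fields into the template with Python's built-in str.format (whether the service performs substitution this way internally is an implementation detail; this only illustrates the mapping):

filled = prompt_template.format(**golden_data[0]["fields"])
# Produces:
# Context: ...
# Question: ...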

Origin model vs target models

Origin model: The LLM your current prompt was designed for. This optional parameter evaluates your original prompt against the origin model to establish a baseline for comparison.

Target models: The LLM(s) you want to migrate to. You can specify up to 4 target models in a single request. If you need to specify more, please reach out to our team.

Example:

origin_model = {"provider": "google", "model": "gemini-2.5-pro"}

target_models = [
    {"provider": "anthropic", "model": "claude-sonnet-4-5-20250929"},
    {"provider": "openai", "model": "gpt-5-2025-08-07"}
]

See Supported Models for the full list of available models.

Evaluation metrics

Evaluation metrics determine how prompt quality is measured during optimization. You must specify one of the following:

Option 1: Use a predefined metric

"evaluation_metric": "LLMaaJ:Sem_Sim_1"  # Default semantic similarity

Option 2: Define a custom metric

"evaluation_config": {
    "llm_judging_prompt": "Your custom judging prompt with {question} and {answer}",
    "llm_judge": "openai/gpt-5-2025-08-07",
    "correctness_cutoff": 0
}
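
For illustration, a custom judging prompt might look like the sketch below. The wording is hypothetical; the only documented requirement is that it contains the {question} and {answer} placeholders:

llm_judging_prompt = """
You are grading a model's response to a question.
Question: {question}
Reference answer: {answer}
Score how closely the response matches the reference answer.
"""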

See Evaluation Metrics for detailed information on available metrics.

Job limits and processing time

Concurrency: Currently limited to 1 job per user at a time, though a single job can include multiple target models. If you need a higher concurrency limit, please reach out to our team.

Processing time: Jobs typically take several minutes to complete, depending on:

  • Number of samples (25-200)
  • Number of target models (1-4)
  • Complexity of the prompts

You'll receive an adaptation_run_id immediately, which you can use to check job status and retrieve results once complete.
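
A hedged sketch of the asynchronous flow, using hypothetical helpers (submit_adaptation and get_adaptation_status stand in for the real endpoints, which are covered in the quickstart):

import time

# Submitting returns an adaptation_run_id immediately (hypothetical helper)
run_id = submit_adaptation(payload)

# Poll until the job completes; jobs typically take several minutes
while True:
    status = get_adaptation_status(run_id)  # hypothetical status call
    if status.get("complete"):              # assumed response shape
        break
    time.sleep(30)

results = status.get("results")             # assumed response shape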

What you'll need

To use prompt adaptation, prepare:

  1. A Not Diamond API key
  2. Your current system prompt and prompt template
  3. At least 25 golden examples in the correct format (or as few as 3 with prototype_mode enabled)
  4. Target model specifications
  5. An evaluation metric (or use the default)
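
As a hedged sketch of how these pieces fit together (the exact request shape and endpoint are shown in the quickstart; the system_prompt key name is our assumption):

payload = {
    "system_prompt": "...",                  # assumption: illustrative key name
    "prompt_template": prompt_template,
    "fields": ["context", "question"],
    "train_goldens": goldens[:20],
    "test_goldens": goldens[20:],
    "target_models": target_models,
    "evaluation_metric": "LLMaaJ:Sem_Sim_1",
}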

Once you have these ready, proceed to the quickstart to make your first API call.