

Endpoint

POST /api/applications/{application_id}/v1/chat/completions

Authentication

Include your Narev API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY
You can generate API keys in the Narev Cloud dashboard under Settings → API Keys.

Setup

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://narev.ai/api/applications/{application_id}/v1"
)

Request parameters

Required

messages
array
required
Array of message objects, each with a role (system, user, or assistant) and content string.
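For instance, a short history with a system prompt looks like this (a plain illustration of the shape, not an API call):

```python
# A minimal messages array: an optional system message followed by
# user/assistant turns, each with a role and a content string.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

# Every entry carries exactly these two keys.
assert all(set(m) == {"role", "content"} for m in messages)
```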

Optional

Parameter support varies by model. Check your model’s documentation to confirm which parameters it accepts.
model
string
Model identifier with gateway prefix (for example, openai:gpt-4). If omitted, Narev uses the A/B test’s production variant.
temperature
number
Sampling temperature between 0 and 2. Higher values produce more random output.
top_p
number
Nucleus sampling parameter between 0 and 1. Lower values make output more focused.
top_k
integer
Limits sampling to the K most likely next tokens.
max_tokens
integer
Maximum number of tokens to generate in the response.
frequency_penalty
number
Penalizes tokens based on their frequency in the text so far. Range: -2.0 to 2.0.
presence_penalty
number
Penalizes tokens that have already appeared in the text so far. Range: -2.0 to 2.0.
repetition_penalty
number
Penalizes repeated tokens. Typical range: 0.0 to 2.0.
min_p
number
Minimum probability threshold for token selection. Range: 0 to 1.
seed
integer
Random seed for deterministic generation.
logprobs
boolean
When true, returns log probabilities for each output token.
top_logprobs
integer
Number of top log probabilities to return. Range: 0 to 20. Requires logprobs: true.
response_format
object
Controls the format of the response. Pass {"type": "json_object"} to enable JSON mode.
stop
string | array
Up to four sequences at which the API stops generating further tokens.
stream
boolean
default: false
When true, Narev streams the response as server-sent events (SSE).
metadata
object
Custom metadata for tracking and automatic quality evaluation.
Field             Type     Description
expected_output   string   Expected response text for automatic quality scoring

Model identifiers

Models use a {gateway}:{model_name} format:
Gateway      Example
openai       openai:gpt-4
anthropic    anthropic:claude-3-opus-20240229
openrouter   openrouter:meta-llama/llama-3.1-70b-instruct
vertex       vertex:gemini-pro
bedrock      bedrock:amazon.titan-text-express-v1
portkey      portkey:gpt-4
helicone     helicone:gpt-4
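Because identifiers always follow the {gateway}:{model_name} pattern, the gateway prefix can be split off with an ordinary string partition (a plain-Python sketch, not part of the Narev SDK):

```python
def parse_model_id(model_id: str) -> tuple[str, str]:
    """Split a Narev model identifier into (gateway, model_name)."""
    gateway, sep, model_name = model_id.partition(":")
    if not sep:
        raise ValueError(f"missing gateway prefix in {model_id!r}")
    return gateway, model_name

# Only the first colon matters; model names may contain slashes.
print(parse_model_id("openrouter:meta-llama/llama-3.1-70b-instruct"))
```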

Request examples

Basic request

response = client.chat.completions.create(
    model="openai:gpt-4",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

With system prompt

response = client.chat.completions.create(
    model="openai:gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful geography expert."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

With generation parameters

response = client.chat.completions.create(
    model="openai:gpt-4",
    messages=[
        {"role": "user", "content": "Write a creative story."}
    ],
    temperature=0.9,
    max_tokens=500,
    top_p=0.95
)

With JSON response format

response = client.chat.completions.create(
    model="openai:gpt-4",
    messages=[
        {"role": "user", "content": "Return user data as JSON."}
    ],
    response_format={"type": "json_object"}
)
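JSON mode constrains the model to emit valid JSON, but the content still arrives as a string and must be parsed client-side. A sketch (the content value below is illustrative, standing in for response.choices[0].message.content):

```python
import json

# Illustrative content string as it might arrive in JSON mode.
content = '{"name": "Ada Lovelace", "email": "ada@example.com"}'

user_data = json.loads(content)
print(user_data["name"])
```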

Streaming

stream = client.chat.completions.create(
    model="openai:gpt-4",
    messages=[
        {"role": "user", "content": "Tell me a story."}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

With quality evaluation

response = client.chat.completions.create(
    model="openai:gpt-4",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    extra_body={
        "metadata": {
            "expected_output": "Paris is the capital of France."
        }
    }
)

Response format

Non-streaming

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "openai:gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Paris is the capital of France."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 7,
    "total_tokens": 20
  }
}
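With a raw payload like the one above, the fields you typically need can be pulled out directly (a plain-Python sketch over the sample response; the SDK exposes the same fields as attributes):

```python
import json

# The sample non-streaming payload from above.
payload = json.loads("""
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "openai:gpt-4",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Paris is the capital of France."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 13, "completion_tokens": 7, "total_tokens": 20}
}
""")

answer = payload["choices"][0]["message"]["content"]
finish = payload["choices"][0]["finish_reason"]
total = payload["usage"]["total_tokens"]
print(answer, finish, total)
```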

Streaming

Narev sends each token as a server-sent event (SSE) with a data: prefix:
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"openai:gpt-4","choices":[{"index":0,"delta":{"content":"Paris"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"openai:gpt-4","choices":[{"index":0,"delta":{"content":" is"},"finish_reason":null}]}

data: [DONE]
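If you consume the stream without the SDK, each SSE line can be handled in a small loop: strip the data: prefix, stop at the [DONE] sentinel, and accumulate the delta content. A minimal sketch over the example events above:

```python
import json

# Raw SSE lines mirroring the example events above.
raw_lines = [
    'data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"openai:gpt-4","choices":[{"index":0,"delta":{"content":"Paris"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"openai:gpt-4","choices":[{"index":0,"delta":{"content":" is"},"finish_reason":null}]}',
    "data: [DONE]",
]

text = ""
for line in raw_lines:
    if not line.startswith("data: "):
        continue  # skip blank lines and keep-alives
    data = line[len("data: "):]
    if data == "[DONE]":
        break  # end-of-stream sentinel, not JSON
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"]
    text += delta.get("content") or ""

print(text)  # "Paris is"
```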

Error responses

All errors return a JSON object with an error field:
{
  "error": {
    "message": "Error description",
    "code": "error_code"
  }
}
Status  Code                   Description
400     bad_request            Invalid request format or parameters
400     model_required         Model is required when no production variant is set
401     invalid_api_key        Invalid or missing API key
402     insufficient_credits   Insufficient credits to complete the request
404     application_not_found  A/B test ID not found
500     internal_error         Internal server error
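Since every error body shares this one shape, a single helper can surface both the code and the message. A local sketch over an illustrative 401 body (the exception class is hypothetical, not part of the Narev SDK):

```python
import json


class NarevAPIError(Exception):
    """Illustrative wrapper for a Narev error payload."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status, self.code, self.message = status, code, message


def raise_for_error(status: int, body: str) -> None:
    """Raise NarevAPIError for any non-2xx response with an error payload."""
    if 200 <= status < 300:
        return
    err = json.loads(body).get("error", {})
    raise NarevAPIError(status, err.get("code", "unknown"), err.get("message", ""))


# Illustrative body matching the 401 row above.
body = '{"error": {"message": "Invalid or missing API key", "code": "invalid_api_key"}}'
try:
    raise_for_error(401, body)
except NarevAPIError as e:
    print(e.code)  # invalid_api_key
```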