LLM Models & OpenAI Client Compatibility

RunGen.AI provides an OpenAI-compatible API for interacting with Large Language Models (LLMs). This allows users to integrate their applications using the familiar OpenAI Python SDK while leveraging RunGen.AI’s infrastructure.

Connecting to RunGen.AI's LLM API

To use the API, point the client's base URL at the LLM endpoint of your deployed app and authenticate with your API key.

Required Headers

  • x-api-key: A valid API key for authentication.
  • x-team-id: Optional; required only if the LLM model is deployed in a team environment.
  • Content-Type: application/json

from openai import OpenAI

client = OpenAI(
    api_key="your-rungen-api-key",
    base_url="https://api.rungen.ai/app/<APP-ID>/llm/v1"
)
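
If your deployment validates the x-api-key or x-team-id headers directly, you can attach them through the SDK's default_headers parameter. A minimal sketch; "your-team-id" is a placeholder, and x-team-id is only needed for team deployments:

from openai import OpenAI

# Sketch: passing RunGen.AI's custom headers explicitly via the OpenAI SDK.
# The SDK sets Content-Type: application/json automatically.
client = OpenAI(
    api_key="your-rungen-api-key",
    base_url="https://api.rungen.ai/app/<APP-ID>/llm/v1",
    default_headers={
        "x-api-key": "your-rungen-api-key",
        "x-team-id": "your-team-id",  # placeholder; omit outside team environments
    },
)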

Chat Completions (Streaming & Non-Streaming)

RunGen.AI supports chat-based LLM interactions using OpenAI’s chat.completions.create().

Streaming Chat Completion

Streaming allows real-time responses, where tokens are received as they are generated.

response_stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    messages=[
        {"role": "system", "content": "You are a helpful librarian."},
        {"role": "user", "content": "What can you do?"}
    ],
    temperature=0,
    max_tokens=100,
    stream=True
)

for chunk in response_stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
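
A stream can be consumed only once, so if you also need the complete reply after printing it, accumulate the deltas inside the same loop. A minimal variant of the loop above:

# Sketch: print streamed tokens live while collecting them into one string.
full_reply = ""
for chunk in response_stream:
    token = chunk.choices[0].delta.content or ""
    full_reply += token
    print(token, end="", flush=True)

print()  # newline after the stream ends; full_reply now holds the whole answer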

Non-Streaming Chat Completion

For non-streaming requests, the full response is returned at once.

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    messages=[
        {"role": "system", "content": "You are a helpful librarian."},
        {"role": "user", "content": "What can you do?"}
    ],
    temperature=0,
    max_tokens=100,
    stream=False
)

print(response)
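
Printing the whole object is useful for debugging; in application code you typically read the generated text (and, if the deployment reports it, token usage) from the standard OpenAI response fields:

# Extract the assistant's message from the ChatCompletion object.
print(response.choices[0].message.content)

# Token accounting, when the deployment reports it.
if response.usage:
    print(response.usage.prompt_tokens, response.usage.completion_tokens)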

Text Completions (Prompt-Based)

RunGen.AI supports prompt-based text completions using completions.create(), mirroring OpenAI's legacy Completions API.

Streaming Text Completion

response_stream = client.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    prompt="What is the meaning of life?",
    temperature=0,
    max_tokens=50,
    stream=True
)

for chunk in response_stream:
    print(chunk.choices[0].text or "", end="", flush=True)

Non-Streaming Text Completion

response = client.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    prompt="What is the meaning of life?",
    temperature=0,
    max_tokens=50,
    stream=False
)

print(response)
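
As with chat completions, the generated text sits in the standard field of the Completion object:

# Text completions expose the output under .text rather than .message.content.
print(response.choices[0].text)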

⚠️ Warning: Model-Specific Input & Output

  • Each LLM model has its own expected input structure and result format.
  • Users must verify their model’s requirements before making requests.
  • Outputs depend on the model type (text, images, embeddings, etc.).
  • Check the model’s documentation or API responses for details.
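
Because input and output formats vary per model, it is worth failing loudly when a request does not match what the deployed model expects. A minimal sketch using the OpenAI SDK's standard exception types; the error details returned depend on the deployment:

import openai

try:
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
        messages=[{"role": "user", "content": "What can you do?"}],
        max_tokens=100,
    )
    print(response.choices[0].message.content)
except openai.BadRequestError as e:
    # Typically a malformed request body or a parameter this model does not support.
    print("Request rejected:", e)
except openai.APIStatusError as e:
    # Any other non-2xx response from the endpoint.
    print("API error:", e.status_code)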