LLM Models & OpenAI Client Compatibility

RunGen.AI provides an OpenAI-compatible API for interacting with Large Language Models (LLMs). This allows users to integrate their applications using the familiar OpenAI Python SDK while leveraging RunGen.AI’s infrastructure.

Connecting to RunGen.AI's LLM API

To use the API, point the client's base URL at the LLM endpoint of your deployed app and authenticate with your API key.

Required Headers

  • x-api-key: A valid API key for authentication.
  • x-team-id: Optional; required only if the LLM model is deployed in a team environment.
  • Content-Type: application/json

from openai import OpenAI

client = OpenAI(
    api_key="your-rungen-api-key",
    base_url="https://api.rungen.ai/app/<APP-ID>/llm/v1"
)
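
If your deployment validates the x-api-key or x-team-id headers directly, you can attach them through the SDK's default_headers parameter. A minimal sketch; "your-team-id" is a placeholder, and x-team-id is only needed for team deployments:

from openai import OpenAI

# Sketch: passing RunGen.AI's custom headers explicitly via the OpenAI SDK.
# The SDK sets Content-Type: application/json automatically.
client = OpenAI(
    api_key="your-rungen-api-key",
    base_url="https://api.rungen.ai/app/<APP-ID>/llm/v1",
    default_headers={
        "x-api-key": "your-rungen-api-key",
        "x-team-id": "your-team-id",  # placeholder; omit outside team environments
    },
)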

Chat Completions (Streaming & Non-Streaming)

RunGen.AI supports chat-based LLM interactions using OpenAI’s chat.completions.create().

Streaming Chat Completion

Streaming allows real-time responses, where tokens are received as they are generated.

response_stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    messages=[
        {"role": "system", "content": "You are a helpful librarian."},
        {"role": "user", "content": "What can you do?"}
    ],
    temperature=0,
    max_tokens=100,
    stream=True
)

for chunk in response_stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
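
A stream can be consumed only once, so if you also need the complete reply after printing it, accumulate the deltas inside the same loop. A minimal variant of the loop above:

# Sketch: print streamed tokens live while collecting them into one string.
full_reply = ""
for chunk in response_stream:
    token = chunk.choices[0].delta.content or ""
    full_reply += token
    print(token, end="", flush=True)

print()  # newline after the stream ends; full_reply now holds the whole answer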

Non-Streaming Chat Completion

For non-streaming requests, the full response is returned at once.

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    messages=[
        {"role": "system", "content": "You are a helpful librarian."},
        {"role": "user", "content": "What can you do?"}
    ],
    temperature=0,
    max_tokens=100,
    stream=False
)

print(response)
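
Printing the whole object is useful for debugging; in application code you typically read the generated text (and, if the deployment reports it, token usage) from the standard OpenAI response fields:

# Extract the assistant's message from the ChatCompletion object.
print(response.choices[0].message.content)

# Token accounting, when the deployment reports it.
if response.usage:
    print(response.usage.prompt_tokens, response.usage.completion_tokens)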

Text Completions (Prompt-Based)

RunGen.AI supports prompt-based text completions using completions.create(), mirroring OpenAI's legacy Completions API.

Streaming Text Completion

response_stream = client.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    prompt="What is the meaning of life?",
    temperature=0,
    max_tokens=50,
    stream=True
)

for chunk in response_stream:
    print(chunk.choices[0].text or "", end="", flush=True)

Non-Streaming Text Completion

response = client.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    prompt="What is the meaning of life?",
    temperature=0,
    max_tokens=50,
    stream=False
)

print(response)
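
As with chat completions, the generated text sits in the standard field of the Completion object:

# Text completions expose the output under .text rather than .message.content.
print(response.choices[0].text)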

⚠️ Warning: Model-Specific Input & Output

  • Each LLM model has its own expected input structure and result format.
  • Users must verify their model’s requirements before making requests.
  • Outputs depend on the model type (text, images, embeddings, etc.).
  • Check the model’s documentation or API responses for details.
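
Because input and output formats vary per model, it is worth failing loudly when a request does not match what the deployed model expects. A minimal sketch using the OpenAI SDK's standard exception types; the error details returned depend on the deployment:

import openai

try:
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
        messages=[{"role": "user", "content": "What can you do?"}],
        max_tokens=100,
    )
    print(response.choices[0].message.content)
except openai.BadRequestError as e:
    # Typically a malformed request body or a parameter this model does not support.
    print("Request rejected:", e)
except openai.APIStatusError as e:
    # Any other non-2xx response from the endpoint.
    print("API error:", e.status_code)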