LLM Models & OpenAI Client Compatibility
RunGen.AI provides an OpenAI-compatible API for interacting with Large Language Models (LLMs). This allows users to integrate their applications using the familiar OpenAI Python SDK while leveraging RunGen.AI’s infrastructure.
Connecting to RunGen.AI's LLM API
To use the API, set the base URL to RunGen.AI’s LLM endpoint for the deployed app and use your API key for authentication.
Required Headers
x-api-key: A valid API key for authentication.
x-team-id: (Optional) Required only if the LLM model is deployed in a team environment.
Content-Type: application/json
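For reference, a raw HTTP request with these headers looks like the following. This is a minimal sketch using the requests library; the /chat/completions path is assumed from the OpenAI-compatible base URL, and <APP-ID> is a placeholder for your app's ID.
- Python
import requests

# Minimal raw-request sketch; the /chat/completions path is assumed
# from the OpenAI-compatible base URL used throughout this page.
resp = requests.post(
    "https://api.rungen.ai/app/<APP-ID>/llm/v1/chat/completions",
    headers={
        "x-api-key": "your-rungen-api-key",
        "x-team-id": "your-team-id",  # optional; only for team deployments
        "Content-Type": "application/json",
    },
    json={
        "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.json())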
- Python
from openai import OpenAI
client = OpenAI(
    api_key="your-rungen-api-key",
    base_url="https://api.rungen.ai/app/<APP-ID>/llm/v1"
)
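If a team deployment requires the x-team-id header, the OpenAI SDK's default_headers argument can attach it to every request the client sends. A minimal sketch; "your-team-id" is a placeholder.
- Python
client = OpenAI(
    api_key="your-rungen-api-key",
    base_url="https://api.rungen.ai/app/<APP-ID>/llm/v1",
    # default_headers adds these headers to every request made by this client
    default_headers={"x-team-id": "your-team-id"}
)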
Chat Completions (Streaming & Non-Streaming)
RunGen.AI supports chat-based LLM interactions through OpenAI's chat.completions.create() method.
Streaming Chat Completion
Streaming allows real-time responses, where tokens are received as they are generated.
- Python
response_stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    messages=[
        {"role": "system", "content": "You are a helpful librarian."},
        {"role": "user", "content": "What can you do?"}
    ],
    temperature=0,
    max_tokens=100,
    stream=True
)

# Each streamed chunk carries an incremental delta; print tokens as they arrive.
for chunk in response_stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
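If the full reply is also needed after streaming finishes, the deltas can be accumulated while printing. A minimal sketch that replaces the loop above:
- Python
full_reply = ""
for chunk in response_stream:
    piece = chunk.choices[0].delta.content or ""
    full_reply += piece  # accumulate the text for later use
    print(piece, end="", flush=True)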
Non-Streaming Chat Completion
For non-streaming requests, the full response is returned at once.
- Python
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    messages=[
        {"role": "system", "content": "You are a helpful librarian."},
        {"role": "user", "content": "What can you do?"}
    ],
    temperature=0,
    max_tokens=100,
    stream=False
)

# Print the full response object, including choices, usage, and metadata.
print(response)
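Printing the object shows the whole completion. The reply text and token accounting live in the standard OpenAI response fields:
- Python
print(response.choices[0].message.content)  # the assistant's reply only
print(response.usage)  # prompt, completion, and total token counts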
Text Completions (Prompt-Based)
RunGen.AI supports prompt-based text completions through completions.create(), mirroring OpenAI's legacy text-completion API.
Streaming Text Completion
- Python
response_stream = client.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    prompt="What is the meaning of life?",
    temperature=0,
    max_tokens=50,
    stream=True
)

# Text-completion chunks expose the generated text directly on choices[0].text.
for chunk in response_stream:
    print(chunk.choices[0].text or "", end="", flush=True)
Non-Streaming Text Completion
- Python
response = client.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    prompt="What is the meaning of life?",
    temperature=0,
    max_tokens=50,
    stream=False
)

print(response)
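As with chat completions, the generated text alone is available on the first choice, at response.choices[0].text.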
⚠️ Warning: Model-Specific Input & Output
- Each LLM model has its own expected input structure and result format.
- Users must verify their model's requirements before making requests.
- Outputs depend on the model type (text, images, embeddings, etc.).
- Check the model’s documentation or API responses for details.
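If the deployment also exposes the standard OpenAI /models route, the SDK can list the models available to an app. A sketch, assuming that route is implemented:
- Python
# List the models served by this app (assumes the OpenAI /models route exists).
for model in client.models.list():
    print(model.id)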