
Using Language Models

Once you've deployed a language model on RunGen.AI, you can interact with it via an OpenAI-compatible client or our Inference API (REST). This guide walks you through both methods.

1. Prerequisites

Before getting started, ensure you have:

  • A deployed Language Model on RunGen.AI
  • Your API key (See how to get your API key)
  • Your app ID (found in the dashboard under your deployed app)

2. Using the OpenAI Client

The easiest way to get started is through our OpenAI-compatible API, which lets you use existing OpenAI client libraries.

This section includes code examples for the official OpenAI Python SDK, but other SDKs (such as LangChain) and languages will work as well.

Setting Up the OpenAI Client

from openai import OpenAI

client = OpenAI(
    api_key="your-rungen-api-key",
    base_url="https://api.rungen.ai/app/{app_id}/llm/v1"
)
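Since the base URL embeds your app ID, a small helper can keep the template in one place. This is a sketch: the URL template simply restates the endpoint above, while the helper name is ours and not part of any SDK.

```python
# Hypothetical helper: the template restates the base URL shown above.
BASE_URL_TEMPLATE = "https://api.rungen.ai/app/{app_id}/llm/v1"

def base_url_for(app_id: str) -> str:
    """Build the OpenAI-compatible base URL for a deployed app."""
    return BASE_URL_TEMPLATE.format(app_id=app_id)

print(base_url_for("my-app-123"))
# -> https://api.rungen.ai/app/my-app-123/llm/v1
```

You can then pass `base_url_for("your-app-id")` as the `base_url` argument when constructing the client.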

Chat Completions (Streaming & Non-Streaming)

The SDK supports both streaming completions (tokens arrive in real time) and non-streaming completions (the call returns once the full response is ready). Choose whichever fits your use case.

info

Note: Some models don't support chat completions. Text completions (next section) will work regardless.

Streaming Response

response_stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."}
    ],
    temperature=0.7,
    max_tokens=50,
    stream=True
)

for response in response_stream:
    print(response.choices[0].delta.content or "", end="", flush=True)

Non-Streaming Response

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."}
    ],
    temperature=0.7,
    max_tokens=50,
    stream=False
)

# Prints the full response object; the generated text itself is in
# response.choices[0].message.content
print(response)

Text Completions (Non-Chat Models)

Some models do not support chat-based prompts and instead use direct text completion. This method follows OpenAI’s completions.create() function.

Streaming Text Completion

response_stream = client.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    prompt="Explain quantum physics in simple terms.",
    temperature=0.7,
    max_tokens=100,
    stream=True
)

for response in response_stream:
    print(response.choices[0].text or "", end="", flush=True)

Non-Streaming Text Completion

response = client.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    prompt="Explain quantum physics in simple terms.",
    temperature=0.7,
    max_tokens=100,
    stream=False
)

# Prints the full response object; the generated text itself is in
# response.choices[0].text
print(response)

3. Using the REST API

If you don't want to use an OpenAI-compatible client, you can prompt your deployed model by sending requests to the Run Job endpoint.

Endpoint

POST https://api.rungen.ai/app/{app_id}/run_async

Headers

Header          Description
x-api-key       Your API key for authentication
Content-Type    application/json

Request Body Example

{
    "input": {
        "messages": [
            {"role": "system", "content": "You are an AI assistant."},
            {"role": "user", "content": "How does AI work?"}
        ],
        "temperature": 0.7,
        "max_tokens": 100
    }
}

Example Request Using cURL

curl --location 'https://api.rungen.ai/app/{app_id}/run_async' \
--header 'x-api-key: your_api_key' \
--header 'Content-Type: application/json' \
--data '{
    "input": {
        "messages": [
            {"role": "system", "content": "You are an AI assistant."},
            {"role": "user", "content": "How does AI work?"}
        ],
        "temperature": 0.7,
        "max_tokens": 100
    }
}'
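If you prefer to stay in Python without the OpenAI client, the same request can be assembled with the standard library. This is a minimal sketch mirroring the cURL call above; the helper name and default parameters are our own, and actually sending the request requires a valid app ID and API key.

```python
import json
import urllib.request

def build_run_request(app_id: str, api_key: str, messages,
                      temperature: float = 0.7, max_tokens: int = 100):
    """Assemble the run_async request (URL, headers, JSON body) shown above."""
    url = f"https://api.rungen.ai/app/{app_id}/run_async"
    payload = {
        "input": {
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# Sending the request (requires a deployed app and a valid API key):
# req = build_run_request("your_app_id", "your_api_key",
#                         [{"role": "user", "content": "How does AI work?"}])
# with urllib.request.urlopen(req) as resp:
#     job = json.loads(resp.read())
```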

Handling Responses

If successful, the request returns a job_id that you can use to retrieve results.

Checking Job Status

GET https://api.rungen.ai/app/{app_id}/job/{job_id}

Example Response:

{
    "data": {
        "status": "COMPLETED",
        "result": {
            "choices": [
                {"message": {"role": "assistant", "content": "AI is a field of computer science that enables machines to learn and make decisions."}}
            ]
        }
    }
}
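Putting the two calls together, you can poll the job endpoint until the status is COMPLETED and then pull out the assistant's reply. The response shape below is taken from the example above; the helper names and the polling interval are our own assumptions.

```python
def extract_result(job: dict):
    """Pull (status, assistant_text) out of a job-status response.

    Returns the text only when the job is COMPLETED; otherwise text is None.
    """
    data = job.get("data", {})
    status = data.get("status")
    text = None
    if status == "COMPLETED":
        choices = data.get("result", {}).get("choices", [])
        if choices:
            text = choices[0]["message"]["content"]
    return status, text

# With the example response above:
sample = {
    "data": {
        "status": "COMPLETED",
        "result": {
            "choices": [
                {"message": {"role": "assistant",
                             "content": "AI is a field of computer science that enables machines to learn and make decisions."}}
            ]
        }
    }
}
print(extract_result(sample)[0])  # COMPLETED

# A polling loop might look like this (requires network access and credentials):
# import json, time, urllib.request
# def wait_for_job(app_id, api_key, job_id, interval=2.0):
#     url = f"https://api.rungen.ai/app/{app_id}/job/{job_id}"
#     while True:
#         req = urllib.request.Request(url, headers={"x-api-key": api_key})
#         with urllib.request.urlopen(req) as resp:
#             job = json.loads(resp.read())
#         status, text = extract_result(job)
#         if status == "COMPLETED":
#             return text
#         time.sleep(interval)
```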

4. Next Steps

  • Experiment with your model in the Playground.
  • Integrate your model into your application.
  • Explore advanced configurations in the API documentation.

For any questions, reach out to our support team.