Open RAM — Run open-source AI on rented compute

API Router

One API for many open-source models

Send a prompt and a strategy. We pick the best open-source model and the right machine to run it on — then return a unified, OpenAI-compatible response.

Unified endpoint

One API, the right model every time

Send a prompt plus a strategy (cheapest, fastest, quality, long-context, or private). The router picks the best open-source model and the machine to run it on, then tells you why it chose them. You call one unified, OpenAI-compatible endpoint instead of juggling many models and providers.

Routing strategy

Choose how the router optimizes each request.

Prompt

What should the model do?

Optimizing for Cheapest

Example API request

Call the unified endpoint with your chosen mode. The request below uses the cheapest strategy.

curl https://api.opencompute.ai/v1/route \
  -H "Authorization: Bearer oca_live_••••••••" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "cheapest",
    "messages": [{ "role": "user", "content": "Summarize the key risks in this vendor contract." }]
  }'

# "mode": "cheapest" selects the Cheapest strategy. The router chooses the
# model and machine, runs the request on rented compute, and returns the
# completion plus the routing decision (chosen model, machine, cost, latency).
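
For callers who prefer Python over curl, the same request can be sketched with the standard library alone. The URL, headers, and body shape are copied from the curl example. The helper name `build_route_request` is hypothetical, and the mode strings other than "cheapest" are an assumption based on the strategy names listed on this page.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
ROUTE_URL = "https://api.opencompute.ai/v1/route"

# Strategy names as listed on this page. Only "cheapest" is confirmed as a
# wire value by the curl example; the other spellings are assumptions.
MODES = {"cheapest", "fastest", "quality", "long-context", "private"}


def build_route_request(prompt: str, mode: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    body = {
        "mode": mode,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ROUTE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Building the request separately keeps the mode check testable without network access.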

Routing decision

Why the router chose this model and machine.

No decision yet

Pick a strategy, enter a prompt, and route a request to see the chosen model, machine, and the reasoning behind it.
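
As a rough illustration of reading a routing decision, the sketch below parses a response body and prints a one-line summary. The `choices[0].message.content` path follows the OpenAI chat-completion shape that the page says the response is compatible with. The `routing` object and its keys (`model`, `machine`, `cost_usd`, `latency_ms`) are hypothetical names chosen for this example, since the page does not show the actual response schema.

```python
import json


def summarize_decision(response_text: str) -> str:
    """Return a one-line summary of a routed response (field names assumed)."""
    data = json.loads(response_text)
    # OpenAI-compatible completion text.
    completion = data["choices"][0]["message"]["content"]
    # Hypothetical key holding the routing decision; not confirmed by the page.
    routing = data.get("routing", {})
    return (
        f"{routing.get('model', '?')} on {routing.get('machine', '?')}: "
        f"${routing.get('cost_usd', 0):.4f}, "
        f"{routing.get('latency_ms', 0)} ms -> {completion}"
    )
```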

How a routed request flows

From your request to the model running on rented compute — and the result back.

You send a request

A prompt, job, benchmark, or deployment.

Router picks compute

Matches RAM, GPU, and CPU to the workload and your strategy.

Runs on a rented machine

A marketplace machine with enough resources.

Open model executes

Llama, Qwen, Mistral, SDXL, Whisper…

Result + cost back

Output, latency, RAM/GPU used, price.
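
The selection step in the flow above can be pictured as a filter followed by a ranking. This is a minimal sketch under stated assumptions, not the router's actual algorithm: the `Candidate` fields, the placeholder model and machine names, the sample numbers, and the per-strategy ranking keys are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical model and machine pairing offered by the marketplace."""
    model: str
    machine: str
    ram_gb: int
    usd_per_request: float
    latency_ms: int
    quality: float        # made-up 0..1 score
    context_tokens: int
    private: bool


# Illustrative data only; not real models, machines, or prices.
CANDIDATES = [
    Candidate("model-small", "cpu-16gb", 16, 0.001, 900, 0.60, 8_000, False),
    Candidate("model-medium", "gpu-24gb", 24, 0.004, 250, 0.75, 32_000, True),
    Candidate("model-large", "gpu-80gb", 80, 0.020, 400, 0.90, 128_000, False),
]

# One ranking key per strategy; min() picks the lowest key, so "higher is
# better" fields are negated.
RANKING = {
    "cheapest": lambda c: c.usd_per_request,
    "fastest": lambda c: c.latency_ms,
    "quality": lambda c: -c.quality,
    "long-context": lambda c: -c.context_tokens,
    "private": lambda c: c.usd_per_request,
}


def pick(candidates, mode, min_ram_gb):
    """Keep machines with enough RAM, then rank by the chosen strategy."""
    if mode not in RANKING:
        raise ValueError(f"unknown mode: {mode!r}")
    fits = [c for c in candidates if c.ram_gb >= min_ram_gb]
    if mode == "private":
        # Assumed meaning of "private": only machines flagged as private.
        fits = [c for c in fits if c.private]
    if not fits:
        raise LookupError("no machine with enough resources")
    return min(fits, key=RANKING[mode])
```

With the sample data, `pick(CANDIDATES, "cheapest", 8)` selects `model-small`, while raising the RAM floor to 20 GB shifts the cheapest choice to `model-medium`.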

Recent decisions

Every routed request, newest first.