How it works — Open RAM


How Open RAM works

Rent RAM / GPU / CPU machines, run open-source models on them, and pay by the hour. Here's how the pieces fit together.


Two layers, one platform

A compute marketplace underneath, an open-source AI toolkit on top.

Compute layer

Providers rent out spare machines. You rent the capacity you need.

• Rent CPU / GPU / RAM machines by the hour (16 machines listed, 11 of them with GPUs).

• Launch a personal cloud AI workstation with tools pre-installed.

• Submit heavy browser-based compute jobs.

Open-source AI layer

Run open models — Llama, Qwen, Mistral, DeepSeek, SDXL, Whisper.

• Deploy any of 12 models as an API endpoint.

• Benchmark them on real hardware profiles.

• Access everything through one API router.

How they connect

Every action is matched to a machine with enough RAM / GPU / CPU.

You send a request

A prompt, job, benchmark, or deploy.

Router picks compute

Matches RAM / GPU / CPU to the workload + your strategy.

Runs on a rented machine

A marketplace machine with enough resources.

Open model executes

Llama, Qwen, Mistral, SDXL, Whisper…

Result + cost back

Output, latency, RAM/GPU used, price.

When you deploy a model, run a benchmark, use the router, or submit a job, Open RAM's compute-selection engine reads the workload's requirements and your strategy, then picks the best machine — and tells you exactly why.

Your workload runs on your own fleet first. If you've rented a machine that fits, the workload runs there, so you get value from the capacity you're already paying for. Only when nothing you've rented fits does the platform auto-provision a machine on demand from the marketplace.
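The fleet-first rule above can be pictured with a small sketch. This is not Open RAM's actual engine: the `Machine` and `Workload` fields, the `select_machine` name, and the "cheapest match wins" tie-break are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Machine:
    name: str
    ram_gb: int
    vram_gb: int            # 0 means the machine has no GPU
    price_per_hour: float
    rented_by_you: bool     # True if the machine is already in your fleet


@dataclass
class Workload:
    min_ram_gb: int
    min_vram_gb: int = 0


def fits(machine: Machine, workload: Workload) -> bool:
    """A machine fits when it meets both the RAM and the VRAM minimum."""
    return (machine.ram_gb >= workload.min_ram_gb
            and machine.vram_gb >= workload.min_vram_gb)


def select_machine(machines: List[Machine],
                   workload: Workload) -> Tuple[Optional[Machine], str]:
    """Return (machine, reason). Machines you already rent are tried first."""
    candidates = [m for m in machines if fits(m, workload)]
    fleet = [m for m in candidates if m.rented_by_you]
    if fleet:
        # Assumption: among fitting fleet machines, prefer the cheapest.
        best = min(fleet, key=lambda m: m.price_per_hour)
        return best, f"{best.name}: already rented and fits the workload"
    if candidates:
        # Nothing in the fleet fits, so fall back to the marketplace.
        best = min(candidates, key=lambda m: m.price_per_hour)
        return best, f"{best.name}: nothing in your fleet fits; cheapest marketplace match"
    return None, "no machine meets the RAM/VRAM requirements"
```

Returning the reason string alongside the machine mirrors the "tells you exactly why" behavior described above.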

How renting works

No commitment — reserve a machine, use it, stop it.

1 · Pick a machine

Browse the marketplace, filter by RAM / GPU / price / region, and check the provider trust score.

2 · Rent it

One click reserves the machine. It moves from Provisioning → Active in seconds and appears on your dashboard.

3 · Use it in the Workspace

Open the Workspace and connect: run a Jupyter notebook in your browser, or copy the SSH command to use it from your own terminal. Billed per hour while active.

4 · Stop anytime

Stop the machine to halt billing, or terminate to remove it. No long-term commitment.
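To make the billing model concrete, here is a rough sketch of how hourly billing with stop/start might add up. It assumes usage is prorated by the second, which this guide does not confirm (the platform may round to whole hours); `rental_cost` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def rental_cost(price_per_hour: float,
                active_periods: List[Tuple[datetime, datetime]]) -> float:
    """Sum the cost of every (start, stop) interval in which the machine was active.

    Time spent stopped is not billed. Assumption: cost is prorated per second.
    """
    total_seconds = sum((stop - start).total_seconds()
                        for start, stop in active_periods)
    return round(price_per_hour * total_seconds / 3600, 4)


# Example: active for 90 minutes, stopped, then active for another 30 minutes.
start = datetime(2024, 1, 1, 9, 0)
periods = [
    (start, start + timedelta(minutes=90)),
    (start + timedelta(hours=3), start + timedelta(hours=3, minutes=30)),
]
cost = rental_cost(0.50, periods)  # two billed hours in total at $0.50/hour
```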

How do I actually use a machine I rented?

Your machine is a real computer in the cloud — you connect to it from the Workspace.

In your browser (Jupyter)

Open the Workspace, find your machine, and click Use it here.

A Jupyter notebook opens right in the page — write Python and run it on the machine's GPU. Nothing to install.

From your own computer (SSH)

Each machine shows an SSH command in the Workspace.

Paste it into your terminal (Windows Terminal, macOS Terminal, or a Linux shell) to log in, run commands, and upload files.

Jupyter and SSH access are available only on live RunPod machines. You're billed hourly while the machine runs, so click Stop in the Workspace when you're done.

Which machine runs my workload?

The platform matches each workload to hardware that fits.

Workload	Matched hardware	Why
Small chat model	Cheap CPU + RAM machine	Fits in 16–32GB RAM, no GPU needed.
Huge Llama / Qwen (70B+)	High-RAM or datacenter GPU	Needs 80GB+ VRAM or 160GB+ RAM.
Stable Diffusion	GPU machine (≥16GB VRAM)	Image generation is GPU-bound.
Whisper transcription	CPU or GPU machine	Runs on CPU; faster on a GPU.
Long document prompt	High-RAM machine	Large context needs more memory.
Private business workload	Dedicated, high-trust machine	Isolated, single-tenant hardware.
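A few rows of the table can be expressed as data. The sketch below is illustrative only: the workload keys, the `Requirement` type, and the choice of 16 GB as the lower bound for a small chat model are assumptions. It shows how an "80GB+ VRAM or 160GB+ RAM" rule can be modeled as a list of alternatives, any one of which is enough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    min_ram_gb: int = 0
    min_vram_gb: int = 0


# Each workload maps to a list of alternatives; satisfying any one of them is enough.
REQUIREMENTS = {
    "small-chat": [Requirement(min_ram_gb=16)],
    "llm-70b": [Requirement(min_vram_gb=80), Requirement(min_ram_gb=160)],
    "stable-diffusion": [Requirement(min_vram_gb=16)],
}


def machine_fits(workload: str, ram_gb: int, vram_gb: int) -> bool:
    """True if the machine satisfies at least one alternative for the workload."""
    return any(
        ram_gb >= req.min_ram_gb and vram_gb >= req.min_vram_gb
        for req in REQUIREMENTS[workload]
    )
```

With this shape, a 70B model is accepted on either a GPU machine with 80 GB of VRAM or a GPU-less machine with 256 GB of RAM, which is what the "High-RAM or datacenter GPU" row describes.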

Three ways to run a model

Deploy a dedicated endpoint, let the router choose, or benchmark first.

Deploy a dedicated endpoint

Pick a model, deploy it, and get a private OpenAI-compatible URL + API key. The platform picks a machine that fits the model.


Use the unified API router

Send a prompt with a strategy (cheapest, fastest, quality, long-context, private). The router picks the model + machine for you.


Benchmark, then commit

Compare models across hardware on quality, latency, cost and tokens/sec — then deploy the best setup.


What do I do with the API?

One endpoint, OpenAI-compatible. Send a prompt + a strategy; we route it.

The router is a single URL. You pass your strategy as a mode (cheapest / fastest / quality / long-context / private) along with your messages, and the router chooses the model and machine.

A deployed model gives you its own URL + API key, pinned to one model on a machine that fits it. Drop the URL into any OpenAI SDK.

Authenticate with a Bearer API key. Private mode keeps the workload on dedicated hardware.
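One way to picture how a mode could steer the choice is a ranking function per mode. Everything here is hypothetical: the `Option` fields, the scoring keys, and the rule that private mode picks the cheapest dedicated option are assumptions for illustration, not the router's documented behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Option:
    model: str
    machine: str
    cost_per_request: float
    latency_s: float
    quality: float          # higher is better (hypothetical benchmark score)
    context_tokens: int
    dedicated: bool         # single-tenant hardware


# min() is used for selection, so "higher is better" metrics are negated.
RANKING = {
    "cheapest": lambda o: o.cost_per_request,
    "fastest": lambda o: o.latency_s,
    "quality": lambda o: -o.quality,
    "long-context": lambda o: -o.context_tokens,
}


def route(mode: str, options: List[Option]) -> Option:
    """Pick one (model, machine) option for the given mode."""
    if mode == "private":
        # Private mode only considers dedicated hardware.
        pool = [o for o in options if o.dedicated]
        key = RANKING["cheapest"]   # assumption: cheapest dedicated option wins
    else:
        pool = list(options)
        key = RANKING[mode]         # raises KeyError for an unknown mode
    if not pool:
        raise ValueError(f"no option available for mode {mode!r}")
    return min(pool, key=key)
```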

This is a prototype: the endpoints shown here are illustrative, but the routing decisions and pricing are computed live.

curl https://api.opencompute.ai/v1/route \
  -H "Authorization: Bearer oca_live_••••••••" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "quality",
    "messages": [{ "role": "user", "content": "Summarize the key risks in this vendor contract." }]
  }'

# mode "quality" = Highest Quality. The router chooses the model
# and machine, runs it on rented compute, and returns the completion plus the
# routing decision (chosen model, machine, cost, latency).
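The same request can be built in Python with only the standard library. This is a minimal sketch that mirrors the curl example; the URL is the illustrative endpoint shown above, `build_route_request` is a hypothetical helper name, and the request is constructed but not sent.

```python
import json
import urllib.request

# Illustrative endpoint copied from the curl example; not guaranteed to be live.
ROUTER_URL = "https://api.opencompute.ai/v1/route"


def build_route_request(api_key: str, mode: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    body = json.dumps({
        "mode": mode,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        ROUTER_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_route_request(
    "YOUR_API_KEY",
    "quality",
    "Summarize the key risks in this vendor contract.",
)
# To send it, pass the request to urllib.request.urlopen(req) and parse the JSON reply.
```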

Ready to try it?

Rent a machine, deploy a model, and route your first request.
