Guide
How Open RAM works
Rent RAM / GPU / CPU machines, run open-source models on them, and pay by the hour. Here's how the pieces fit together.
Two layers, one platform
A compute marketplace underneath, an open-source AI toolkit on top.
Compute layer
Providers rent out spare machines. You rent the capacity you need.
• Rent CPU / GPU / RAM machines by the hour (16 listed, 11 with GPUs).
• Launch a personal cloud AI workstation with tools pre-installed.
• Submit heavy browser-based compute jobs.
Open-source AI layer
Run open models — Llama, Qwen, Mistral, DeepSeek, SDXL, Whisper.
• Deploy any of 12 models as an API endpoint.
• Benchmark them on real hardware profiles.
• Access everything through one API router.
How they connect
Every action is matched to a machine with enough RAM / GPU / CPU.
You send a request
A prompt, job, benchmark, or deploy.
Router picks compute
Matches RAM / GPU / CPU to the workload + your strategy.
Runs on a rented machine
A marketplace machine with enough resources.
Open model executes
Llama, Qwen, Mistral, SDXL, Whisper…
Result + cost back
Output, latency, RAM/GPU used, price.
When you deploy a model, run a benchmark, use the router, or submit a job, Open RAM's compute-selection engine reads the workload's requirements and your strategy, then picks the best machine — and tells you exactly why.
Your own fleet comes first. If you've rented a machine that fits, your workload runs there, so you get value from the capacity you're already paying for. Only when nothing you've rented fits does the platform auto-provision a machine on demand from the marketplace.
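The fleet-first rule can be sketched in a few lines of Python. This is a minimal illustration, not Open RAM's actual engine: the `Machine` and `Workload` types, their field names, and the "cheapest fit" tie-break are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Machine:
    # Hypothetical fields; the real marketplace schema is not documented here.
    name: str
    ram_gb: int
    vram_gb: int  # 0 means the machine has no GPU
    cpu_cores: int
    price_per_hour: float


@dataclass(frozen=True)
class Workload:
    min_ram_gb: int
    min_vram_gb: int = 0
    min_cpu_cores: int = 1


def fits(machine: Machine, workload: Workload) -> bool:
    """True if the machine meets every resource minimum of the workload."""
    return (
        machine.ram_gb >= workload.min_ram_gb
        and machine.vram_gb >= workload.min_vram_gb
        and machine.cpu_cores >= workload.min_cpu_cores
    )


def select_machine(
    workload: Workload,
    fleet: List[Machine],
    marketplace: List[Machine],
) -> Tuple[Optional[Machine], str]:
    """Prefer a machine you already rent; fall back to the marketplace."""
    owned = [m for m in fleet if fits(m, workload)]
    if owned:
        chosen = min(owned, key=lambda m: m.price_per_hour)
        return chosen, f"{chosen.name}: already in your fleet and fits the workload"
    available = [m for m in marketplace if fits(m, workload)]
    if available:
        # Assumed strategy: cheapest machine that fits.
        chosen = min(available, key=lambda m: m.price_per_hour)
        return chosen, f"{chosen.name}: nothing in your fleet fits, cheapest marketplace match"
    return None, "no machine satisfies the workload's requirements"
```

Returning a reason string alongside the machine mirrors the "tells you exactly why" behaviour described above; a real strategy such as fastest or private would swap the `min(...)` key for a different ranking.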
How renting works
No commitment — reserve a machine, use it, stop it.
1 · Pick a machine
Browse the marketplace, filter by RAM / GPU / price / region, and check the provider trust score.
2 · Rent it
One click reserves the machine. It moves from Provisioning → Active in seconds and appears on your dashboard.
3 · Use it in the Workspace
Open the Workspace and connect: run a Jupyter notebook in your browser, or copy the SSH command to use it from your own terminal. Billed per hour while active.
4 · Stop anytime
Stop the machine to halt billing, or terminate to remove it. No long-term commitment.
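To see how hourly billing adds up across stop and restart cycles, here is a rough Python sketch. It assumes billing is pro-rated by the second while the machine is Active; the guide only says "billed per hour", so the platform may round differently. The `billed_cost` function is hypothetical.

```python
from datetime import datetime, timedelta


def billed_cost(active_periods, hourly_rate):
    """Sum the cost of every (start, stop) period the machine spent Active.

    Assumption: time is pro-rated by the second, not rounded up to whole hours.
    """
    total_seconds = sum(
        (stop - start).total_seconds() for start, stop in active_periods
    )
    return round(total_seconds / 3600 * hourly_rate, 2)


# Example: 90 minutes in the morning, stopped, then 30 more minutes later.
t0 = datetime(2024, 1, 1, 9, 0)
periods = [
    (t0, t0 + timedelta(minutes=90)),
    (t0 + timedelta(hours=5), t0 + timedelta(hours=5, minutes=30)),
]
cost = billed_cost(periods, hourly_rate=0.40)  # 2 active hours at 0.40/hour
```

The hours between the two periods cost nothing in this model because the machine was stopped.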
How do I actually use a machine I rented?
Your machine is a real computer in the cloud — you connect to it from the Workspace.
In your browser (Jupyter)
Open the Workspace, find your machine, and click Use it here.
A Jupyter notebook opens right in the page — write Python and run it on the machine's GPU. Nothing to install.
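A reasonable first notebook cell is a check that the GPU is visible. The sketch below assumes an NVIDIA GPU with the standard `nvidia-smi` tool on the PATH, which this guide does not confirm for every machine; on a CPU-only machine it simply reports that nothing was found.

```python
import shutil
import subprocess
from typing import Optional


def gpu_summary() -> Optional[str]:
    """Return the GPU list from `nvidia-smi -L`, or None if the tool is missing."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    # An empty listing is treated the same as "no GPU".
    return result.stdout.strip() or None


print(gpu_summary() or "No NVIDIA GPU detected")
```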
From your own computer (SSH)
Each machine shows an SSH command in the Workspace.
Copy it into your terminal (Windows Terminal, macOS Terminal, or a Linux shell) to log in, run commands, upload files, and more.
Only live RunPod machines get Jupyter + SSH. A machine is billed hourly while it runs, so click Stop in the Workspace when you're done.
Which machine runs my workload?
The platform matches each workload to hardware that fits.
| Workload | Matched hardware | Why |
|---|---|---|
| Small chat model | Cheap CPU + RAM machine | Fits in 16–32GB RAM, no GPU needed. |
| Huge Llama / Qwen (70B+) | High-RAM or datacenter GPU | Needs 80GB+ VRAM or 160GB+ RAM. |
| Stable Diffusion | GPU machine (≥16GB VRAM) | Image generation is GPU-bound. |
| Whisper transcription | CPU or GPU machine | Runs on CPU; faster on a GPU. |
| Long document prompt | High-RAM machine | Large context needs more memory. |
| Private business workload | Dedicated, high-trust machine | Isolated, single-tenant hardware. |
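The memory figures in the table follow from simple arithmetic: parameter count times bytes per parameter, plus some headroom. The Python sketch below uses that common rule of thumb. The 1.1 overhead factor and the precision names are assumptions for illustration, not values the platform publishes.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def estimate_memory_gb(
    params_billions: float, precision: str = "fp16", overhead: float = 1.1
) -> float:
    """Rough memory needed to load a model, in GB.

    Weights = parameters x bytes per parameter. The overhead factor is an
    assumed allowance for KV cache and activations; real usage varies with
    context length and batch size.
    """
    weights_gb = params_billions * BYTES_PER_PARAM[precision]
    return round(weights_gb * overhead, 1)


llama_70b_fp16 = estimate_memory_gb(70, "fp16")  # about 154 GB
llama_70b_int8 = estimate_memory_gb(70, "int8")  # about 77 GB
```

Under these assumptions a 70B model lands near the table's thresholds: roughly 154 GB at 16-bit precision (the 160GB+ RAM row) and roughly 77 GB at 8-bit (the 80GB+ VRAM row).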
Three ways to run a model
Deploy a dedicated endpoint, let the router choose, or benchmark first.
Deploy a dedicated endpoint
Pick a model, deploy it, and get a private OpenAI-compatible URL + API key. The platform picks a machine that fits the model.
Use the unified API router
Send a prompt with a strategy (cheapest, fastest, quality, long-context, private). The router picks the model + machine for you.
Benchmark, then commit
Compare models across hardware on quality, latency, cost and tokens/sec — then deploy the best setup.
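Comparing runs on cost and tokens/sec comes down to two derived numbers. The sketch below shows one way to compute them and pick a setup. The `BenchmarkRun` fields and the "cheapest run above a quality bar" rule are illustrative assumptions, not the platform's scoring method.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BenchmarkRun:
    # Hypothetical record of one model-on-machine benchmark.
    model: str
    machine: str
    quality: float         # 0..1, higher is better
    latency_s: float       # wall-clock seconds for the run
    output_tokens: int
    price_per_hour: float  # machine rental price

    @property
    def tokens_per_sec(self) -> float:
        return self.output_tokens / self.latency_s

    @property
    def cost_per_million_tokens(self) -> float:
        cost_per_second = self.price_per_hour / 3600
        return cost_per_second / self.tokens_per_sec * 1_000_000


def best_run(runs: List[BenchmarkRun], min_quality: float) -> Optional[BenchmarkRun]:
    """Cheapest run per token among those that clear the quality bar."""
    eligible = [r for r in runs if r.quality >= min_quality]
    return min(eligible, key=lambda r: r.cost_per_million_tokens, default=None)
```

A fast run on an expensive GPU can still lose to a slower run on a cheap machine once cost is normalized per token, which is why both numbers are worth checking before you deploy.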
What do I do with the API?
One endpoint, OpenAI-compatible. Send a prompt + a strategy; we route it.
The router is a single URL. You pass your strategy as the mode (cheapest / fastest / quality / long-context / private) along with your messages, and it chooses the model and machine.
A deployed model gives you its own URL + API key, pinned to one model on a machine that fits it. Drop the URL into any OpenAI SDK.
Authenticate with a Bearer API key. Private mode keeps the workload on dedicated hardware.
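For a deployed model, a request in Python might look like the sketch below. It assumes the endpoint follows the usual OpenAI-compatible convention of a `/chat/completions` path under the base URL. The base URL, API key, and model name are placeholders, not real values. The request is built but not sent, so the snippet runs without network access.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, api_key: str, model: str, prompt: str
) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request with Bearer auth."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: substitute the URL and key shown for your deployment.
req = build_chat_request(
    "https://example.invalid/v1", "YOUR_API_KEY", "your-model", "Hello"
)
# To send it: urllib.request.urlopen(req), then json-decode the response body.
```

Because the endpoint is described as OpenAI-compatible, pointing an OpenAI SDK's base URL setting at the same address should be an alternative to hand-built requests.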
This is a prototype: the endpoints shown here are illustrative, but the routing decisions and pricing are computed live.
curl https://api.opencompute.ai/v1/route \
  -H "Authorization: Bearer oca_live_••••••••" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "quality",
    "messages": [{ "role": "user", "content": "Summarize the key risks in this vendor contract." }]
  }'
# mode "quality" = Highest Quality. The router chooses the model
# and machine, runs it on rented compute, and returns the completion plus the
# routing decision (chosen model, machine, cost, latency).
Ready to try it?
Rent a machine, deploy a model, and route your first request.