Open-source models
Models
Deploy any open-source model to a managed API endpoint. The platform selects compute with enough RAM and GPU to host it, then exposes a live, authenticated API.
Deploy a model as an API
Pick any open-source model and the platform rents a machine that fits its RAM and GPU needs, loads the model, and hands you an OpenAI-compatible endpoint plus an API key — ready to call from curl, Python, or JavaScript.
How it works →Model catalog
Filter by family or task, then deploy.
Connect a wallet to deploy APIs
Deploying bills ~1 hour of compute upfront. Pay in SOL.
Connect a Solana wallet
Pay for compute in SOL. Powered by Privy — connect Phantom (or email), then buy credits.
Real Privy + Solana · Phantom supported
Llama 3.1 8B Instruct
Fast, capable general-purpose chat model. Great default for assistants and tool use.
Rec. RAM
16GB
Rec. GPU
None
Est. cost
$0.1800/hr
Latency
420ms
Context
128K tokens
License
Llama 3.1 Community
Llama 3.1 70B Instruct
High-quality reasoning and writing. Needs high RAM or a datacenter GPU.
Rec. RAM
160GB
Rec. GPU
A100
Est. cost
$2.45/hr
Latency
780ms
Context
128K tokens
License
Llama 3.1 Community
Mistral 7B Instruct
Efficient open model with strong instruction following at low cost.
Rec. RAM
16GB
Rec. GPU
None
Est. cost
$0.1600/hr
Latency
390ms
Context
32K tokens
License
Apache 2.0
Mixtral 8x7B
Sparse mixture-of-experts model balancing quality and throughput.
Rec. RAM
96GB
Rec. GPU
A100
Est. cost
$1.90/hr
Latency
640ms
Context
32K tokens
License
Apache 2.0
Qwen 2.5 72B Instruct
Top-tier multilingual reasoning and coding. Datacenter-class hardware recommended.
Rec. RAM
168GB
Rec. GPU
H100
Est. cost
$3.90/hr
Latency
720ms
Context
131K tokens
License
Qwen License
Qwen 2.5 7B Instruct
Compact multilingual model strong at coding and math.
Rec. RAM
16GB
Rec. GPU
None
Est. cost
$0.1700/hr
Latency
400ms
Context
131K tokens
License
Apache 2.0
Qwen 2.5 32B (Long Context)
Tuned for very long documents and retrieval over large prompts.
Rec. RAM
128GB
Rec. GPU
A100
Est. cost
$1.70/hr
Latency
900ms
Context
262K tokens
License
Qwen License
DeepSeek-R1 Distill 32B
Reasoning-optimized distilled model with strong chain-of-thought.
Rec. RAM
96GB
Rec. GPU
A100
Est. cost
$1.75/hr
Latency
1.10s
Context
66K tokens
License
MIT
DeepSeek Coder 7B
Code-specialized model for completion and refactoring.
Rec. RAM
16GB
Rec. GPU
None
Est. cost
$0.1800/hr
Latency
410ms
Context
16K tokens
License
DeepSeek License
Stable Diffusion XL
High-resolution text-to-image generation. Requires a GPU with ≥16GB VRAM.
Rec. RAM
32GB
Rec. GPU
RTX 4090
Est. cost
$1.05/hr
Latency
3.20s
Context
—
License
OpenRAIL++
Stable Diffusion 3.5 Large
Latest high-fidelity diffusion model with improved prompt adherence.
Rec. RAM
48GB
Rec. GPU
A100
Est. cost
$2.45/hr
Latency
4.10s
Context
—
License
Stability Community
Whisper Large v3
State-of-the-art speech-to-text across many languages. Runs on CPU or GPU.
Rec. RAM
16GB
Rec. GPU
L4
Est. cost
$0.7400/hr
Latency
1.80s
Context
—
License
MIT
Deployed models
Live endpoints you can call right now.