Models

Open-source models

Models

Deploy any open-source model to a managed API endpoint. The platform selects compute with enough RAM and GPU to host it, then exposes a live, authenticated API.

12 models

Deploy a model as an API

Pick any open-source model and the platform rents a machine that fits its RAM and GPU needs, loads the model, and hands you an OpenAI-compatible endpoint plus an API key — ready to call from curl, Python, or JavaScript.

How it works

Model catalog

Filter by family or task, then deploy.

Family
Task

Connect a wallet to deploy APIs

Deploying bills ~1 hour of compute upfront. Pay in SOL.

Connect a Solana wallet

Pay for compute in SOL. Powered by Privy — connect Phantom (or email), then buy credits.

Real Privy + Solana · Phantom supported

Llama 3.1 8B Instruct

LlamaChat8B params

Fast, capable general-purpose chat model. Great default for assistants and tool use.

Rec. RAM

16GB

Rec. GPU

None

Est. cost

$0.1800/hr

Latency

420ms

Context

128K tokens

License

Llama 3.1 Community

Quality score78/100

Llama 3.1 70B Instruct

LlamaLarge LLM70B params

High-quality reasoning and writing. Needs high RAM or a datacenter GPU.

Rec. RAM

160GB

Rec. GPU

A100

Est. cost

$2.45/hr

Latency

780ms

Context

128K tokens

License

Llama 3.1 Community

Quality score91/100

Mistral 7B Instruct

MistralChat7B params

Efficient open model with strong instruction following at low cost.

Rec. RAM

16GB

Rec. GPU

None

Est. cost

$0.1600/hr

Latency

390ms

Context

32K tokens

License

Apache 2.0

Quality score75/100

Mixtral 8x7B

MistralLarge LLM47B params

Sparse mixture-of-experts model balancing quality and throughput.

Rec. RAM

96GB

Rec. GPU

A100

Est. cost

$1.90/hr

Latency

640ms

Context

32K tokens

License

Apache 2.0

Quality score85/100

Qwen 2.5 72B Instruct

QwenLarge LLM72B params

Top-tier multilingual reasoning and coding. Datacenter-class hardware recommended.

Rec. RAM

168GB

Rec. GPU

H100

Est. cost

$3.90/hr

Latency

720ms

Context

131K tokens

License

Qwen License

Quality score93/100

Qwen 2.5 7B Instruct

QwenChat7B params

Compact multilingual model strong at coding and math.

Rec. RAM

16GB

Rec. GPU

None

Est. cost

$0.1700/hr

Latency

400ms

Context

131K tokens

License

Apache 2.0

Quality score80/100

Qwen 2.5 32B (Long Context)

QwenLong Context32B params

Tuned for very long documents and retrieval over large prompts.

Rec. RAM

128GB

Rec. GPU

A100

Est. cost

$1.70/hr

Latency

900ms

Context

262K tokens

License

Qwen License

Quality score88/100

DeepSeek-R1 Distill 32B

DeepSeekLarge LLM32B params

Reasoning-optimized distilled model with strong chain-of-thought.

Rec. RAM

96GB

Rec. GPU

A100

Est. cost

$1.75/hr

Latency

1.10s

Context

66K tokens

License

MIT

Quality score90/100

DeepSeek Coder 7B

DeepSeekChat7B params

Code-specialized model for completion and refactoring.

Rec. RAM

16GB

Rec. GPU

None

Est. cost

$0.1800/hr

Latency

410ms

Context

16K tokens

License

DeepSeek License

Quality score79/100

Stable Diffusion XL

Stable DiffusionImage Generation3.5B params

High-resolution text-to-image generation. Requires a GPU with ≥16GB VRAM.

Rec. RAM

32GB

Rec. GPU

RTX 4090

Est. cost

$1.05/hr

Latency

3.20s

Context

License

OpenRAIL++

Quality score86/100

Stable Diffusion 3.5 Large

Stable DiffusionImage Generation8B params

Latest high-fidelity diffusion model with improved prompt adherence.

Rec. RAM

48GB

Rec. GPU

A100

Est. cost

$2.45/hr

Latency

4.10s

Context

License

Stability Community

Quality score90/100

Whisper Large v3

WhisperTranscription1.55B params

State-of-the-art speech-to-text across many languages. Runs on CPU or GPU.

Rec. RAM

16GB

Rec. GPU

L4

Est. cost

$0.7400/hr

Latency

1.80s

Context

License

MIT

Quality score89/100

Deployed models

Live endpoints you can call right now.