Inference API

c0mpute exposes an OpenAI-compatible HTTP API — built for agents. It speaks chat completions, streaming, tool/function calling, and model discovery, so any agent framework that talks to OpenAI works by changing two things: the base_url and the api_key. Nothing else changes.

base_url:  https://c0mpute.ai/api/v1

Why run your agent on c0mpute: uncensored models (no refusals), decentralized compute, large context, and your prompts are never stored (processed in memory and discarded — only token counts are kept for billing). See Building agents for framework setups.

Authentication

Generate an API key at c0mpute.ai/settings → the API tab. Keys look like sk-c0mpute-… and are shown once on creation. Pass it as a bearer token:

Authorization: Bearer sk-c0mpute-...

Requests are billed to the credit balance of the account that owns the key. Top up with USDC from the dashboard.

Models

Model	Description
`c0mpute-pro`	Uncensored 8B. Fast, runs on the broad worker pool.
`c0mpute-max`	Uncensored 27B with tools, vision, and large context.
`c0mpute-max-think`	`c0mpute-max` with extended chain-of-thought reasoning.

GET /v1/models lists them with a live available flag (Max requires a native GPU worker to be online). Always check availability if you depend on Max.

Pricing

Billing is flat per request — you know the exact cost before you send it, no token math:

Model	Credits / request	USD / request
`c0mpute-pro`	10	$0.10
`c0mpute-max`	15	$0.15
`c0mpute-max-think`	20	$0.20

1 credit = $0.01. Buy credits with USDC from the dashboard. A request that returns a tool call (one step of an agent loop) is one request. Rate limit: 60 requests/minute per key.

Chat completions

POST /v1/chat/completions

curl

curl https://c0mpute.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $C0MPUTE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "c0mpute-pro",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(base_url="https://c0mpute.ai/api/v1", api_key="sk-c0mpute-...")

resp = client.chat.completions.create(
    model="c0mpute-pro",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)

Node (OpenAI SDK)

import OpenAI from "openai";

const client = new OpenAI({ baseURL: "https://c0mpute.ai/api/v1", apiKey: "sk-c0mpute-..." });

const resp = await client.chat.completions.create({
  model: "c0mpute-pro",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(resp.choices[0].message.content);

Streaming

Set stream: true to receive Server-Sent Events as chat.completion.chunk objects, terminated by data: [DONE].

stream = client.chat.completions.create(
    model="c0mpute-pro",
    messages=[{"role": "user", "content": "Write a haiku."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

Function calling (tools)

Pass your own tools. When the model decides to call one, the response comes back with finish_reason: "tool_calls" and the call(s) under message.tool_calls — you run the tool and send the result back as a tool message. This is what lets agent frameworks drive their own tools on c0mpute.

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
r1 = client.chat.completions.create(model="c0mpute-max", messages=messages, tools=tools)

call = r1.choices[0].message.tool_calls[0]            # get_weather({"city": "Paris"})
messages.append(r1.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": "18C and sunny"})

r2 = client.chat.completions.create(model="c0mpute-max", messages=messages, tools=tools)
print(r2.choices[0].message.content)                 # "The weather in Paris is 18°C and sunny."

Tool calling and vision are most reliable on c0mpute-max. The Pro 8B can attempt tools but is less consistent.

Vision

c0mpute-max accepts images. Use OpenAI's multimodal content format with an inline base64 data: URL:

import base64
img = base64.b64encode(open("photo.png", "rb").read()).decode()

resp = client.chat.completions.create(
    model="c0mpute-max",
    messages=[{"role": "user", "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{img}"}},
    ]}],
)
print(resp.choices[0].message.content)

Pass images inline as base64; remote https image URLs aren't fetched in this version. Vision requires c0mpute-max.

Building agents

c0mpute is designed to be the brain for agent frameworks. Your framework keeps doing what it does — memory, system prompt / persona, the tool loop — and c0mpute is the model it calls. Memory and persona need zero special handling: they're just the messages array and a system message you already send. Tools work through the standard function-calling flow above (the model returns tool_calls, your framework runs them and sends results back).

For agents, use c0mpute-max (or c0mpute-max-think for harder reasoning) — the 27B is far more reliable at multi-step tool use than the 8B.

Any OpenAI-compatible framework

The universal setup: point the framework's model provider at c0mpute.

base_url / baseURL :  https://c0mpute.ai/api/v1
api_key            :  sk-c0mpute-...
model              :  c0mpute-max

OpenAI Agents SDK (Python)

from agents import Agent, Runner, OpenAIChatCompletionsModel
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="https://c0mpute.ai/api/v1", api_key="sk-c0mpute-...")
agent = Agent(name="Assistant", instructions="You are helpful.",
              model=OpenAIChatCompletionsModel(model="c0mpute-max", openai_client=client))
print((await Runner.run(agent, "Plan my week.")).final_output)

LangChain / LangGraph (Python)

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="c0mpute-max", base_url="https://c0mpute.ai/api/v1", api_key="sk-c0mpute-...")

Vercel AI SDK (TypeScript)

import { createOpenAI } from "@ai-sdk/openai";
const c0mpute = createOpenAI({ baseURL: "https://c0mpute.ai/api/v1", apiKey: "sk-c0mpute-..." });
// use c0mpute("c0mpute-max") as the model in generateText / streamText / tool loops

Hermes

c0mpute is a custom OpenAI-compatible endpoint, so use the custom provider. In ~/.hermes/config.yaml:

model:
  default: c0mpute-max
  provider: custom
  base_url: https://c0mpute.ai/api/v1

Put your key in the OPENAI_API_KEY environment variable (e.g. in ~/.hermes/.env):

OPENAI_API_KEY=sk-c0mpute-...

Then run, for example, hermes -z "hello" -m c0mpute-max --provider custom.

Errors

Errors are returned in OpenAI's shape ({ "error": { "message", "type", "code" } }):

Status	Meaning
`401`	Missing or invalid API key
`402`	Insufficient credits — top up with USDC
`404`	Unknown model
`429`	Rate limit exceeded
`503`	No worker available for the requested tier (Max needs a native worker online)

Rate limits

Default 60 requests/minute per key. Need more? Reach out.

Authentication​

Models​

Pricing​

Chat completions​

curl​

Python (OpenAI SDK)​

Node (OpenAI SDK)​

Streaming​

Function calling (tools)​

Vision​

Building agents​

Any OpenAI-compatible framework​

Hermes​

Errors​

Rate limits​