# How c0mpute works

## The flow
User → Orchestrator → Worker → tokens stream back → User
- You send a message from the c0mpute.ai chat interface
- The orchestrator receives your request and finds a matching worker based on your selected tier
- The worker runs inference on their GPU and streams tokens back through the orchestrator
- You see the response appear in real-time, word by word
## The orchestrator

The orchestrator is a Node.js server that uses Socket.IO for real-time communication. It handles:
- Authentication — verifying users and workers via Privy
- Job queue — managing incoming requests and matching them to available workers
- Worker registry — tracking which workers are online, their capabilities, and current load
- Routing — directing jobs to the right worker type based on the selected tier
- Search — running web searches for Max tier requests and injecting context
- Stats — broadcasting real-time network statistics every 5 seconds
The orchestrator does not store conversations. It routes traffic and moves on.
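The registry-and-matching step can be sketched as follows. The field names (`type`, `online`, `load`) and the lowest-load policy are illustrative assumptions, not the orchestrator's actual schema:

```javascript
// Sketch of the worker registry and the matching step.
// workerId -> { type, online, load } is an assumed shape.
const registry = new Map();

function registerWorker(id, type) {
  registry.set(id, { type, online: true, load: 0 });
}

// Match a job to the least-loaded online worker of the required type.
function matchWorker(workerType) {
  let best = null;
  for (const [id, w] of registry) {
    if (!w.online || w.type !== workerType) continue;
    if (best === null || w.load < registry.get(best).load) best = id;
  }
  return best; // null if no eligible worker is online
}
```

A job for a given tier first resolves to a worker type, then `matchWorker` picks a concrete worker; if it returns `null`, the job waits in the queue.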
## Browser workers
Browser workers run LLMs directly in your browser tab using WebGPU through the WebLLM library. No installation required — just open the page and click start.
Two models are available for browser workers:
- Qwen 1.5B — serves Free tier requests. ~900MB download, runs on most modern GPUs.
- Dolphin Mistral 7B — serves Pro tier requests. Requires ~4GB of VRAM; uncensored.
The model downloads once and caches in the browser. Subsequent starts are instant.
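The download-once behavior amounts to a memoized loader. This is a sketch with an in-memory `Map` (the browser actually persists the model across page loads via its cache storage, and `fetchModel` stands in for WebLLM's real download step):

```javascript
// Sketch of download-once caching: the first start fetches the model,
// later starts reuse the cached artifact.
const modelCache = new Map();
let downloads = 0;

async function fetchModel(name) {
  downloads += 1; // stands in for the ~900MB-4GB network transfer
  return { name, ready: true };
}

async function loadModel(name) {
  if (!modelCache.has(name)) {
    modelCache.set(name, await fetchModel(name));
  }
  return modelCache.get(name); // instant on subsequent starts
}
```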
## Native workers
Native workers run on your machine using node-llama-cpp, which supports CUDA (NVIDIA), Metal (Apple Silicon), and Vulkan (AMD/Intel) acceleration.
Native workers run Qwen2.5 14B abliterated and serve Max tier requests exclusively. They require a GPU with 10GB+ VRAM and deliver 30-100+ tokens per second depending on hardware.
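The hardware requirement can be expressed as a simple eligibility check. The 10GB floor and the backend list come from the text above; the function itself is an illustrative sketch, not part of the worker:

```javascript
// Sketch of a Max-tier eligibility check for native workers.
// Backend names mirror node-llama-cpp's supported accelerations.
const SUPPORTED_BACKENDS = new Set(["cuda", "metal", "vulkan"]);

function canServeMax({ vramGB, backend }) {
  return vramGB >= 10 && SUPPORTED_BACKENDS.has(backend);
}
```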
## Job routing
| Tier | Worker type | Model |
|---|---|---|
| Free | Browser (WebGPU) | Qwen 1.5B |
| Pro | Browser (WebGPU) | Dolphin Mistral 7B |
| Max | Native (node-llama-cpp) | Qwen2.5 14B abliterated |
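The routing table above can be written as a lookup. The structure is an illustrative sketch, not the orchestrator's real code:

```javascript
// Tier -> worker type and model, mirroring the routing table.
const ROUTES = {
  free: { workerType: "browser", model: "Qwen 1.5B" },
  pro:  { workerType: "browser", model: "Dolphin Mistral 7B" },
  max:  { workerType: "native",  model: "Qwen2.5 14B abliterated" },
};

function routeJob(tier) {
  const route = ROUTES[tier];
  if (!route) throw new Error(`unknown tier: ${tier}`);
  return route;
}
```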
## Web search (Max tier only)
When a Max tier user sends a message, the orchestrator can run a web search using the Brave Search API. It takes the top results, fetches their page content, and injects a summarized context into the prompt before sending it to the worker. The model then responds with information grounded in real, up-to-date web content and cites its sources.
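The injection step can be sketched as prompt assembly. The prompt format and result fields here are assumptions; the real orchestrator calls the Brave Search API and fetches full page content before summarizing:

```javascript
// Sketch of search-context injection: prepend summarized web results
// to the user's message so the model can ground and cite its answer.
function injectSearchContext(userMessage, results) {
  const context = results
    .map((r, i) => `[${i + 1}] ${r.title} (${r.url}): ${r.summary}`)
    .join("\n");
  return (
    `Use the following web results and cite sources by number.\n` +
    `${context}\n\nUser question: ${userMessage}`
  );
}
```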
## Token streaming
Responses stream in real-time. As the worker generates each token, it's sent through the orchestrator to the user immediately. There's no waiting for the full response — you see it being written live, just like any other chat AI, except the compute is happening on someone's GPU across the network.
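The per-token relay can be sketched with an async generator: the worker yields each token as it is generated, and the orchestrator forwards it immediately rather than buffering the full response. All names here are illustrative:

```javascript
// Worker side: yield tokens one at a time, as inference would.
async function* generateTokens(reply) {
  for (const token of reply.split(" ")) {
    yield token + " "; // each token leaves the worker as soon as it exists
  }
}

// Orchestrator side: relay every token to the user the moment it arrives.
async function streamToUser(reply, send) {
  for await (const token of generateTokens(reply)) {
    send(token); // no buffering; the user sees the response grow live
  }
}
```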