Architecture
c0mpute has three components: the user client, the orchestrator, and workers.
User client
The web interface at c0mpute.ai. Users authenticate via Privy, select a tier, and send messages. The client connects to the orchestrator via Socket.io and receives token responses streamed in real time.
Orchestrator
The central routing layer. A Node.js server using Socket.io that coordinates everything:
- Authentication — validates user sessions and worker tokens via Privy
- Job queue — receives user requests, queues them by tier, and matches them to available workers
- Worker registry — tracks all connected workers: type (browser/native), model, status (idle/busy), performance stats
- Tier routing — directs jobs to the correct worker type:
  - Free → browser workers running Qwen 1.5B
  - Pro → browser workers running Dolphin Mistral 7B
  - Max → native workers running Qwen2.5 14B
- Search — for Max tier, runs Brave Search API queries, fetches and extracts content from the top 3 results, and injects summarized context into the prompt
- Stats broadcast — pushes real-time network stats (active workers, queue depth, jobs completed) to all connected clients every 5 seconds
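The tier-routing and matching responsibilities above can be sketched as a lookup table plus an idle-worker search. The registry shape (`type`, `model`, `status` fields) mirrors the description above, but the exact data model is an illustrative assumption, not the orchestrator's actual schema.

```javascript
// Sketch: each tier maps to a worker type and model, as described above.
const TIERS = {
  free: { workerType: "browser", model: "Qwen 1.5B" },
  pro:  { workerType: "browser", model: "Dolphin Mistral 7B" },
  max:  { workerType: "native",  model: "Qwen2.5 14B" },
};

// Pick the first idle worker matching the tier's type and model.
function matchWorker(tier, workers) {
  const want = TIERS[tier];
  return workers.find(
    (w) => w.status === "idle" && w.type === want.workerType && w.model === want.model
  ) ?? null;
}

// Usage against a toy registry:
const workers = [
  { id: "w1", type: "browser", model: "Qwen 1.5B", status: "busy" },
  { id: "w2", type: "browser", model: "Dolphin Mistral 7B", status: "idle" },
];
console.log(matchWorker("pro", workers)?.id); // "w2"
```

Returning `null` when no idle worker matches is what leaves the request sitting in its tier queue until a worker frees up.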
The orchestrator does not store conversations or prompt content. It routes traffic and discards it once delivered.
Workers
Browser workers (WebGPU)
Run in a browser tab using WebLLM, which leverages WebGPU for GPU-accelerated inference. Two models are available:
- Qwen 1.5B (~900MB) — serves Free tier
- Dolphin Mistral 7B (~4GB) — serves Pro tier
Models download once and cache in the browser. Workers connect to the orchestrator via Socket.io, receive job assignments, run inference, and stream tokens back.
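A worker's job loop (receive assignment, run inference, stream tokens back) can be sketched like this. The `infer` function is a synchronous stand-in for the WebLLM engine, the socket is anything with an `emit` method, and all event and field names are illustrative assumptions.

```javascript
// Sketch: a worker handles one job — mark busy, stream tokens, mark idle.
function runJob(socket, job, infer) {
  socket.emit("status", { jobId: job.id, state: "busy" });
  for (const token of infer(job.prompt)) {
    socket.emit("token", { jobId: job.id, token }); // stream each token as generated
  }
  socket.emit("done", { jobId: job.id });
  socket.emit("status", { jobId: job.id, state: "idle" });
}

// Usage with a fake engine and a socket that records emits:
const sent = [];
const fakeSocket = { emit: (event, data) => sent.push({ event, data }) };
runJob(fakeSocket, { id: "job-1", prompt: "hi" }, () => ["Hel", "lo"]);
console.log(sent.map((m) => m.event).join(",")); // "status,token,token,done,status"
```

In the real worker, inference is asynchronous and the status transitions are what the orchestrator's worker registry tracks as idle/busy.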
Native workers (node-llama-cpp)
Run as a Node.js process using node-llama-cpp for inference with hardware acceleration:
- CUDA — NVIDIA GPUs
- Metal — Apple Silicon
- Vulkan — AMD and Intel GPUs
Native workers exclusively run Qwen2.5 14B abliterated and serve Max tier requests. They authenticate with a worker token and connect to the orchestrator via Socket.io.
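The backend choice above follows directly from the hardware: Metal on Apple Silicon, CUDA on NVIDIA, Vulkan on AMD/Intel. A rough selection sketch is below; in practice node-llama-cpp probes the hardware itself, so the inputs here (`platform`, `gpuVendor`) are simplified assumptions for illustration.

```javascript
// Sketch: map platform/GPU vendor to a hardware-acceleration backend.
function pickBackend({ platform, gpuVendor }) {
  if (platform === "darwin") return "metal";                      // Apple Silicon
  if (gpuVendor === "nvidia") return "cuda";                      // NVIDIA GPUs
  if (gpuVendor === "amd" || gpuVendor === "intel") return "vulkan"; // AMD and Intel GPUs
  return "cpu";                                                   // fallback: no acceleration
}

console.log(pickBackend({ platform: "darwin" }));                 // "metal"
console.log(pickBackend({ platform: "linux", gpuVendor: "nvidia" })); // "cuda"
```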
Job lifecycle
1. User sends message
2. Orchestrator receives request, determines tier
3. Request enters tier-specific queue
4. Orchestrator matches request to an idle worker of the correct type
5. Job assigned to worker
6. Worker runs inference, streams tokens back to orchestrator
7. Orchestrator relays tokens to the user in real time
8. Job completes, worker marked idle, earnings credited
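The lifecycle above reduces to a small state machine: queue by tier, assign to an idle worker, mark idle again and credit earnings on completion. The sketch below models it in memory; field names and the earnings unit are illustrative assumptions, not the orchestrator's actual schema.

```javascript
// Sketch: in-memory model of the job lifecycle (steps 2-5 and 8).
class Orchestrator {
  constructor(workers) {
    this.queues = { free: [], pro: [], max: [] }; // tier-specific queues
    this.workers = workers;                        // [{ id, type, status, earnings }]
  }
  enqueue(job) { this.queues[job.tier].push(job); }            // steps 2-3
  dispatch(tier, workerType) {                                  // steps 4-5
    const worker = this.workers.find((w) => w.status === "idle" && w.type === workerType);
    if (!worker || this.queues[tier].length === 0) return null; // job waits in queue
    const job = this.queues[tier].shift();
    worker.status = "busy";
    return { job, worker };
  }
  complete(worker, credit) {                                    // step 8
    worker.status = "idle";
    worker.earnings += credit; // credit for the finished job (unit is illustrative)
  }
}

// Usage:
const orch = new Orchestrator([{ id: "w1", type: "browser", status: "idle", earnings: 0 }]);
orch.enqueue({ id: "job-1", tier: "free", prompt: "hi" });
const { worker } = orch.dispatch("free", "browser");
orch.complete(worker, 42);
console.log(worker.status, worker.earnings); // "idle" 42
```

Steps 6-7 (inference and token relay) happen between `dispatch` and `complete` and are omitted here since they live on the worker side.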
Search flow (Max tier)
1. User sends message (Max tier)
2. Orchestrator extracts search query from the message
3. Brave Search API returns top results
4. Orchestrator fetches top 3 page URLs and extracts content
5. Summarized search context injected into the prompt
6. Enriched prompt sent to native worker
7. Worker generates response grounded in web content
8. Response streams back with source citations
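Steps 4-6 above amount to building an enriched prompt from the fetched pages. A sketch of that injection is below; the template, field names, and citation format are illustrative assumptions, not the orchestrator's actual prompt.

```javascript
// Sketch: inject summarized search context into the prompt, keeping
// numbered sources so the model can cite them as [n].
function buildEnrichedPrompt(userMessage, results) {
  const context = results
    .slice(0, 3) // top 3 fetched pages
    .map((r, i) => `[${i + 1}] ${r.title} (${r.url}): ${r.summary}`)
    .join("\n");
  return `Web context:\n${context}\n\nUser question: ${userMessage}\n` +
         `Answer using the context above and cite sources as [n].`;
}

// Usage:
const prompt = buildEnrichedPrompt("What is WebGPU?", [
  { title: "WebGPU intro", url: "https://example.com/webgpu", summary: "A GPU API for the web." },
]);
```

The numbered `[n]` markers are what allow the worker's response to stream back with source citations in step 8.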
Stats
The orchestrator broadcasts network stats to all connected clients every 5 seconds:
- Number of active workers (by type and model)
- Current queue depth per tier
- Total jobs completed
- Network-wide tokens per second
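Assembling that broadcast payload from the worker registry and tier queues might look like the sketch below; the field names and the per-worker `tps` stat are illustrative assumptions.

```javascript
// Sketch: aggregate network stats from the registry and queues.
function networkStats(workers, queues, jobsCompleted) {
  const active = {};
  for (const w of workers) {
    const key = `${w.type}/${w.model}`;
    active[key] = (active[key] ?? 0) + 1;                 // active workers by type and model
  }
  const queueDepth = Object.fromEntries(
    Object.entries(queues).map(([tier, q]) => [tier, q.length]) // current depth per tier
  );
  const tokensPerSecond = workers.reduce((sum, w) => sum + (w.tps ?? 0), 0);
  return { active, queueDepth, jobsCompleted, tokensPerSecond };
}

// Usage:
const stats = networkStats(
  [
    { type: "browser", model: "Qwen 1.5B", tps: 10 },
    { type: "browser", model: "Qwen 1.5B", tps: 5 },
    { type: "native", model: "Qwen2.5 14B", tps: 30 },
  ],
  { free: [{}], pro: [], max: [{}, {}] },
  100
);
```

In the real orchestrator this payload would be emitted to every connected client on a 5-second interval (e.g. via a Socket.io broadcast).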