
draft v2 · 2026-05-17 · synced 2026-05-18


PharosVPN — Platform Design

Status: draft v2 · 2026-05-17
Supersedes: amnezia-travelvpn/NEW-PROJECT-DESIGN.md (the v1 design notes).

This is the single source of truth for the PharosVPN platform. Every subproject README and BUILD.md defers to this document. When code and this document disagree, the document is wrong — fix it in the same PR.


0. What PharosVPN is

A self-hostable, open-source, dual-protocol VPN fleet platform. One codebase serves two postures from the same binaries: personal and enterprise (the two presets in §12).

Defaults differ; the engine is identical.

The data plane is AmneziaWG (obfuscated WireGuard) on UDP 443 and XRay (VLESS + REALITY) on TCP 443, both terminating end-user tunnels. The platform is the control plane, account system, and clients around that data plane.


1. Goals


2. The three node roles + clients

                  ┌──────────────────────────────────┐
                  │  helm  — CONTROLLER               │
                  │  (private network, behind NAT)    │
                  │  vpn-mgr daemon + admin Web UI     │
                  │  + SQLite state + CA + embedded    │
                  │    beacon relay (toggleable)       │
                  └───────┬──────────────────┬─────────┘
        mTLS, helm-initiated outbound        │ reverse tunnel
        gRPC/HTTP2 to each node              │ (helm dials OUT to a
                  │                          │  remote beacon)
        ┌─────────┼─────────┐                ▼
        ▼         ▼         ▼          ┌──────────────┐
   ┌────────┐┌────────┐┌────────┐      │ beacon RELAY │  (public)
   │  buoy  ││  buoy  ││  buoy  │ ...  │ mTLS ingress │
   │ NODE A ││ NODE B ││ NODE N │      │ for clients  │
   │ public ││ public ││ public │      └──────┬───────┘
   │ AWG udp ││  ...   ││  ...   │             │ mTLS
   │ XRay tcp││        ││        │             ▼
   └────┬───┘└────┬───┘└────┬───┘      ┌──────────────┐
        │ udp/tcp 443       │          │   caravel    │
        ▼        end users  ▼          │ mobile client│
     end-user tunnels                  └──────────────┘

| Role | Repo | Network posture | Job |
|---|---|---|---|
| Controller | helm | Private, behind NAT. Zero inbound ports. | Source of truth, admin UI, issues certs/profiles, drives the fleet. |
| VPN node | buoy | Public IP. Listens udp/tcp 443 + mTLS control port. | Runs the data plane. Dumb agent: applies only validated config. |
| Relay | beacon | Public. The only public ingress for clients. | mTLS-terminating proxy. Lets clients reach a NAT’d controller. Always embedded in helm; optionally deployed remote. |
| Mobile client | caravel | End-user device. | Runs the actual VPN tunnel + acquires profiles from multiple sources. |

Key inversion: the controller dials out to everything. Buoys are already public (they must be, to terminate tunnels), so helm initiates outbound mTLS to each buoy. helm also dials out to a remote beacon (reverse tunnel), so the controller needs zero inbound ports anywhere.

beacon embedded vs remote. A beacon relay always runs in-process inside helm (toggleable off in the admin UI). When the controller sits behind NAT and must serve clients, deploy a remote beacon on a public host (its own VM, or co-located on a buoy); helm dials out to it over a persistent reverse tunnel. Embedded and remote are transport differences only — identical trust.


3. Component responsibilities

helm (controller)

buoy (VPN node agent)

beacon (relay)

caravel (mobile client)


4. Trust model & PKI

A single in-repo root CA, generated on helm’s first run, stored in helm’s SQLite, never copied off the controller. Two intermediates sit under it, the Fleet CA and the Device CA:

| Certificate | Issued by | Held by | Validity |
|---|---|---|---|
| Root CA | self-signed | helm only | 10 years |
| Fleet / Device intermediates | Root CA | helm only | 5 years |
| Controller client cert | Fleet CA | helm | 1 year, auto-rotated |
| Node server cert | Fleet CA | each buoy | 1 year, auto-rotated by push |
| Relay cert | Fleet CA | each beacon | 1 year, auto-rotated |
| Device leaf | Device CA | each caravel / browser | 1 year |
| helm SSH key | helm (self) | helm | long-lived, for agent deploy |

Compromise containment:


5. Bootstrap & enrollment

Node enrollment (helm nodes add <ssh-host>):

  1. The operator creates a VM on any provider and adds helm’s SSH public key (printed by helm ssh-key) to its authorized_keys. helm has its own SSH keypair, generated on first run and stored in SQLite.
  2. helm connects out over SSH, pins the host key on first use (TOFU), and installs the buoy agent — either by uploading a bundled binary or running a one-line download.
  3. buoy generates its own keypair on the node and emits a CSR. helm pulls the CSR back over SSH, signs it with the Fleet CA, and pushes the certificate plus the CA back. The node’s private key never leaves the node and helm never holds it.
  4. helm starts the buoy service. From here every instruction is gRPC.

SSH is a deployment channel only — install and update of the agent. There is no enrollment-mode listener and no one-time bootstrap token; the trusted SSH channel replaces both.
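
The CSR exchange in step 3 can be sketched with Go's standard library. This is a minimal illustration, not the project's actual code: the function names (newNodeCSR, signNodeCSR, newTestCA), the choice of ECDSA P-256, and the throwaway self-signed CA standing in for the Fleet CA are all assumptions. The SSH transport that carries the CSR and certificate is omitted.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newNodeCSR runs on the buoy (hypothetical helper). The private key is
// created here; only the DER-encoded CSR is handed back to helm.
func newNodeCSR(nodeName string) (*ecdsa.PrivateKey, []byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.CertificateRequest{Subject: pkix.Name{CommonName: nodeName}}
	csrDER, err := x509.CreateCertificateRequest(rand.Reader, tmpl, key)
	if err != nil {
		return nil, nil, err
	}
	return key, csrDER, nil
}

// signNodeCSR runs on helm (hypothetical helper). It checks the CSR's
// self-signature and issues a 1-year server certificate under the CA.
func signNodeCSR(csrDER []byte, caCert *x509.Certificate, caKey *ecdsa.PrivateKey) (*x509.Certificate, error) {
	csr, err := x509.ParseCertificateRequest(csrDER)
	if err != nil {
		return nil, err
	}
	if err := csr.CheckSignature(); err != nil {
		return nil, err
	}
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 127))
	if err != nil {
		return nil, err
	}
	now := time.Now()
	tmpl := &x509.Certificate{
		SerialNumber: serial,
		Subject:      csr.Subject,
		NotBefore:    now.Add(-time.Minute),
		NotAfter:     now.AddDate(1, 0, 0), // 1-year validity, as in the §4 table
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, caCert, csr.PublicKey, caKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// newTestCA builds a throwaway self-signed CA that stands in for the Fleet CA.
func newTestCA() (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	now := time.Now()
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "example Fleet CA"},
		NotBefore:             now.Add(-time.Minute),
		NotAfter:              now.AddDate(5, 0, 0),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

func main() {
	caCert, caKey, err := newTestCA()
	if err != nil {
		panic(err)
	}
	_, csrDER, err := newNodeCSR("buoy-a") // the node key never leaves this call site
	if err != nil {
		panic(err)
	}
	leaf, err := signNodeCSR(csrDER, caCert, caKey)
	if err != nil {
		panic(err)
	}
	fmt.Println(leaf.Subject.CommonName, leaf.CheckSignatureFrom(caCert) == nil)
}
```

The point of the split is that signNodeCSR only ever sees the public half of the node key, which matches the statement that helm never holds a node's private key.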

Relay enrollment — same SSH pattern for a remote beacon.

User / device enrollment: a user is given an enrollment ticket (QR or deep link, see §9). caravel scans it, contacts beacon → helm, the device generates a keypair, helm issues a Device-CA leaf, and the device is bound to the user account. The enrollment ticket is the only moment of weakness: short TTL, one-use, scoped.


6. Wire protocol

gRPC over mTLS (HTTP/2, TLS 1.3). Decided — not plain JSON. The live event streaming in §7 needs server-streaming, which is native to gRPC and awkward otherwise. Both Go ends make codegen cheap; caravel consumes generated clients too.


7. Real-time & multi-admin

The admin UI must feel live — a client connecting to a node appears immediately, not on a 30-second poll.

Optimistic concurrency. Every mutable record carries a version integer. A mutation must send the version the admin loaded. If helm’s current version is higher, it rejects with HTTP 409 Conflict — “changed by someone else, reload.” Admin A editing a stale copy of a user is refused because Admin B already bumped it. Live WebSocket replication makes conflicts rare (A’s screen usually updates before A saves); the version check is the hard safety net.
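
The version check can be sketched as follows. This is an illustrative in-memory model, not helm's actual storage layer: the Store and User types, the Update method, and statusFor are hypothetical names, and a real implementation would presumably do the compare-and-bump inside a single SQLite transaction. The sketch rejects any version mismatch, which covers the "current version is higher" case described above.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"sync"
)

// ErrStale is returned when the caller edited an outdated copy.
var ErrStale = errors.New("changed by someone else, reload")

// User is a hypothetical mutable record carrying a version integer.
type User struct {
	ID      string
	Name    string
	Version int
}

// Store is an in-memory stand-in for helm's database.
type Store struct {
	mu    sync.Mutex
	users map[string]User
}

// Update applies the change only if loadedVersion matches the stored
// version, then bumps the version so later stale writes are refused.
func (s *Store) Update(id string, loadedVersion int, name string) (User, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur, ok := s.users[id]
	if !ok {
		return User{}, fmt.Errorf("user %q not found", id)
	}
	if cur.Version != loadedVersion {
		return cur, ErrStale
	}
	cur.Name = name
	cur.Version++
	s.users[id] = cur
	return cur, nil
}

// statusFor maps a store error to the HTTP status the API would return.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrStale):
		return http.StatusConflict
	default:
		return http.StatusNotFound
	}
}

func main() {
	s := &Store{users: map[string]User{"u1": {ID: "u1", Name: "alice", Version: 1}}}
	_, errB := s.Update("u1", 1, "alice-b") // Admin B saves first
	_, errA := s.Update("u1", 1, "alice-a") // Admin A still holds version 1
	fmt.Println(statusFor(errB), statusFor(errA)) // prints "200 409"
}
```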


8. Accounts, profiles & sync

Accounts and roles

users are authentication principals (unlike the old codebase, where only admins logged in). On login the role decides the surface: user → own profiles only; admin → also the admin console (web = full; caravel = a small glance-and-quick-actions subset).

Profile sources (the unified-client model)

caravel has a VPN engine that only ever reads a local profile store. Profiles enter that store from interchangeable sources — “synced vs unsynced” is just which sources are enabled, not two apps:

| Source | Audience | Mechanism |
|---|---|---|
| Account sync | Personal | login → beacon → helm → pull, E2E-decrypt on device |
| QR scan | Anyone | scan an enrollment ticket or a self-contained profile QR |
| File import | Anyone | open a .pharos file (Mail/Files/AirDrop/portal) |
| MDM managed config | Enterprise | MDM pushes profiles + policy into managed config |
| Deep link | Portal-driven | pharosvpn://import?... |

The account/sync service + beacon are an optional platform component: an enterprise doing only MDM/QR runs no beacon and no account service.

End-to-end profile encryption

Each user has a long-lived keypair. helm holds only the public key. helm generates a profile, encrypts it to the user’s public key, stores ciphertext, discards plaintext. Only the user’s devices decrypt. Hybrid envelope:

Private-key storage — DECIDED: passphrase-wrapped blob held by helm. The user’s private key is encrypted with a key derived (Argon2id) from the user’s passphrase; helm stores only that opaque blob and never the passphrase or a usable private key. Any new device unwraps it with the passphrase. This gives seamless multi-device + recovery; a helm compromise yields only a passphrase-encrypted blob. (Device-to-device-only transfer was the alternative — stronger, but no recovery and clunkier enrollment; rejected for v1.)


9. The .pharos file format

One extension: .pharos. Encryption is a property inside the file, not a separate extension (no .spharos). The app reads the header and auto-routes.

A .pharos file is a JSON container with an always-readable header:

{
  "fmt": "pharos-profile",
  "v": 1,
  "enc": "none" | "password" | "account",
  // password mode: "kdf": {argon2id params + salt}, "cipher", "nonce"
  // account mode:  "recipient", "wrapped_key", "cipher", "nonce"
  "payload": <profile object>  |  "<base64 AEAD ciphertext>"
}

| enc | Meaning | Client handling |
|---|---|---|
| none | Plaintext | Load directly |
| password | Argon2id → XChaCha20-Poly1305 | Prompt for password, decrypt |
| account | Per-user hybrid envelope (§8) | Decrypt silently with device key |

The header is passed to the AEAD as associated data (AAD), so enc, v, and the KDF params are authenticated (no downgrade). Content-sniff on fmt so renamed files still import. Register the MIME type application/vnd.pharosvpn.profile, an iOS UTI, and an Android intent filter.
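
A client's header-first routing might look like the sketch below. It assumes the container is strict JSON on disk (the `//` lines in the schema above are read as annotations, not file content); the header struct, the route function, and the action strings are hypothetical names chosen for illustration. Decryption itself is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// header mirrors the always-readable fields of a .pharos container.
// Payload is kept raw because its shape depends on enc.
type header struct {
	Fmt     string          `json:"fmt"`
	V       int             `json:"v"`
	Enc     string          `json:"enc"`
	Payload json.RawMessage `json:"payload"`
}

// route content-sniffs on fmt (so the file name is irrelevant) and
// returns the handling step the client should take next.
func route(data []byte) (string, error) {
	var h header
	if err := json.Unmarshal(data, &h); err != nil {
		return "", fmt.Errorf("not a JSON container: %w", err)
	}
	if h.Fmt != "pharos-profile" {
		return "", errors.New("not a pharos profile")
	}
	switch h.Enc {
	case "none":
		return "load-directly", nil
	case "password":
		return "prompt-for-password", nil
	case "account":
		return "decrypt-with-device-key", nil
	default:
		return "", fmt.Errorf("unsupported enc %q", h.Enc)
	}
}

func main() {
	file := []byte(`{"fmt":"pharos-profile","v":1,"enc":"password","payload":"AAAA"}`)
	action, err := route(file)
	fmt.Println(action, err)
}
```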

Future-proof protocol model. The profile carries nodes, each with a versioned, tagged list of protocols:

{ "nodes": [ { "id": "...", "endpoints": ["..."], "protocols": [
    { "type": "amneziawg",    "v": 2, "params": { } },
    { "type": "xray-reality", "v": 1, "params": { } }
] } ] }

Clients keep a registry of handlers keyed by type. Ignore-unknown, never reject: an old client skips a protocol/node it can’t handle. Adding a protocol = a new type string + a handler, zero format break. Plus metadata: user, device, fleet_id, issued_at, expires_at, revision.
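
The ignore-unknown rule can be sketched as a registry lookup that filters rather than fails. The type names, the Handler struct with a MaxVersion field, and the "future-proto" entry below are assumptions for illustration; only the "amneziawg" and "xray-reality" type strings come from the profile example above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Protocol struct {
	Type   string          `json:"type"`
	V      int             `json:"v"`
	Params json.RawMessage `json:"params"`
}

type Node struct {
	ID        string     `json:"id"`
	Endpoints []string   `json:"endpoints"`
	Protocols []Protocol `json:"protocols"`
}

type Profile struct {
	Nodes []Node `json:"nodes"`
}

// Handler is a hypothetical registry entry: the highest protocol
// version this client build understands.
type Handler struct {
	MaxVersion int
}

// registry is keyed by the protocol "type" string.
var registry = map[string]Handler{
	"amneziawg":    {MaxVersion: 2},
	"xray-reality": {MaxVersion: 1},
}

// usable keeps only the protocols this client can handle and drops
// nodes left with none. Unknown entries are skipped, never an error.
func usable(p Profile) []Node {
	var out []Node
	for _, n := range p.Nodes {
		kept := n
		kept.Protocols = nil
		for _, pr := range n.Protocols {
			h, ok := registry[pr.Type]
			if !ok || pr.V > h.MaxVersion {
				continue
			}
			kept.Protocols = append(kept.Protocols, pr)
		}
		if len(kept.Protocols) > 0 {
			out = append(out, kept)
		}
	}
	return out
}

func main() {
	raw := `{"nodes":[
	  {"id":"n1","endpoints":["198.51.100.1:443"],"protocols":[
	    {"type":"amneziawg","v":2,"params":{}},
	    {"type":"future-proto","v":1,"params":{}}]},
	  {"id":"n2","endpoints":["198.51.100.2:443"],"protocols":[
	    {"type":"future-proto","v":1,"params":{}}]}]}`
	var p Profile
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		panic(err)
	}
	nodes := usable(p)
	fmt.Println(len(nodes), nodes[0].ID, len(nodes[0].Protocols)) // prints "1 n1 1"
}
```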

QR codes

Compression does not solve QR size: profiles are mostly incompressible key material. A reliably-scannable QR holds ~150–300 bytes; a multi-node profile is kilobytes. Two QR kinds: an enrollment ticket (the default) and a self-contained profile QR (for offline use).

A separate Amnezia-compatible .vpn export remains for users on the Amnezia client. .pharos is our format and is not asked to do that job.


10. Persistence

Single SQLite database on helm (state/app.db), Goose migrations.

Tables: ca (the root + intermediate CAs, §4), nodes, profiles, users, devices, peers, admins, sessions, node_certs, device_certs, bootstrap_tokens, audit_log, metrics_samples, relays. Every mutable row carries version INTEGER and updated_at for §7 optimistic concurrency. YAML projections under state/snapshots/ continue for git-friendly diffs. buoy and beacon have no database.


11. Failure modes

| Failure | Behaviour |
|---|---|
| helm crashes | All buoys keep serving tunnels. No new peers until back. |
| helm ↔ buoy unreachable | Buoy keeps serving. helm marks unreachable after 3 missed polls, alerts, retries with backoff. |
| buoy crashes | Its tunnels drop. Clients fail over to other nodes in the profile. |
| buoy compromised | Attacker has that node’s keys, not the CA key. Operator revokes the cert. |
| Remote beacon compromised | Traffic metadata exposed; profile bundles are ciphertext. No cert minting. |
| helm compromised | Worst case. CA key lost → rotate CA, mass re-enroll. User profiles stay encrypted. |
| Account service / beacon down | caravel connects from cached local profiles. |
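
The "3 missed polls, then retry with backoff" row can be sketched as a small state tracker. The threshold of 3 comes from the table; everything else (the nodeHealth type, doubling as the backoff policy, the 1 s base and 30 s cap) is an assumption made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// missedPollLimit is the number of consecutive misses before helm
// marks a buoy unreachable (from the failure table).
const missedPollLimit = 3

// nodeHealth is a hypothetical per-buoy tracker kept by helm.
type nodeHealth struct {
	missed      int
	unreachable bool
}

// observe records one poll result. A success clears the state; the
// third consecutive failure flips the node to unreachable.
func (h *nodeHealth) observe(ok bool) {
	if ok {
		h.missed = 0
		h.unreachable = false
		return
	}
	h.missed++
	if h.missed >= missedPollLimit {
		h.unreachable = true
	}
}

// backoff returns base doubled once per prior attempt, capped at max.
func backoff(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= max {
			return max
		}
	}
	return d
}

func main() {
	var h nodeHealth
	for i := 0; i < 3; i++ {
		h.observe(false)
	}
	fmt.Println(h.unreachable) // prints "true"
	for attempt := 0; attempt < 6; attempt++ {
		fmt.Println(backoff(attempt, time.Second, 30*time.Second))
	}
}
```

The buoy keeps serving tunnels throughout; only helm's view of the node changes.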

12. Defaults: personal vs enterprise

Same binaries, two presets at helm init:

| Setting | --personal | --enterprise |
|---|---|---|
| Regions | 1, nearest | operator picks |
| Idle nodes | none | encouraged (pre-positioned, stopped) |
| Protocols | AmneziaWG default, XRay optional | both |
| beacon | embedded | embedded + remote relays |
| Account sync | on | optional (MDM-only deployments run none) |
| Admins | one (the operator) | core admin + UI-added others |
| Audit retention | 30 days | 1 year |
| Metrics retention | 7 days | 90 days |
| REALITY decoy site | www.microsoft.com | configurable, rotated |

13. License & contribution


14. Repo map

| Repo | What | Stack | Owner |
|---|---|---|---|
| docs | This document, BUILD.md, protobuf contracts | Markdown / proto | core |
| helm | Controller / management plane + admin UI | Go + SvelteKit | core (you + Claude) |
| buoy | VPN node agent | Go | subagent |
| beacon | Relay | Go | subagent |
| caravel | Mobile client | native (Kotlin / Swift) | subagent |
| .github | Org profile | Markdown | core |

buoy and beacon lift and rebrand the reverse-tunnel, transparent-proxy, and device-CA machinery from the private sultix project (same owner). All sultix / mcproxy / mctunnel / x-sultix-* identifiers must be stripped.


15. Decisions log

| # | Decision | Date |
|---|---|---|
| 1 | Name: PharosVPN. Org github.com/PharosVPN. | 2026-05-17 |
| 2 | License AGPL-3.0-or-later + DCO, no CLA. | 2026-05-17 |
| 3 | Wire protocol: gRPC over mTLS (not plain JSON). | 2026-05-17 |
| 4 | Three roles: helm / buoy / beacon; client caravel. | 2026-05-17 |
| 5 | beacon always embedded in helm, optionally remote. | 2026-05-17 |
| 6 | Live UI: buoy→helm event stream + helm→browser WebSocket. | 2026-05-17 |
| 7 | Optimistic concurrency: per-row version, 409 on stale write. | 2026-05-17 |
| 8 | Per-user E2E profile encryption; hybrid envelope. | 2026-05-17 |
| 9 | Private key: passphrase-wrapped blob on helm (Argon2id). | 2026-05-17 |
| 10 | File format: single .pharos extension, enc in-header. | 2026-05-17 |
| 11 | Protocols: versioned tagged list, ignore-unknown. | 2026-05-17 |
| 12 | QR: enrollment ticket default; self-contained QR for offline. | 2026-05-17 |
| 13 | Reuse + rebrand sultix relay/tunnel/device-CA code. | 2026-05-17 |
| 14 | Node/relay onboarding over SSH (agent install + update); no cloud-provider API. Node keys are generated on-node and signed via CSR; no bootstrap token. Supersedes the §3 CloudProvider interface. | 2026-05-18 |

Still open