Design Decision: Local-First Personal Store with Optional Federated Services

Decision: Kompanion adopts a two-tier architecture. A personal, local store (Akonadi-like) is the authoritative home of a user's data and operates fully offline. An optional federated layer provides encrypted backups, multi-device sync, and paid cloud conveniences (e.g., hosted search/rerank). Users can run purely locally or selectively enable cloud features.

Encryption Note: We deliberately leave the exact cryptographic suite open so that implementations can use hardware/OS keychains, libsodium, AES-GCM, or XChaCha20-Poly1305. The guardrails below assume end-to-end encryption (E2EE) with keys controlled by the user.


1) Personal Store (Local Core) — kom.local.v1

  • Runs entirely on-device; no network required.
  • DB: SQLite (+ FTS/trigram for "rgrep" feel) + FAISS for vectors.
  • Embeddings/Reranker: local (Ollama + optional local reranker).
  • Privacy defaults: do-not-embed secrets; private-vault items are never vectorized/FTS'd; E2EE for backups/exports.
  • Backup tools: backup.export_encrypted, backup.import_encrypted (E2EE blobs).
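
The local lexical path and its privacy default can be sketched as follows. This is a minimal illustration, not the project's schema: the table names, column names, and the put/search helpers are hypothetical, and it assumes the Python sqlite3 module was built with FTS5. It uses the default FTS5 tokenizer; the trigram tokenizer (SQLite 3.34 or later) would be the closer match for the "rgrep" feel. The text above does not settle whether private items are indexed, so the sketch conservatively indexes only normal items.

```python
import sqlite3

# Assumption: only "normal" items participate in the full-text index.
INDEXABLE = {"normal"}


def open_store(path=":memory:"):
    """Create a tiny local store: an items table plus an FTS5 index."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS items (
            id INTEGER PRIMARY KEY,
            namespace TEXT NOT NULL,
            key TEXT NOT NULL,
            body TEXT NOT NULL,
            sensitivity TEXT NOT NULL DEFAULT 'normal',
            UNIQUE (namespace, key)
        );
        CREATE VIRTUAL TABLE IF NOT EXISTS items_fts
            USING fts5(body, namespace UNINDEXED, item_key UNINDEXED);
    """)
    return db


def put(db, namespace, key, body, sensitivity="normal"):
    """Store an item; add it to the index only if its sensitivity allows."""
    cur = db.execute(
        "INSERT INTO items (namespace, key, body, sensitivity) VALUES (?, ?, ?, ?)",
        (namespace, key, body, sensitivity),
    )
    if sensitivity in INDEXABLE:
        db.execute(
            "INSERT INTO items_fts (rowid, body, namespace, item_key) VALUES (?, ?, ?, ?)",
            (cur.lastrowid, body, namespace, key),
        )
    db.commit()


def search(db, namespace, query):
    """Return (key, snippet) pairs for matches inside one namespace."""
    rows = db.execute(
        "SELECT item_key, snippet(items_fts, 0, '[', ']', '...', 8) "
        "FROM items_fts WHERE items_fts MATCH ? AND namespace = ? ORDER BY rank",
        (query, namespace),
    )
    return rows.fetchall()
```

Because secret items never reach items_fts, no query can surface them through the lexical path; they remain retrievable only by key from the items table.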

2) Federated Services (Optional) — kom.cloud.v1

  • Adds encrypted sync, cloud backup, micropayment-backed hosted compute (e.g., heavy reranking), and optional hosted pgvector search.
  • Server sees ciphertext plus minimal metadata. Hosted search is opt-in; the server stores embeddings, whether encrypted or plaintext, only with explicit consent.
  • Per-namespace tenancy and isolation (RLS when using Postgres).

3) Key & Auth Model

  • Users need to retain only authentication/secret-store access; Kompanion handles day-to-day operations.
  • Device enrollment shares or wraps keys securely (mechanism TBD; candidates: QR code or device handoff).
  • Key rotation and export are first-class; backups are always encrypted client-side.

4) Search Modes

  • Lexical: FTS + trigram, scoped to namespace/thread/user; grep-like snippets.
  • Semantic: vector ANN with local reranker by default.
  • Hybrid: configurable orchestration; always respects scope and privacy flags.
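
One possible shape for the hybrid mode is sketched below. The document only says the orchestration is configurable; reciprocal rank fusion (RRF) is used here purely as an illustrative merge strategy, and the function name, the allowed_ids parameter, and the constant k=60 are assumptions. The point of the sketch is that scope and privacy filtering happen before any scoring, so an out-of-scope result cannot influence the merged ranking.

```python
from collections import defaultdict


def rrf_merge(lexical, semantic, allowed_ids, k=60, limit=10):
    """Merge two ranked lists of item ids with reciprocal rank fusion.

    allowed_ids is the set of ids that passed scope and privacy checks
    (hypothetical: computed elsewhere from namespace and sensitivity flags).
    """
    scores = defaultdict(float)
    for ranking in (lexical, semantic):
        # Drop anything outside the caller's scope before ranks are assigned.
        visible = [item_id for item_id in ranking if item_id in allowed_ids]
        for rank, item_id in enumerate(visible, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for stable output.
    return sorted(scores, key=lambda i: (-scores[i], i))[:limit]
```

An item that appears in both rankings accumulates two contributions, so agreement between the lexical and semantic paths is rewarded without needing comparable raw scores.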

5) Privacy Controls

  • Sensitivity flags: metadata.sensitivity = secret|private|normal.
  • secret items: E2EE only (no FTS, no embeddings).
  • Server-side scope injection (namespace/user) in all handlers; default-deny posture.
  • Purge policy: soft-delete + scheduled hard-delete; cascades to chunks/embeddings and remote copies.
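
The purge policy above can be sketched with SQLite foreign keys. This is an illustration under assumptions: the items/chunks/embeddings tables, the deleted_at column, and the grace period are hypothetical names and choices, and propagation to remote copies is represented only by returning the purged ids for a sync layer to act on.

```python
import sqlite3


def open_db():
    """In-memory store where deleting an item cascades to chunks and embeddings."""
    db = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled per connection.
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript("""
        CREATE TABLE items (
            id INTEGER PRIMARY KEY,
            namespace TEXT NOT NULL,
            body TEXT NOT NULL,
            deleted_at REAL
        );
        CREATE TABLE chunks (
            id INTEGER PRIMARY KEY,
            item_id INTEGER NOT NULL REFERENCES items(id) ON DELETE CASCADE,
            text TEXT NOT NULL
        );
        CREATE TABLE embeddings (
            chunk_id INTEGER PRIMARY KEY REFERENCES chunks(id) ON DELETE CASCADE,
            vector BLOB NOT NULL
        );
    """)
    return db


def soft_delete(db, item_id, now):
    """Mark an item as deleted without removing any rows yet."""
    db.execute(
        "UPDATE items SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
        (now, item_id),
    )
    db.commit()


def hard_delete_expired(db, now, grace_seconds):
    """Remove items soft-deleted at least grace_seconds ago; return their ids."""
    cutoff = now - grace_seconds
    ids = [row[0] for row in db.execute(
        "SELECT id FROM items WHERE deleted_at IS NOT NULL AND deleted_at <= ? "
        "ORDER BY id",
        (cutoff,),
    )]
    db.executemany("DELETE FROM items WHERE id = ?", [(i,) for i in ids])
    db.commit()
    return ids  # the caller would forward these to remote copies
```

A scheduled job would call hard_delete_expired periodically; the returned ids are the hook for deleting ciphertext and any hosted embeddings on the federated side.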

6) Compatibility with Postgres+pgvector

  • When cloud search is enabled, a hosted Postgres+pgvector instance enforces isolation via RLS and per-namespace session GUCs.
  • Local SQLite store remains the source of truth unless user opts to delegate search to cloud.
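
A sketch of how RLS and per-namespace session GUCs could fit together on the cloud path. The SQL uses standard Postgres features (row-level security policies, set_config, current_setting), but the table name items, the policy name, the setting names kom.namespace_id and kom.user_id, and the %s placeholder style (as used by psycopg) are assumptions for illustration, not the project's actual schema or driver.

```python
# One-time setup, run by a migration. With missing_ok = true, current_setting
# returns NULL when the GUC is unset, so the policy matches no rows (default deny).
RLS_SETUP_SQL = """
ALTER TABLE items ENABLE ROW LEVEL SECURITY;
ALTER TABLE items FORCE ROW LEVEL SECURITY;
CREATE POLICY items_namespace_isolation ON items
    USING (namespace_id::text = current_setting('kom.namespace_id', true));
"""


def scope_statements(namespace_id, user_id):
    """Return (sql, params) pairs to run at the start of every transaction.

    The third argument of set_config (is_local = true) limits the setting to
    the current transaction, so a pooled connection cannot leak scope.
    """
    if not namespace_id or not user_id:
        raise ValueError("refusing to run without a namespace and user scope")
    return [
        ("SELECT set_config('kom.namespace_id', %s, true)", (str(namespace_id),)),
        ("SELECT set_config('kom.user_id', %s, true)", (str(user_id),)),
    ]
```

A handler would execute these statements inside the same transaction as its query, taking the values from the authenticated session rather than from client-supplied arguments.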

Action List (from privacy review)

  1. DB hardening (cloud path): add RLS policies; add FTS + pg_trgm; unique (namespace_id, key); partial ANN indexes per model.
  2. Server enforcement: inject namespace/user via session context (GUCs); deny scope widening by default; rate limits.
  3. Redaction pipeline: protect secrets before embedding; skip embedding/FTS for secret items.
  4. Private vault mode: key-only retrieval paths for sensitive items (no index participation).
  5. Backups: define E2EE export/import format; provider adapters (e.g., Google Drive) use pre-encrypted blobs.
  6. Sync: event-log format (append-only); conflict rules; device enrollment + key wrapping; later CRDT if needed.
  7. Purging: scheduled hard-deletes; admin "nuke namespace/user" procedure.
  8. Tests: cross-tenant leakage, redaction invariants, purge/TTL, hybrid-vs-lexical, hosted-vs-local parity.
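
Item 3 and the redaction invariants in item 8 might look like the sketch below. The regular expressions are illustrative placeholders, not a vetted secret detector, and the function names are hypothetical. The sketch shows the two gates described above: secret items are skipped entirely, and everything else is scrubbed before it can reach an embedder or the FTS index.

```python
import re

# Illustrative patterns only; a real pipeline would need a reviewed rule set.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(password|passwd|api[_-]?key|token)\s*[:=]\s*\S+"),
    re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.S,
    ),
]


def redact(text):
    """Replace anything that looks like a credential with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def prepare_for_embedding(body, sensitivity):
    """Return text that is safe to embed or index, or None to skip the item."""
    if sensitivity == "secret":
        return None  # secret items never reach the embedder or the FTS index
    return redact(body)
```

A redaction invariant test can then assert that a known planted secret never appears in the output of prepare_for_embedding, regardless of the item's sensitivity flag.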

Files to Watch

  • docs/db-schema.md, sql/pg/001_init.sql (cloud path)
  • src/mcp/ToolSchemas.json and MCP handlers (scope + sensitivity gates)
  • kom.local.v1.backup.*, kom.cloud.v1.* (new tool surfaces)