# Memory Architecture Roadmap (2025-10-15)

## Current Snapshot

- `PgDal` remains an in-memory stub; `libpq`/`pqxx` are not linked and there is no `HAVE_PG` compile guard.
- `contract_memory` now builds via `tests/contract_memory.cpp` as a dedicated CTest executable (stub DAL still backing it).
- MCP handlers directly invoke the stub DAL and perform ad-hoc JSON parsing.
- No `resources/` descriptors exist for episodic memory APIs or the semantic sync workflow.

## 1. CTest Target: `contract_memory`

1. Keep `contract_memory.cpp` focused on exercising `PgDal` write/read surfaces; expand as DAL features land.
2. Ensure the executable runs without Postgres by defaulting to `stub://memory` when `PG_DSN` is absent.
3. Layer follow-up assertions once pqxx-backed code exists (e.g., detect `HAVE_PG` and connect to a real database in CI).

## 2. DAL Prepared Statements (`HAVE_PG`)

**Dependencies**

- System packages: `libpq` headers and `libpqxx` (>= 7.8).
- CMake: `find_package(PostgreSQL REQUIRED)` and `find_package(libpqxx REQUIRED)`; link `PostgreSQL::PostgreSQL` and `libpqxx::pqxx`. (The config package that libpqxx installs is named `libpqxx`; `find_package(pqxx)` and a `pqxx::pqxx` target work only if the project supplies its own find module.)

**Implementation Steps**

1. Extend `src/dal/CMakeLists.txt` to define `HAVE_PG` and link the libraries when detection succeeds.
2. Enhance `PgDal` to manage both modes:
   - `stub://` DSN → in-memory containers (status quo).
   - Other DSNs → establish `pqxx::connection`, store it via `std::unique_ptr`, and call `conn->prepare(...)` during `connect`.
3. Prepare statements for: `ensure_namespace`, `upsert_item`, `insert_chunk`, `insert_embedding`, `search_text`, `search_vector`, `hybrid_search`.
4. Implement transactional helpers using `pqxx::work`, with RAII to guarantee `commit()`/`abort()` pairing.
5. Translate `pqxx` exceptions into `std::runtime_error` with context so MCP handlers can emit useful error JSON.
6. Document required environment variables (`PG_DSN`, optional `PGSSLMODE`) plus migration expectations in `docs/dal-skeleton.md`.

## 3. MCP `resources/*` & Episodic→Semantic Sync

**Directory Layout**

- Create `resources/memory/kom.memory.v1/` for tool descriptors and schema fragments:
  - `episodic.json` – raw conversation timeline.
  - `semantic.json` – chunked embeddings metadata.
  - `jobs/semantic_sync.json` – background job contract.

**Design Highlights**

1. Episodic resource fields: `namespace`, `thread_id`, `speaker`, `content`, `sensitivity`, `tags`, `created_at`.
2. Semantic resource references episodic items (`episodic_id`, `chunk_id`, `model`, `dim`, `vector_ref`).
3. DAL sync job flow:
   - Locate episodic rows with `embedding_status='pending'` (and `sensitivity!='secret'`).
   - Batch call embedder(s); write `memory_chunks` + `embeddings`.
   - Mark episodic rows as `embedding_status='done'`, capture audit entries (e.g., ledger append).
4. Expose a placeholder MCP tool `kom.memory.v1.sync_semantic` that enqueues or executes the job.
5. Note TTL and privacy requirements; skip items with `expires_at` in the past or flagged secret.

**Ξlope Alignment Notes (2025-10-15)**

- Episodic resources capture resonance links and identity hints so the Librarian layer (see `elope/doc/architecture_memory.md`) can strengthen cross-agent patterns without raw content sharing.
- Semantic resources surface `identity_vector` and `semantic_weight`, enabling supersemantic indexing once crystallization occurs.
- `jobs/semantic_sync` maintains `cursor_event_id` and skips `sensitivity=secret`, mirroring the elope crystallization guidance in `/tmp/mem-elope.txt`.

## 4. `hybrid_search_v1` with `pgvector`

**SQL Components**

1. Update migrations (`sql/pg/001_init.sql`) to include:
   - `tsvector` generated column or expression for lexical search.
   - `GIN` index on the lexical field (either `to_tsvector` or `pg_trgm`).
   - Per-model `ivfflat` index on `embeddings.vector`.
2. Prepared statements:
   - Text: `SELECT id, ts_rank_cd(...) AS score FROM memory_items ... WHERE namespace_id=$1 AND text_query=$2 LIMIT $3`.
   - Vector: `SELECT item_id, 1 - (vector <=> $2::vector) AS score FROM embeddings ... WHERE namespace_id=$1 ORDER BY vector <=> $2::vector LIMIT $3`. The `ORDER BY` uses the same cosine-distance operator (`<=>`) as the score so that row order matches the reported scores (`<->` is pgvector's L2 distance); the `ivfflat` index then needs the matching `vector_cosine_ops` operator class to be usable.
3. Merge results in C++ with Reciprocal Rank Fusion or weighted sum, ensuring deterministic ordering on ties.

**Handler Integration**

1. Extend `PgDal::hybridSearch` to dispatch to the prepared statements when `HAVE_PG` is defined; reuse in-memory fallback otherwise.
2. Return richer matches (id, score, optional chunk text) to satisfy MCP response schema.
3. Update `HandlersMemory::search_memory` to surface the new scores and annotate whether lexical/vector contributed (optional metadata).
4. Add a contract test scenario once pqxx-backed execution is available (requires live Postgres fixture later).

## 5. Secret Handling, Snapshots, and CLI Hooks

- **Secret propagation**: episodic `sensitivity` + `embeddable` flags gate embedding generation. DAL queries will add predicates (`metadata->>'sensitivity' != 'secret'`) before hybrid search.
- **Snapshots**: episodic entries with `content_type = snapshot` reference durable artifacts; sync summarises them into semantic text while retaining `snapshot_ref` for CLI inspection.
- **Hybrid policy**: `pgSearchVector` will filter by caller capability (namespace scope, secret clearance) before ranking; contract tests must assert omission of secret-tagged items.
- **CLI sketch**: plan for a Qt `QCoreApplication` tool (`kom_mctl`) exposing commands to list namespaces, tail episodic streams, trigger `sync_semantic`, and inspect resonance graphs, all wired through the new prepared statements.
- **Observability**: CLI should read the `jobs/semantic_sync` state block to display cursors, pending counts, and last error logs; dry-run mode estimates embeddings without committing.
- **Activation parity**: Long term, mirror the KDE `akonadiclient`/`akonadi-console` pattern: Kompanion CLI doubles as an MCP surface today and later as a DBus-activated helper so tools can be socket-triggered into the memory service.
- **KConfig defaults**: `kom_mcp` and `kompanion` load `Database/PgDsn` from `~/.config/kompanionrc` (see `docs/configuration.md`) when `PG_DSN` is unset, keeping deployments kioskable.

## Next-Step Checklist

- [x] Detect pqxx via CMake and plumb `HAVE_PG`.
- [x] Normalize contract_memory CTest target and remove stale library target.
- [ ] Author `resources/memory/` descriptors and sync job outline.
- [ ] Extend DAL header to carry prepared-statement aware APIs (may introduce new structs).
- [x] Update `docs/mcp-memory-api.md` to mention episodic sync + hybrid search fields.
- [ ] Create follow-up acf subtasks when concrete implementation begins (pgvector migration, scheduler hook, runtime wiring).
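
The merge step named in section 4 (Reciprocal Rank Fusion with deterministic ordering on ties) could look roughly like the sketch below. It is a minimal, standard-library-only illustration and not the project's implementation: the `Match` struct, the `rrfMerge` function, and the constant `k = 60` (a commonly used RRF default) are assumptions introduced here. Each input list is assumed to be sorted best-first, as the text and vector statements would return it, and to contain each id at most once. It requires C++17.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result row; the real DAL structs may differ.
struct Match {
    std::string id;
    double score;
};

// Reciprocal Rank Fusion: every list adds 1 / (k + rank) to an item's
// fused score, where rank is the item's 1-based position in that list.
std::vector<Match> rrfMerge(const std::vector<std::string>& lexicalIds,
                            const std::vector<std::string>& vectorIds,
                            double k = 60.0) {
    std::unordered_map<std::string, double> fused;
    auto accumulate = [&](const std::vector<std::string>& ids) {
        for (std::size_t i = 0; i < ids.size(); ++i) {
            fused[ids[i]] += 1.0 / (k + static_cast<double>(i + 1));
        }
    };
    accumulate(lexicalIds);
    accumulate(vectorIds);

    std::vector<Match> merged;
    merged.reserve(fused.size());
    for (const auto& [id, score] : fused) {
        merged.push_back({id, score});
    }

    // Higher fused score first; equal scores fall back to id order so the
    // output does not depend on hash-map iteration order.
    std::sort(merged.begin(), merged.end(),
              [](const Match& a, const Match& b) {
                  if (a.score != b.score) return a.score > b.score;
                  return a.id < b.id;
              });
    return merged;
}
```

For instance, `rrfMerge({"a", "b", "c"}, {"b", "a", "d"})` returns the ids in the order `a`, `b`, `c`, `d`: `a` and `b` receive the same fused score and are ordered by id, and the same holds for `c` and `d`. A weighted-sum merge would need the raw scores normalised to a common scale first, which rank-based fusion avoids.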