Memory Architecture Roadmap (2025-10-15)
Current Snapshot
- PgDal remains an in-memory stub; libpq/pqxx are not linked and there is no HAVE_PG compile guard.
- contract_memory now builds via tests/contract_memory.cpp as a dedicated CTest executable (stub DAL still backing it).
- MCP handlers directly invoke the stub DAL and perform ad-hoc JSON parsing.
- No resources/ descriptors exist for episodic memory APIs or the semantic sync workflow.
1. CTest Target: contract_memory
- Keep contract_memory.cpp focused on exercising PgDal write/read surfaces; expand as DAL features land.
- Ensure the executable runs without Postgres by defaulting to stub://memory when PG_DSN is absent.
- Layer follow-up assertions once pqxx-backed code exists (e.g., detect HAVE_PG and connect to a real DB in CI).
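A minimal sketch of the PG_DSN fallback described above. The helper names resolve_dsn and dsn_from_environment are hypothetical, and treating an empty PG_DSN the same as an absent one is an assumption, not something the roadmap specifies.

```cpp
#include <cstdlib>
#include <string>

// Pure helper (hypothetical name) so the fallback rule can be tested
// without touching the process environment.
// Assumption: an empty value is treated the same as an unset variable.
inline std::string resolve_dsn(const char* env_value) {
    if (env_value == nullptr || *env_value == '\0') {
        return "stub://memory";
    }
    return env_value;
}

// Reads PG_DSN from the environment and applies the fallback.
inline std::string dsn_from_environment() {
    return resolve_dsn(std::getenv("PG_DSN"));
}
```

The contract test could call dsn_from_environment() once at startup and pass the result to the DAL, so the same binary runs against the stub locally and against Postgres wherever PG_DSN is exported.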
2. DAL Prepared Statements (HAVE_PG)
Dependencies
- System packages: libpq headers and libpqxx (>= 7.8).
- CMake: find_package(PostgreSQL REQUIRED) and find_package(pqxx REQUIRED); link PostgreSQL::PostgreSQL and pqxx::pqxx.
Implementation Steps
- Extend src/dal/CMakeLists.txt to define HAVE_PG and link the libraries when detection succeeds.
- Enhance PgDal to manage both modes:
  - stub:// DSN → in-memory containers (status quo).
  - Other DSNs → establish pqxx::connection, store it via std::unique_ptr, and call conn->prepare(...) during connect.
- Prepare statements for: ensure_namespace, upsert_item, insert_chunk, insert_embedding, search_text, search_vector, hybrid_search.
- Implement transactional helpers using pqxx::work, with RAII to guarantee commit()/abort() pairing.
- Translate pqxx exceptions into std::runtime_error with context so MCP handlers can emit useful error JSON.
- Document required environment variables (PG_DSN, optional PGSSLMODE) plus migration expectations in docs/dal-skeleton.md.
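The mode check, the RAII commit/abort pairing, and the exception translation from the steps above could look roughly like this. It is a sketch only: is_stub_dsn, TxGuard, and run_in_tx are hypothetical names, and the guard is written as a template over any type with commit() and abort() (pqxx::work has both) so that it compiles without libpqxx installed.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// True when the DSN selects the in-memory stub backend.
inline bool is_stub_dsn(const std::string& dsn) {
    const std::string prefix = "stub://";
    return dsn.compare(0, prefix.size(), prefix) == 0;
}

// RAII guard: calls abort() on destruction unless commit() succeeded.
// Tx is any type exposing commit() and abort().
template <typename Tx>
class TxGuard {
public:
    explicit TxGuard(Tx& tx) : tx_(tx) {}
    TxGuard(const TxGuard&) = delete;
    TxGuard& operator=(const TxGuard&) = delete;

    ~TxGuard() {
        if (!committed_) {
            try { tx_.abort(); } catch (...) {}  // never throw from a destructor
        }
    }

    void commit() {
        tx_.commit();
        committed_ = true;
    }

private:
    Tx& tx_;
    bool committed_ = false;
};

// Runs body(tx) inside a guarded transaction. Any std::exception is rethrown
// as std::runtime_error prefixed with the operation name, so a handler can
// put the message straight into its error JSON.
template <typename Tx, typename Fn>
void run_in_tx(Tx& tx, const std::string& op, Fn&& body) {
    TxGuard<Tx> guard(tx);
    try {
        body(tx);
        guard.commit();
    } catch (const std::exception& e) {
        throw std::runtime_error("PgDal::" + op + " failed: " + e.what());
    }
}
```

Catching std::exception rather than a specific pqxx type keeps the sketch library-independent; a real implementation may want a narrower catch so that it can attach the failing statement to the message.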
3. MCP resources/* & Episodic→Semantic Sync
Directory Layout
- Create resources/memory/kom.memory.v1/ for tool descriptors and schema fragments:
  - episodic.json: raw conversation timeline.
  - semantic.json: chunked embeddings metadata.
  - jobs/semantic_sync.json: background job contract.
Design Highlights
- Episodic resource fields: namespace, thread_id, speaker, content, sensitivity, tags, created_at.
- Semantic resource references episodic items (episodic_id, chunk_id, model, dim, vector_ref).
- DAL sync job flow:
  - Locate episodic rows with embedding_status='pending' (and sensitivity!='secret').
  - Batch call embedder(s); write memory_chunks + embeddings.
  - Mark episodic rows as embedding_status='done', capture audit entries (e.g., ledger append).
- Expose a placeholder MCP tool kom.memory.v1.sync_semantic that enqueues or executes the job.
- Note TTL and privacy requirements; skip items with expires_at in the past or flagged secret.
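The sync job flow above can be sketched in memory before any SQL exists. Everything here is illustrative: EpisodicRow, EmbeddingRecord, Embedder, eligible, and sync_semantic are hypothetical names, expires_at is assumed to be a Unix timestamp, and the audit/ledger step is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct EpisodicRow {
    std::int64_t id;
    std::string content;
    std::string sensitivity;                 // e.g. "normal" or "secret"
    std::string embedding_status;            // "pending" or "done"
    std::optional<std::int64_t> expires_at;  // Unix seconds; empty means no TTL
};

struct EmbeddingRecord {
    std::int64_t episodic_id;
    std::vector<float> vector;
};

// Batch embedder: one output vector per input text, in the same order.
using Embedder =
    std::function<std::vector<std::vector<float>>(const std::vector<std::string>&)>;

// A row is eligible when it is pending, not secret, and not expired.
inline bool eligible(const EpisodicRow& row, std::int64_t now) {
    if (row.embedding_status != "pending") return false;
    if (row.sensitivity == "secret") return false;
    if (row.expires_at && *row.expires_at < now) return false;
    return true;
}

// Embeds all eligible rows in one batch and marks them done.
inline std::vector<EmbeddingRecord> sync_semantic(std::vector<EpisodicRow>& rows,
                                                  const Embedder& embed,
                                                  std::int64_t now) {
    std::vector<EpisodicRow*> batch;
    std::vector<std::string> texts;
    for (auto& row : rows) {
        if (eligible(row, now)) {
            batch.push_back(&row);
            texts.push_back(row.content);
        }
    }
    std::vector<EmbeddingRecord> out;
    if (batch.empty()) return out;

    auto vectors = embed(texts);
    if (vectors.size() != batch.size()) {
        throw std::runtime_error("embedder returned a mismatched batch size");
    }
    for (std::size_t i = 0; i < batch.size(); ++i) {
        out.push_back({batch[i]->id, std::move(vectors[i])});
        batch[i]->embedding_status = "done";  // only after the vector exists
    }
    return out;
}
```

Rows are marked done only after the embedder returns, so a failed batch leaves them pending and the next run picks them up again.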
Ξlope Alignment Notes (2025-10-15)
- Episodic resources capture resonance links and identity hints so the Librarian layer (see elope/doc/architecture_memory.md) can strengthen cross-agent patterns without raw content sharing.
- Semantic resources surface identity_vector and semantic_weight, enabling supersemantic indexing once crystallization occurs.
- jobs/semantic_sync maintains cursor_event_id and skips sensitivity=secret, mirroring the elope crystallization guidance in /tmp/mem-elope.txt.
4. hybrid_search_v1 with pgvector
SQL Components
- Update migrations (sql/pg/001_init.sql) to include:
  - tsvector generated column or expression for lexical search.
  - GIN index on the lexical field (either to_tsvector or pg_trgm).
  - Per-model ivfflat index on embeddings.vector.
- Prepared statements:
  - Text: SELECT id, ts_rank_cd(...) AS score FROM memory_items ... WHERE namespace_id=$1 AND text_query=$2 LIMIT $3.
  - Vector: SELECT item_id, 1 - (vector <=> $2::vector) AS score FROM embeddings ... WHERE namespace_id=$1 ORDER BY vector <=> $2::vector LIMIT $3. The ORDER BY uses the same cosine-distance operator (<=>) as the score so that ranking and score agree.
- Merge results in C++ with Reciprocal Rank Fusion or weighted sum, ensuring deterministic ordering on ties.
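A minimal sketch of the Reciprocal Rank Fusion option, with ties broken by id so the output order is deterministic. The names Fused and rrf_merge are hypothetical, the constant k = 60 is the value commonly used for RRF rather than one chosen by this roadmap, and each id is assumed to appear at most once per input list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct Fused {
    std::string id;
    double score;
};

// Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank),
// where rank is the 1-based position of id in that list.
inline std::vector<Fused> rrf_merge(const std::vector<std::string>& lexical_ids,
                                    const std::vector<std::string>& vector_ids,
                                    double k = 60.0) {
    assert(k > 0.0);
    std::unordered_map<std::string, double> acc;
    auto accumulate = [&](const std::vector<std::string>& ranked) {
        for (std::size_t i = 0; i < ranked.size(); ++i) {
            acc[ranked[i]] += 1.0 / (k + static_cast<double>(i + 1));
        }
    };
    accumulate(lexical_ids);
    accumulate(vector_ids);

    std::vector<Fused> out;
    out.reserve(acc.size());
    for (const auto& [id, score] : acc) {
        out.push_back({id, score});
    }
    // Higher score first; equal scores fall back to ascending id so the
    // result does not depend on hash-map iteration order.
    std::sort(out.begin(), out.end(), [](const Fused& a, const Fused& b) {
        if (a.score != b.score) return a.score > b.score;
        return a.id < b.id;
    });
    return out;
}
```

RRF only needs the rank positions from each query, so the lexical and vector scores never have to be put on a common scale; a weighted sum would need that normalisation step first.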
Handler Integration
- Extend PgDal::hybridSearch to dispatch to the prepared statements when HAVE_PG is defined; reuse in-memory fallback otherwise.
- Return richer matches (id, score, optional chunk text) to satisfy MCP response schema.
- Update HandlersMemory::search_memory to surface the new scores and annotate whether lexical/vector contributed (optional metadata).
- Add a contract test scenario once pqxx-backed execution is available (requires live Postgres fixture later).
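One way the HAVE_PG dispatch and the richer match record might fit together is sketched below. PgDalSketch, HybridMatch, addStubItem, stubHybridSearch, and pgHybridSearch are hypothetical names, not the existing PgDal interface. The stub path is a plain substring match standing in for the current in-memory fallback, and the Postgres path is only declared, on the assumption that it is defined next to the pqxx code.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Match record carrying the fields the handler needs to surface.
struct HybridMatch {
    std::string id;
    double score = 0.0;
    std::string chunk_text;     // optional; empty when not available
    bool from_lexical = false;  // lexical search contributed to this hit
    bool from_vector = false;   // vector search contributed to this hit
};

class PgDalSketch {
public:
    void addStubItem(std::string id, std::string text) {
        items_.emplace_back(std::move(id), std::move(text));
    }

    std::vector<HybridMatch> hybridSearch(const std::string& query,
                                          std::size_t limit) const {
#ifdef HAVE_PG
        if (use_postgres_) return pgHybridSearch(query, limit);
#endif
        return stubHybridSearch(query, limit);
    }

private:
    // In-memory fallback: substring match in insertion order.
    std::vector<HybridMatch> stubHybridSearch(const std::string& query,
                                              std::size_t limit) const {
        std::vector<HybridMatch> out;
        for (const auto& [id, text] : items_) {
            if (out.size() >= limit) break;
            if (text.find(query) == std::string::npos) continue;
            HybridMatch m;
            m.id = id;
            m.score = 1.0;
            m.chunk_text = text;
            m.from_lexical = true;
            out.push_back(std::move(m));
        }
        return out;
    }

#ifdef HAVE_PG
    // Assumed to be defined alongside the pqxx-backed code.
    std::vector<HybridMatch> pgHybridSearch(const std::string& query,
                                            std::size_t limit) const;
    bool use_postgres_ = false;
#endif

    std::vector<std::pair<std::string, std::string>> items_;
};
```

Keeping both paths behind one method means the handler and the contract test call the same entry point whether or not the build has Postgres support.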
5. Secret Handling, Snapshots, and CLI Hooks
- Secret propagation: episodic sensitivity + embeddable flags gate embedding generation. DAL queries will add predicates (metadata->>'sensitivity' != 'secret') before hybrid search.
- Snapshots: episodic entries with content_type = snapshot reference durable artifacts; sync summarises them into semantic text while retaining snapshot_ref for CLI inspection.
- Hybrid policy: pgSearchVector will filter by caller capability (namespace scope, secret clearance) before ranking; contract tests must assert omission of secret-tagged items.
- CLI sketch: plan for a Qt QCoreApplication tool (kom_mctl) exposing commands to list namespaces, tail episodic streams, trigger sync_semantic, and inspect resonance graphs, all wired through the new prepared statements.
- Observability: CLI should read the jobs/semantic_sync state block to display cursors, pending counts, and last error logs; dry-run mode estimates embeddings without committing.
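The hybrid policy above (filter by caller capability first, rank second) can be illustrated in memory. Candidate, CallerScope, and filter_then_rank are hypothetical names; in the real DAL the same predicate would presumably live in the SQL WHERE clause.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

struct Candidate {
    std::string id;
    std::string ns;           // namespace the item belongs to
    std::string sensitivity;  // e.g. "normal" or "secret"
    double score;
};

struct CallerScope {
    std::unordered_set<std::string> namespaces;  // namespaces the caller may read
    bool secret_clearance = false;
};

// Drops everything the caller may not see, then ranks what is left.
// Filtering first means a hidden item can never influence the visible order.
inline std::vector<Candidate> filter_then_rank(std::vector<Candidate> candidates,
                                               const CallerScope& caller) {
    candidates.erase(
        std::remove_if(candidates.begin(), candidates.end(),
                       [&](const Candidate& c) {
                           const bool out_of_scope =
                               caller.namespaces.count(c.ns) == 0;
                           const bool blocked_secret =
                               c.sensitivity == "secret" && !caller.secret_clearance;
                           return out_of_scope || blocked_secret;
                       }),
        candidates.end());
    std::sort(candidates.begin(), candidates.end(),
              [](const Candidate& a, const Candidate& b) {
                  if (a.score != b.score) return a.score > b.score;
                  return a.id < b.id;  // deterministic on ties
              });
    return candidates;
}
```

A contract test for secret omission can then build a mixed candidate set, call the function with secret_clearance set to false, and assert that no returned item has sensitivity equal to secret.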
Next-Step Checklist
- Detect pqxx via CMake and plumb HAVE_PG.
- Normalize contract_memory CTest target and remove stale library target.
- Author resources/memory/ descriptors and sync job outline.
- Extend DAL header to carry prepared-statement aware APIs (may introduce new structs).
- Update docs/mcp-memory-api.md to mention episodic sync + hybrid search fields.
- Create follow-up acf subtasks when concrete implementation begins (pgvector migration, scheduler hook, runtime wiring).