Snapshot commit

Χγφτ Kompanion 2025-10-15 10:14:58 +13:00
parent 5ae22ff9f8
commit b567b51ee2
5 changed files with 98 additions and 5 deletions

@@ -17,3 +17,9 @@
- Add libpq (or pqxx) and parameterized statements.
- Add RLS/session GUCs & retries.
## Implementation Checklist (2025-10-15)
- Detect `libpq` + `pqxx` in CMake, define `HAVE_PG`, and link `kom_dal` against `PostgreSQL::PostgreSQL` and `pqxx::pqxx`.
- During `PgDal::connect`, prepare statements for namespace ensure, item upsert, chunk and embedding inserts, text and vector search, and hybrid search.
- Guard runtime selection: `stub://` DSN keeps the current in-memory store; non-stub DSNs require a live Postgres connection.
- Document the environment variables: `PG_DSN` (full libpq connection string) and optional `PGSSLMODE`.
- Surface informative `std::runtime_error` messages when pqxx operations fail so MCP handlers can emit actionable errors.
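A minimal sketch of the DSN guard in this checklist, assuming the storage mode is chosen from the DSN string alone and that `HAVE_PG` is the only compile-time switch. `Backend`, `kHavePg`, and `select_backend` are hypothetical names, not part of the existing `PgDal` API.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical tag for the two storage modes; the real PgDal may differ.
enum class Backend { InMemoryStub, Postgres };

// Mirrors the planned compile guard: true only when CMake defines HAVE_PG.
#ifdef HAVE_PG
constexpr bool kHavePg = true;
#else
constexpr bool kHavePg = false;
#endif

// Picks the storage mode from the DSN. "stub://..." keeps the in-memory
// store; any other DSN needs a build with Postgres support.
Backend select_backend(const std::string& dsn, bool have_pg = kHavePg) {
    if (dsn.rfind("stub://", 0) == 0) {  // DSN starts with "stub://"
        return Backend::InMemoryStub;
    }
    if (!have_pg) {
        // Deliberately omit the DSN itself: it may contain a password.
        throw std::runtime_error(
            "PgDal::connect: non-stub DSN requires a build with HAVE_PG");
    }
    return Backend::Postgres;
}
```

The error text leaves out the DSN on purpose, since a full libpq connection string can carry credentials; a real handler could add the host name instead.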

@@ -36,6 +36,12 @@ Precompute embeddings for recent items.
- input: `{ namespace: string, since?: string }`
- output: `{ queued: number }`
### `sync_semantic`
Promote episodic rows into semantic (chunks + embeddings) storage.
- input: `{ namespace: string, max_batch?: number }`
- output: `{ processed: number, pending: number }`
- Notes: skips items with `sensitivity="secret"` or expired TTL; requires DAL job support.
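To make the `processed`/`pending` output concrete, here is a hedged C++ sketch of one run's bookkeeping. It assumes `processed` counts rows promoted by this call and `pending` counts eligible rows left for a later call. `SyncSemanticInput`, `SyncSemanticOutput`, `plan_sync`, and the default batch size of 64 are hypothetical; the tool spec does not state a default for `max_batch`.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical C++ mirror of the sync_semantic input.
struct SyncSemanticInput {
    std::string ns;                        // "namespace" is a C++ keyword
    std::optional<std::size_t> max_batch;  // optional in the tool schema
};

// Hypothetical C++ mirror of the sync_semantic output.
struct SyncSemanticOutput {
    std::size_t processed = 0;
    std::size_t pending = 0;
};

// Assumed default batch size; not specified by the API.
constexpr std::size_t kDefaultMaxBatch = 64;

// Given how many eligible episodic rows are queued, report what one run does.
SyncSemanticOutput plan_sync(const SyncSemanticInput& in, std::size_t eligible) {
    const std::size_t batch = in.max_batch.value_or(kDefaultMaxBatch);
    SyncSemanticOutput out;
    out.processed = std::min(eligible, batch);
    out.pending = eligible - out.processed;
    return out;
}
```

Under these assumptions, `max_batch: 10` with 25 eligible rows would report `{ processed: 10, pending: 15 }`.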
## Auth & Versioning
- toolNamespace: `kom.memory.v1`
- auth: bearer token via MCP session metadata (optional local mode).

@@ -0,0 +1,69 @@
# Memory Architecture Roadmap (2025-10-15)
## Current Snapshot
- `PgDal` remains an in-memory stub; `libpq`/`pqxx` are not linked and there is no `HAVE_PG` compile guard.
- `contract_memory` now builds via `tests/contract_memory.cpp` as a dedicated CTest executable (stub DAL still backing it).
- MCP handlers directly invoke the stub DAL and perform ad-hoc JSON parsing.
- No `resources/` descriptors exist for episodic memory APIs or the semantic sync workflow.
## 1. CTest Target: `contract_memory`
1. Keep `contract_memory.cpp` focused on exercising `PgDal` write/read surfaces; expand as DAL features land.
2. Ensure the executable runs without Postgres by defaulting to `stub://memory` when `PG_DSN` is absent.
3. Layer follow-up assertions once pqxx-backed code exists (e.g., detect `HAVE_PG` and connect to a real database in CI).
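Step 2 can be sketched in C++ as follows. `kStubDsn`, `resolve_dsn`, and `contract_dsn` are hypothetical helpers; splitting out the pure function keeps the fallback rule testable without mutating the process environment. Treating an empty `PG_DSN` the same as an unset one is an assumption.

```cpp
#include <cstdlib>
#include <string>

// Fallback DSN that keeps the contract test on the in-memory stub DAL.
constexpr const char* kStubDsn = "stub://memory";

// Pure helper: a null or empty PG_DSN value means "use the stub".
std::string resolve_dsn(const char* pg_dsn) {
    if (pg_dsn == nullptr || *pg_dsn == '\0') {
        return kStubDsn;
    }
    return std::string(pg_dsn);
}

// What contract_memory's main() could call before constructing PgDal.
std::string contract_dsn() {
    return resolve_dsn(std::getenv("PG_DSN"));
}
```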
## 2. DAL Prepared Statements (`HAVE_PG`)
**Dependencies**
- System packages: `libpq` headers and `libpqxx` (>= 7.8).
- CMake: `find_package(PostgreSQL REQUIRED)` and `find_package(pqxx REQUIRED)`; link `PostgreSQL::PostgreSQL` and `pqxx::pqxx`.
**Implementation Steps**
1. Extend `src/dal/CMakeLists.txt` to define `HAVE_PG` and link the libraries when detection succeeds.
2. Enhance `PgDal` to manage both modes:
   - `stub://` DSN → in-memory containers (status quo).
   - Other DSNs → establish `pqxx::connection`, store it via `std::unique_ptr`, and call `conn->prepare(...)` during `connect`.
3. Prepare statements for: `ensure_namespace`, `upsert_item`, `insert_chunk`, `insert_embedding`, `search_text`, `search_vector`, `hybrid_search`.
4. Implement transactional helpers using `pqxx::work`, with RAII to guarantee `commit()`/`abort()` pairing.
5. Translate `pqxx` exceptions into `std::runtime_error` with context so MCP handlers can emit useful error JSON.
6. Document required environment variables (`PG_DSN`, optional `PGSSLMODE`) plus migration expectations in `docs/dal-skeleton.md`.
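Steps 4 and 5 fit together in one helper. The sketch below is generic over any transaction type with `commit()` and `abort()`; using it with `pqxx::work` is an assumption (pqxx already aborts an uncommitted transaction in its destructor, and its exception classes derive from `std::exception`). `run_in_transaction` is a hypothetical name.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <type_traits>

// Runs `body` inside `txn` and commits on success. Any std::exception that
// escapes body or commit is rethrown as std::runtime_error tagged with the
// DAL operation name, so MCP handlers get actionable context.
template <typename Txn, typename Body>
auto run_in_transaction(Txn& txn, const std::string& op, Body&& body)
    -> decltype(body(txn)) {
    // RAII guard: leaving without a successful commit aborts exactly once.
    struct AbortGuard {
        Txn& t;
        bool committed = false;
        ~AbortGuard() {
            if (!committed) {
                try { t.abort(); } catch (...) {}  // never throw from a destructor
            }
        }
    } guard{txn};

    try {
        if constexpr (std::is_void_v<decltype(body(txn))>) {
            body(txn);
            txn.commit();
            guard.committed = true;
        } else {
            auto result = body(txn);
            txn.commit();
            guard.committed = true;
            return result;
        }
    } catch (const std::exception& ex) {
        throw std::runtime_error("PgDal::" + op + " failed: " + ex.what());
    }
}
```

Exceptions that do not derive from `std::exception` pass through untranslated, but the guard still aborts the transaction.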
## 3. MCP `resources/*` & Episodic→Semantic Sync
**Directory Layout**
- Create `resources/memory/kom.memory.v1/` for tool descriptors and schema fragments:
  - `episodic.json`: raw conversation timeline.
  - `semantic.json`: chunked embeddings metadata.
  - `jobs/semantic_sync.json`: background job contract.
**Design Highlights**
1. Episodic resource fields: `namespace`, `thread_id`, `speaker`, `content`, `sensitivity`, `tags`, `created_at`.
2. Semantic resource references episodic items (`episodic_id`, `chunk_id`, `model`, `dim`, `vector_ref`).
3. DAL sync job flow:
   - Locate episodic rows with `embedding_status='pending'` (and `sensitivity!='secret'`).
   - Call the embedder(s) in batches; write rows to `memory_chunks` and `embeddings`.
   - Mark the episodic rows as `embedding_status='done'` and capture audit entries (e.g., a ledger append).
4. Expose a placeholder MCP tool `kom.memory.v1.sync_semantic` that enqueues or executes the job.
5. Note TTL and privacy requirements; skip items with `expires_at` in the past or flagged secret.
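The eligibility rules and status transition in items 3 and 5 can be sketched over an in-memory row type. `EpisodicRow`, `eligible_for_sync`, and `promote_batch` are hypothetical; the embedder call, the `memory_chunks`/`embeddings` writes, and the audit ledger are reduced to a comment, and treating an `expires_at` equal to `now` as expired is an assumption.

```cpp
#include <chrono>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

using Clock = std::chrono::system_clock;

// Hypothetical view of an episodic row; field names follow the episodic
// resource and job flow, the C++ types are assumptions.
struct EpisodicRow {
    std::string id;
    std::string sensitivity;       // e.g. "normal" or "secret"
    std::string embedding_status;  // "pending" or "done"
    std::optional<Clock::time_point> expires_at;  // empty = no TTL
};

// Eligible = still pending, not flagged secret, and not past its TTL.
bool eligible_for_sync(const EpisodicRow& row, Clock::time_point now) {
    if (row.embedding_status != "pending") return false;
    if (row.sensitivity == "secret") return false;
    if (row.expires_at && *row.expires_at <= now) return false;  // expired
    return true;
}

// Promotes up to max_batch eligible rows and returns their ids.
std::vector<std::string> promote_batch(std::vector<EpisodicRow>& rows,
                                       std::size_t max_batch,
                                       Clock::time_point now) {
    std::vector<std::string> promoted;
    for (EpisodicRow& row : rows) {
        if (promoted.size() >= max_batch) break;
        if (!eligible_for_sync(row, now)) continue;
        // Real job: embed the content, write chunks + embeddings, and append
        // an audit entry in the same transaction as this status update.
        row.embedding_status = "done";
        promoted.push_back(row.id);
    }
    return promoted;
}
```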
## 4. `hybrid_search_v1` with `pgvector`
**SQL Components**
1. Update migrations (`sql/pg/001_init.sql`) to include:
   - A `tsvector` generated column or expression for lexical search.
   - A `GIN` index on the lexical field (either `to_tsvector` or `pg_trgm`).
   - A per-model `ivfflat` index on `embeddings.vector`, built with `vector_cosine_ops` to match the cosine distance (`<=>`) used for scoring.
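2. Prepared statements:
   - Text: `SELECT id, ts_rank_cd(...) AS score FROM memory_items ... WHERE namespace_id=$1 AND ... @@ plainto_tsquery($2) ORDER BY score DESC LIMIT $3`.
   - Vector: `SELECT item_id, 1 - (vector <=> $2::vector) AS score FROM embeddings ... WHERE namespace_id=$1 ORDER BY vector <=> $2::vector LIMIT $3`. Ordering by the same cosine-distance operator used for `score` keeps the ranking consistent with the score and lets a cosine `ivfflat` index serve the query.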
3. Merge results in C++ with Reciprocal Rank Fusion or weighted sum, ensuring deterministic ordering on ties.
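A sketch of the Reciprocal Rank Fusion option, assuming each prepared statement returns item ids already ordered best-first. `FusedMatch` and `rrf_merge` are hypothetical names, and `k = 60` is a commonly used constant rather than a value from this roadmap. The per-source flags are one way to supply the optional lexical/vector metadata for `HandlersMemory::search_memory`.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One fused hit, with flags a handler could surface as optional metadata.
struct FusedMatch {
    std::string id;
    double score = 0.0;
    bool from_text = false;    // present in the lexical results
    bool from_vector = false;  // present in the pgvector results
};

// Reciprocal Rank Fusion: every list adds 1 / (k + rank), rank starting at 1.
// Each input is ordered best-first and assumed to be duplicate-free.
std::vector<FusedMatch> rrf_merge(const std::vector<std::string>& text_ids,
                                  const std::vector<std::string>& vector_ids,
                                  std::size_t limit, double k = 60.0) {
    std::map<std::string, FusedMatch> by_id;
    auto add_list = [&](const std::vector<std::string>& ids, bool is_text) {
        for (std::size_t rank = 1; rank <= ids.size(); ++rank) {
            FusedMatch& m = by_id[ids[rank - 1]];
            m.id = ids[rank - 1];
            m.score += 1.0 / (k + static_cast<double>(rank));
            if (is_text) {
                m.from_text = true;
            } else {
                m.from_vector = true;
            }
        }
    };
    add_list(text_ids, true);
    add_list(vector_ids, false);

    std::vector<FusedMatch> out;
    out.reserve(by_id.size());
    for (const auto& entry : by_id) out.push_back(entry.second);

    // Higher score first; equal scores fall back to id order so ties are deterministic.
    std::sort(out.begin(), out.end(), [](const FusedMatch& a, const FusedMatch& b) {
        if (a.score != b.score) return a.score > b.score;
        return a.id < b.id;
    });
    if (out.size() > limit) out.resize(limit);
    return out;
}
```

RRF looks only at ranks, so it sidesteps the different scales of `ts_rank_cd` and cosine similarity; a weighted sum would instead need both scores normalized first.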
**Handler Integration**
1. Extend `PgDal::hybridSearch` to dispatch to the prepared statements when `HAVE_PG` is defined; reuse the in-memory fallback otherwise.
2. Return richer matches (id, score, optional chunk text) to satisfy the MCP response schema.
3. Update `HandlersMemory::search_memory` to surface the new scores and, optionally, annotate whether the lexical search, the vector search, or both contributed to each match.
4. Add a contract test scenario once pqxx-backed execution is available (requires live Postgres fixture later).
## Next-Step Checklist
- [ ] Detect pqxx via CMake and plumb `HAVE_PG`.
- [x] Normalize contract_memory CTest target and remove stale library target.
- [ ] Author `resources/memory/` descriptors and sync job outline.
- [ ] Extend DAL header to carry prepared-statement aware APIs (may introduce new structs).
- [x] Update `docs/mcp-memory-api.md` to mention episodic sync + hybrid search fields.
- [ ] Create follow-up acf subtasks when concrete implementation begins (pgvector migration, scheduler hook, runtime wiring).

@@ -6,8 +6,10 @@ target_link_libraries(test_mcp_tools PRIVATE kom_dal)
 add_test(NAME contract_mcp_tools COMMAND test_mcp_tools)
-add_library(contract_memory STATIC
+add_executable(contract_memory
     contract_memory.cpp
 )
 target_include_directories(contract_memory PRIVATE ${PROJECT_SOURCE_DIR}/src)
 target_link_libraries(contract_memory PRIVATE kom_dal)
+add_test(NAME contract_memory COMMAND contract_memory)

@@ -1,5 +1,6 @@
 #include "dal/PgDal.hpp"
+#include <iostream>
 #include <string>
 #include <vector>
@@ -36,7 +37,16 @@ static void contract_pgdal_basic() {
     static_cast<void>(dal.hybridSearch(embedding.vector, "stub-model", "tests", "chunk", 5));
 }
-static const bool contract_pgdal_compiles = [] {
+int main() {
+    try {
         contract_pgdal_basic();
-    return true;
-}();
+        std::cout << "contract_ok\n";
+        return 0;
+    } catch (const std::exception& ex) {
+        std::cerr << "contract_memory failure: " << ex.what() << "\n";
+        return 1;
+    } catch (...) {
+        std::cerr << "contract_memory failure: unknown error\n";
+        return 1;
+    }
+}