AnythingLLM ↔ Kompanion Memory Compatibility Evaluation
Current Kompanion Memory Stack
- Primary store: Postgres 14+ with `pgvector` ≥ 0.6, accessed via the C++ `PgDal` implementation (`embeddings`, `memory_chunks`, `memory_items`, `namespaces` tables). Each embedding row keeps `id`, `chunk_id`, `model`, `dim`, `vector`, and a `normalized` flag.
- Chunking & metadata: Items are broken into chunks; embeddings attach to chunks via `chunk_id`. Item metadata lives as structured JSON on `memory_items` with tags, TTL, and revision controls.
- Namespace model: Logical scopes (e.g. `project:user:thread`) are first-class rows. Retrieval joins embeddings back to items to recover text + metadata.
- Fallback mode: The local-only path uses SQLite plus a FAISS sidecar (see `docs/MEMORY.md`), but the production design assumes Postgres.
AnythingLLM Vector Stack (PGVector path)
- Supports multiple vector backends; the overlapping option is `pgvector` (`server/utils/vectorDbProviders/pgvector/index.js`).
- Expects a single table (default `anythingllm_vectors`) shaped as `{ id UUID, namespace TEXT, embedding vector(n), metadata JSONB, created_at TIMESTAMP }`.
- Metadata is stored inline as JSONB; namespace strings are arbitrary workspace slugs. The embedding dimension is fixed per table at creation time.
- The Node.js runtime manages chunking, caching, and namespace hygiene, and assumes CRUD against that flat table.
Key Differences
- Schema shape: Kompanion splits data across normalized tables with foreign keys; AnythingLLM uses a single wide table per vector store. Kompanion’s embeddings currently lack a JSONB metadata column and instead rely on joins.
- Identifiers: Kompanion embeddings key off `chunk_id` (uuid/text) plus `model`; AnythingLLM expects a unique `id` per stored chunk and does not expose the underlying chunk relationship.
- Metadata transport: Kompanion keeps tags/TTL in `memory_items` (JSON) and chunk text in `memory_chunks`. AnythingLLM packs metadata (including document references and source identifiers) directly into the vector row's JSONB.
- Lifecycle hooks: Kompanion enforces sensitivity flags before embedding; AnythingLLM assumes documents are already filtered and will happily ingest any chunk. Deletion flows also differ: Kompanion uses soft-delete semantics, while AnythingLLM issues hard deletes by namespace/document.
- Embeddings contract: Kompanion records the embedding model and dimension per row; AnythingLLM fixes the dimension at table creation and stores the model choice in JSON metadata.
Compatibility Plan
- Agree on a shared pgvector table
  - Create (or reuse) a Postgres schema reachable by both systems.
  - Define a composite view or materialized view that maps `embeddings` + `memory_chunks` + `memory_items` into the AnythingLLM layout (columns: `id`, `namespace`, `embedding`, `metadata`, `created_at`).
  - Add a JSONB projection that captures Kompanion metadata (`chunk_id`, `item_id`, `tags`, `model`, `revision`, sensitivity flags). This becomes the `metadata` field for AnythingLLM.
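A minimal sketch of such a projection view. Only the table names and embedding columns come from the schema description above; the join keys (`memory_items.namespace_id`, `namespaces.name`), the `metadata` JSON column on `memory_items`, and the `text`/`deleted_at` columns are assumptions to be checked against the real `PgDal` schema:

```sql
-- Sketch: project Kompanion's normalized tables into the flat layout
-- AnythingLLM expects. Several column names are assumptions (see above).
CREATE VIEW anythingllm_vectors AS
SELECT
    e.id                          AS id,         -- unique per stored chunk
    n.name                        AS namespace,  -- Kompanion namespace string
    e.vector                      AS embedding,
    jsonb_build_object(
        'chunk_id',    c.id,
        'item_id',     i.id,
        'tags',        i.metadata -> 'tags',
        'model',       e.model,
        'revision',    i.revision,
        'sensitivity', i.metadata -> 'sensitivity',
        'text',        c.text      -- chunk text so AnythingLLM can return it
    )                             AS metadata,
    e.created_at                  AS created_at
FROM embeddings e
JOIN memory_chunks c ON c.id = e.chunk_id
JOIN memory_items  i ON i.id = c.item_id
JOIN namespaces    n ON n.id = i.namespace_id
WHERE i.deleted_at IS NULL;       -- respect soft deletes
```

A plain view keeps the mirror always current at some query-time cost; a materialized view would need refresh scheduling, which overlaps with the synchronization job below.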
- Write a synchronization job
  - Option A: database triggers on `embeddings` that insert/update a mirror row in `anythingllm_vectors`.
  - Option B: a periodic worker that scans for new/updated embeddings (`revision` or `updated_at`) and upserts them into the shared table through SQL.
  - Ensure deletions (soft or hard) propagate by expiring mirrored rows or respecting a `deleted_at` flag in metadata (AnythingLLM supports document purges via namespace filtering).
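Option A could be sketched as a row-level trigger. This assumes `anythingllm_vectors` exists as a real table with a unique `id`, and reuses the join-key assumptions noted earlier (`memory_items.namespace_id`, `namespaces.name`):

```sql
-- Sketch of Option A: mirror each embedding row on insert/update.
-- Column names beyond those in the schema description are assumptions.
CREATE OR REPLACE FUNCTION mirror_embedding() RETURNS trigger AS $$
BEGIN
    INSERT INTO anythingllm_vectors (id, namespace, embedding, metadata, created_at)
    SELECT NEW.id,
           n.name,
           NEW.vector,
           jsonb_build_object('chunk_id', NEW.chunk_id, 'model', NEW.model),
           now()
    FROM memory_chunks c
    JOIN memory_items  i ON i.id = c.item_id
    JOIN namespaces    n ON n.id = i.namespace_id
    WHERE c.id = NEW.chunk_id
    ON CONFLICT (id) DO UPDATE
        SET embedding = EXCLUDED.embedding,
            metadata  = EXCLUDED.metadata;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER embeddings_mirror
AFTER INSERT OR UPDATE ON embeddings
FOR EACH ROW EXECUTE FUNCTION mirror_embedding();
```

A matching `AFTER DELETE` trigger (or a `deleted_at` check in the worker) would be needed so purges propagate.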
- Normalize namespace semantics
  - Reuse Kompanion's namespace string as the AnythingLLM workspace slug.
  - Document the mapping rules (e.g. replace `:` with `_` if AnythingLLM slugs disallow colons).
  - Provide a compatibility map in metadata so both systems resolve back to Kompanion's canonical namespace identity.
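The mapping rules above could be sketched as follows; the exact slug constraints AnythingLLM enforces are an assumption here (lowercase alphanumerics, underscores, hyphens), so the rules must be verified against its actual validation:

```python
import re

def to_workspace_slug(namespace: str) -> str:
    """Map a Kompanion namespace (e.g. 'project:user:thread') to a slug
    AnythingLLM will accept. Assumed rules: lowercase, colons become
    underscores, any other disallowed run collapses to a single hyphen."""
    slug = namespace.lower().replace(":", "_")
    return re.sub(r"[^a-z0-9_]+", "-", slug).strip("-")

def slug_metadata(namespace: str) -> dict:
    """Compatibility map stored in the vector row's metadata so both
    systems can resolve back to the canonical Kompanion namespace."""
    return {
        "namespace": to_workspace_slug(namespace),
        "kompanion_namespace": namespace,  # canonical identity, unmodified
    }
```

Because the slug mapping is lossy (`a:b` and `a_b` collide), the canonical string in `kompanion_namespace` is what Kompanion should trust on the way back.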
- Unify embedding models
  - Select a shared embedding model (e.g. `text-embedding-3-large` or a local Nomic model).
  - Record the chosen model in the mirrored metadata and enforce its dimension when creating the `anythingllm_vectors` table.
  - Update Kompanion's embedding pipeline to fail fast if the produced dimension differs from the table's fixed size.
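The fail-fast check is a one-liner worth pinning down; the 768-dimension default below is an assumption (it matches `nomic-embed-text`) and must mirror whatever `vector(n)` the shared table was created with:

```python
EXPECTED_DIM = 768  # assumption: nomic-embed-text; must match vector(768) in the table

def check_embedding(vec, model: str, expected_dim: int = EXPECTED_DIM):
    """Fail fast before writing a row whose dimension cannot fit the
    fixed-width pgvector column. Raising here keeps the mirror table
    consistent instead of surfacing a Postgres error mid-sync."""
    if len(vec) != expected_dim:
        raise ValueError(
            f"model {model!r} produced dim {len(vec)}, table expects {expected_dim}"
        )
    return vec
```

Note that if a higher-dimension model such as `text-embedding-3-large` (3072 dims) is chosen, pgvector's ivfflat index dimension limit needs checking before settling on the table shape.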
- Expose retrieval APIs
  - Kompanion → AnythingLLM: implement a thin adapter that reads from the shared table instead of internal joins when responding to AnythingLLM requests (or simply let AnythingLLM talk to Postgres directly).
  - AnythingLLM → Kompanion: ensure the metadata payload includes the necessary identifiers (`item_id`, `chunk_id`) so Kompanion can resolve a hit back to its full context.
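On the return path, Kompanion's resolution step reduces to reading two keys out of the mirrored JSONB. The key names below are the ones this plan proposes, not an existing AnythingLLM contract:

```python
def resolve_hit(metadata: dict) -> tuple:
    """Given the JSONB metadata of an AnythingLLM search hit, recover the
    Kompanion identifiers needed to load the full item context.
    Key names ('item_id', 'chunk_id') are proposed, not pre-existing."""
    try:
        return metadata["item_id"], metadata["chunk_id"]
    except KeyError as missing:
        # A mirrored row without identifiers cannot be resolved; surface
        # it loudly rather than returning partial context.
        raise ValueError(f"mirrored row lacks identifier: {missing}") from None
```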
- Security & sensitivity handling
  - Extend the metadata JSON to include Kompanion's sensitivity/embeddable flags.
  - Patch AnythingLLM ingestion to respect a `sensitivity` key (skip or mask secrets) before inserting into its table, or filter at the view level so secret rows never surface.
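The ingestion-side gate could look like the sketch below. The sensitivity levels (`secret`, `private`) and the default of treating unflagged chunks as embeddable are assumptions to align with Kompanion's actual flag vocabulary:

```python
BLOCKED_LEVELS = {"secret", "private"}  # assumed sensitivity vocabulary

def embeddable(chunk_metadata: dict) -> bool:
    """Gate applied before a chunk reaches the AnythingLLM table.
    Unflagged chunks default to 'normal' and pass; flagged ones are
    skipped entirely rather than masked, the conservative choice."""
    return chunk_metadata.get("sensitivity", "normal") not in BLOCKED_LEVELS

def filter_chunks(chunks: list) -> list:
    """Drop non-embeddable chunks from a batch prior to insertion."""
    return [c for c in chunks if embeddable(c.get("metadata", {}))]
```

Filtering at the view level instead (`WHERE metadata ->> 'sensitivity' NOT IN (...)`) has the advantage that secret rows never exist in the shared table at all.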
- Validation & tooling
  - Add a migration checklist covering table creation, index alignment (`USING ivfflat`), and permission grants for the AnythingLLM service role.
  - Create integration tests that:
    - upsert an item in Kompanion;
    - confirm the mirrored row appears in `anythingllm_vectors`;
    - query through the AnythingLLM API and verify that the same chunk text + metadata round-trips.
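The table/index/grant portion of that checklist could be drafted as below. The dimension (768), the `lists` parameter, the cosine operator class, and the `anythingllm_service` role name are all placeholders to tune per deployment:

```sql
-- Migration sketch; dimension, lists count, and role name are placeholders.
CREATE TABLE IF NOT EXISTS anythingllm_vectors (
    id         UUID PRIMARY KEY,
    namespace  TEXT NOT NULL,
    embedding  vector(768),   -- must match the shared embedding model
    metadata   JSONB NOT NULL DEFAULT '{}'::jsonb,
    created_at TIMESTAMP NOT NULL DEFAULT now()
);

-- Align index type with AnythingLLM's expectations.
CREATE INDEX IF NOT EXISTS anythingllm_vectors_embedding_idx
    ON anythingllm_vectors USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

CREATE INDEX IF NOT EXISTS anythingllm_vectors_namespace_idx
    ON anythingllm_vectors (namespace);

GRANT SELECT, INSERT, UPDATE, DELETE
    ON anythingllm_vectors TO anythingllm_service;
```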
Near-Term Tasks
- Draft SQL for the projection view/materialized view, including JSONB assembly.
- Prototype a synchronization worker (Python or C++) that mirrors embeddings into the AnythingLLM table.
- Define namespace slug normalization rules and document them in both repos.
- Coordinate on embedding model selection and update configuration in both stacks.
- Add automated compatibility tests to CI pipelines of both projects.