metal-kompanion/docs/kompendium-sdk.md

Kompanion AI Client SDK for Qt/KDE — API Review & v2 Proposal

Context: Existing code under alpaka/src/core implements a minimal LLM client named KLLM:

  • KLLMInterface (central object, QtNetwork-based, Ollama URL field, model list, systemPrompt, getCompletion() / getModelInfo()).
  • KLLMRequest (message, model, context).
  • KLLMReply (streaming, finished, basic timing info, context carry-over).

Goal: Evolve this into a first-class Kompanion SDK that can power:

  • agentic coding (tool/function calling, plan→execute),
  • app integrations (Kontact, Konsole, KDevelop/Kate, Plasma applets, NeoChat),
  • privacy and policy controls (per-source ACLs, consent),
  • reliable async/streaming/cancellation,
  • multi-backend (Ollama/OpenAI/local engines) with uniform semantics,
  • QML-friendly usage.

Part A — Review (What works / What's missing)

Strengths

  • Idiomatic Qt API (QObject, signals/slots).
  • Central interface (KLLMInterface) mirrors the feel of QNetworkAccessManager/QtNetwork.
  • Streaming via KLLMReply::contentAdded() and completion via finished().
  • Simple model enumeration + systemPrompt.

Gaps to close

  1. Message structure: Only a single message string; no roles (system/user/assistant/tool), no multi-turn thread assembly besides a custom KLLMContext.
  2. Tool calling / function calling: No schema for tool specs, invocation events, results injection, or “plan” steps.
  3. Backend abstraction: “Ollama URL” is a property of the core interface. Needs pluggable providers with capability discovery.
  4. Error model: Only errorOccurred(QString) and hasError. Missing typed errors, retry/cancel semantics, timeouts, throttling.
  5. Observability: Some timing info, but no per-token hooks, token usage counters, logs, traces.
  6. Threading & cancellation: No unified cancel token; no QFuture/QCoro or QPromise integration.
  7. QML friendliness: Usable, but message/tool specs should be modelled as Q_GADGET/Q_OBJECT types and Q_PROPERTY-exposed to QML.
  8. Privacy & policy: No ACLs, no data origin policy, no redaction hooks.
  9. Embeddings / RAG: No first-class embedding calls, no JSON-mode or structured outputs with validators.
  10. Agent loop affordances: No “plan→confirm→apply patch / run tests” pattern built-in; no diff/patch helpers.

Part B — v2 API Proposal (“KompanionAI”)

Rename the public surface to KompanionAI (KI = “Künstliche Intelligenz”, German for “artificial intelligence”, so the prefix reads naturally in German). Keep binary-compatibility fences internally if needed.

Namespaces & modules

  • namespace KompanionAI { … }

  • Core modules:

    • Client (front door)
    • Provider (backend plugins: Ollama, OpenAI, Local)
    • Message / Thread (roles + history)
    • Tool (function calling schema)
    • Completion (text/chat)
    • Embedding (vectorize)
    • Policy (privacy/ACL)
    • Events (streaming tokens, tool calls, traces)

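All classes are Qt types: QObject subclasses with signals/slots where behaviour is needed, Q_GADGET value types for plain data, and all of them usable from QML.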


1) Message & Thread Model

// Roles & content parts, QML-friendly
class KIMessagePart {
    Q_GADGET
    Q_PROPERTY(QString mime MEMBER mime)
    Q_PROPERTY(QString text MEMBER text) // for text/plain
    // future: binary, image refs, etc.
  public:
    QString mime;   // "text/plain", "application/json"
    QString text;
};

class KIMessage {
    Q_GADGET
    Q_PROPERTY(QString role MEMBER role) // "system" | "user" | "assistant" | "tool"
    Q_PROPERTY(QList<KIMessagePart> parts MEMBER parts)
  public:
    QString role;
    QList<KIMessagePart> parts;
    QVariantMap metadata; // arbitrary
};

class KIThread {
    Q_GADGET
    Q_PROPERTY(QList<KIMessage> messages MEMBER messages)
  public:
    QList<KIMessage> messages;
};

Why: Enables multi-turn chat with explicit roles and mixed content (text/JSON). Tool outputs show up as role="tool".
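
To make the role/parts split concrete, here is a minimal sketch in plain C++ (standard library only, so it compiles without Qt). `MessagePart`, `Message` and `Thread` are stand-ins for KIMessagePart, KIMessage and KIThread; `isKnownRole`, `plainText` and `appendToolOutput` are hypothetical helpers that are not part of the proposal.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-ins for KIMessagePart / KIMessage / KIThread (assumption: the real
// types would use QString and QList instead of std::string and std::vector).
struct MessagePart {
    std::string mime;  // e.g. "text/plain", "application/json"
    std::string text;
};

struct Message {
    std::string role;  // "system" | "user" | "assistant" | "tool"
    std::vector<MessagePart> parts;
};

struct Thread {
    std::vector<Message> messages;
};

// True if the role is one of the four roles named in the proposal.
inline bool isKnownRole(const std::string& role) {
    return role == "system" || role == "user" || role == "assistant" || role == "tool";
}

// Concatenate only the text/plain parts of a message, skipping e.g. JSON parts.
inline std::string plainText(const Message& m) {
    assert(isKnownRole(m.role));
    std::string out;
    for (const MessagePart& p : m.parts) {
        if (p.mime == "text/plain") out += p.text;
    }
    return out;
}

// Hypothetical helper: record a tool output in the thread as role="tool".
inline void appendToolOutput(Thread& t, const std::string& json) {
    t.messages.push_back(Message{"tool", {MessagePart{"application/json", json}}});
}
```

Keeping the MIME type on each part is what lets a UI render only the text while a provider still forwards the JSON payload unchanged.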


2) Tool / Function Calling

class KIToolParam {
    Q_GADGET
    Q_PROPERTY(QString name MEMBER name)
    Q_PROPERTY(QString type MEMBER type)   // "string","number","boolean","object"... (JSON Schema-lite)
    Q_PROPERTY(bool    required MEMBER required)
    Q_PROPERTY(QVariant defaultValue MEMBER defaultValue)
  public:
    QString name, type;
    bool required = false;
    QVariant defaultValue;
};

class KIToolSpec {
    Q_GADGET
    Q_PROPERTY(QString name MEMBER name)
    Q_PROPERTY(QString description MEMBER description)
    Q_PROPERTY(QList<KIToolParam> params MEMBER params)
  public:
    QString name, description;
    QList<KIToolParam> params; // JSON-serializable schema
};

class KIToolCall {
    Q_GADGET
    Q_PROPERTY(QString name MEMBER name)
    Q_PROPERTY(QVariantMap arguments MEMBER arguments)
  public:
    QString name;
    QVariantMap arguments;
};

class KIToolResult {
    Q_GADGET
    Q_PROPERTY(QString name MEMBER name)
    Q_PROPERTY(QVariant result MEMBER result) // result payload (JSON-like)
  public:
    QString name;
    QVariant result;
};

Flow: Model emits a tool call event → client executes tool → emits tool result → model continues. All observable via signals.
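
Before the client executes a proposed call, it probably wants to check the arguments against the spec. The following is a sketch of that step in plain C++ (no Qt), assuming string-valued arguments for brevity; `fillDefaultsAndFindMissing` is a hypothetical helper, and the `delimiter` parameter in the usage note is invented for illustration.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Stand-ins for KIToolParam / KIToolSpec / KIToolCall (assumption: string
// values only; the real types carry QVariant).
struct ToolParam {
    std::string name;
    std::string type;
    bool required = false;
    std::optional<std::string> defaultValue;
};

struct ToolSpec {
    std::string name;
    std::vector<ToolParam> params;
};

struct ToolCall {
    std::string name;
    std::map<std::string, std::string> arguments;
};

// Fills in the default for every absent parameter that declares one, and
// returns the names of required parameters that are still missing.
inline std::vector<std::string> fillDefaultsAndFindMissing(const ToolSpec& spec, ToolCall& call) {
    assert(spec.name == call.name);  // caller must pair a call with its own spec
    std::vector<std::string> missing;
    for (const ToolParam& p : spec.params) {
        if (call.arguments.count(p.name) != 0) continue;  // already supplied
        if (p.defaultValue) {
            call.arguments[p.name] = *p.defaultValue;
        } else if (p.required) {
            missing.push_back(p.name);
        }
    }
    return missing;
}
```

A non-empty result would typically be sent back to the model as a tool error rather than executing the tool with incomplete input.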


3) Provider abstraction (multi-backend)

class KIProvider : public QObject {
    Q_OBJECT
    Q_PROPERTY(QString name READ name CONSTANT)
    Q_PROPERTY(QStringList models READ models NOTIFY modelsChanged)
    Q_PROPERTY(KICapabilities caps READ caps CONSTANT)
  public:
    virtual QFuture<KIReply*> chat(const KIThread& thread, const KIChatOptions& opts) = 0;
    virtual QFuture<KIEmbeddingResult> embed(const QStringList& texts, const KIEmbedOptions& opts) = 0;
    // ...
};

class KIClient : public QObject {
    Q_OBJECT
    Q_PROPERTY(KIProvider* provider READ provider WRITE setProvider NOTIFY providerChanged)
    Q_PROPERTY(QString defaultModel READ defaultModel WRITE setDefaultModel NOTIFY defaultModelChanged)
  public:
    Q_INVOKABLE QFuture<KIReply*> chat(const KIThread&, const KIChatOptions&);
    Q_INVOKABLE QFuture<KIEmbeddingResult> embed(const QStringList&, const KIEmbedOptions&);
    Q_INVOKABLE void cancel(quint64 requestId);
    // ...
};

  • OllamaProvider, OpenAIProvider, LocalProvider implement KIProvider.
  • KICapabilities advertises support for: JSON-mode, function calling, system prompts, logprobs, images, etc.
  • Do not bake “Ollama URL” into Client. It belongs to the provider.
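
Capability discovery could look roughly like the sketch below. The proposal names the features KICapabilities should advertise but not how they are encoded, so the bit-flag enum, `ProviderInfo`, `supports` and `pickProvider` are all assumptions made for illustration, written in plain C++ without Qt.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical encoding of the features listed for KICapabilities.
enum Capability : std::uint32_t {
    JsonMode        = 1u << 0,
    FunctionCalling = 1u << 1,
    SystemPrompts   = 1u << 2,
    Logprobs        = 1u << 3,
    Images          = 1u << 4,
};

struct ProviderInfo {
    std::string name;
    std::uint32_t caps;  // bitwise OR of Capability values
};

// True if the provider advertises every capability in `required`.
inline bool supports(const ProviderInfo& p, std::uint32_t required) {
    return (p.caps & required) == required;
}

// Returns the first provider that supports all required capabilities,
// or nullptr if none does.
inline const ProviderInfo* pickProvider(const std::vector<ProviderInfo>& providers,
                                        std::uint32_t required) {
    assert(required != 0);  // ask for at least one capability
    for (const ProviderInfo& p : providers) {
        if (supports(p, required)) return &p;
    }
    return nullptr;
}
```

With this shape, a caller that needs function calling can fail early with a typed error instead of discovering the gap halfway through a request.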

4) Completion / Reply / Streaming Events

class KIReply : public QObject {
    Q_OBJECT
    Q_PROPERTY(bool finished READ isFinished NOTIFY finishedChanged)
    // Token counts are only known once the provider reports usage, so they
    // notify instead of being CONSTANT.
    Q_PROPERTY(int  promptTokens READ promptTokens NOTIFY finishedChanged)
    Q_PROPERTY(int  completionTokens READ completionTokens NOTIFY finishedChanged)
    Q_PROPERTY(QString model READ model CONSTANT)
  public:
    bool isFinished() const;
    int promptTokens() const;
    int completionTokens() const;
    QString model() const;
    // accumulated assistant text
    Q_INVOKABLE QString text() const;

  Q_SIGNALS:
    void tokensAdded(const QString& delta);              // streaming text
    void toolCallProposed(const KIToolCall& call);       // model proposes a tool call
    void toolResultRequested(const KIToolCall& call);    // alt: unified request
    void traceEvent(const QVariantMap& span);            // observability
    void finishedChanged();                              // NOTIFY signal for the properties above
    void finished();                                     // reply done
    void errorOccurred(const KIError& error);
};

Why: makes tool invocation first-class and observable. You can wire it to ACF/MCP tools or project introspection.
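
The relationship between tokensAdded() deltas and the accumulated text() can be sketched without Qt as follows. `ReplySketch` is a hypothetical stand-in that uses std::function callbacks where KIReply would use signals.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for KIReply: callbacks play the role of the tokensAdded() signal.
class ReplySketch {
public:
    using DeltaHandler = std::function<void(const std::string&)>;

    // Equivalent of connecting a slot to tokensAdded().
    void onTokensAdded(DeltaHandler handler) { handlers_.push_back(std::move(handler)); }

    // Called by a provider for each streamed chunk: append, then notify.
    void pushDelta(const std::string& delta) {
        assert(!finished_);  // no deltas may arrive after completion
        text_ += delta;
        for (const DeltaHandler& h : handlers_) h(delta);
    }

    void finish() { finished_ = true; }

    const std::string& text() const { return text_; }
    bool isFinished() const { return finished_; }

private:
    std::vector<DeltaHandler> handlers_;
    std::string text_;
    bool finished_ = false;
};
```

Listeners only ever see the delta, while text() always holds the full reply so far, so a late subscriber can still recover everything streamed before it connected.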


5) Options / Policies / Privacy

// Declared first because KIChatOptions stores a KIPolicy by value.
class KIPolicy {
    Q_GADGET
    Q_PROPERTY(QString visibility MEMBER visibility) // "private|org|public"
    Q_PROPERTY(bool allowNetwork MEMBER allowNetwork)
    Q_PROPERTY(QStringList redactions MEMBER redactions) // regex keys to redact
    // future: per-source ACLs
  public:
    QString visibility = QStringLiteral("private");
    bool allowNetwork = false;
    QStringList redactions;
};

class KIChatOptions {
    Q_GADGET
    Q_PROPERTY(QString model MEMBER model)
    Q_PROPERTY(bool stream MEMBER stream)
    Q_PROPERTY(bool jsonMode MEMBER jsonMode)
    Q_PROPERTY(int  maxTokens MEMBER maxTokens)
    Q_PROPERTY(double temperature MEMBER temperature)
    Q_PROPERTY(QList<KIToolSpec> tools MEMBER tools) // permitted tool set for this call
    Q_PROPERTY(KIPolicy policy MEMBER policy)
    // ...
  public:
    QString model; bool stream = true; bool jsonMode = false;
    int maxTokens = 512; double temperature = 0.2;
    QList<KIToolSpec> tools;
    KIPolicy policy;
};

Why: explicit control of what the agent may do; dovetails with your HDoD memory ACLs.
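
One way the redactions list might be applied before a prompt leaves the process is sketched below, using std::regex rather than Qt. It assumes each entry is a regular expression matched against the outgoing text, which is one reading of "regex keys to redact". `PolicySketch`, `applyRedactions` and the `[REDACTED]` placeholder are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Stand-in for KIPolicy with the same three fields.
struct PolicySketch {
    std::string visibility = "private";
    bool allowNetwork = false;
    std::vector<std::string> redactions;  // ECMAScript regex patterns (assumption)
};

// Replaces every match of every redaction pattern with a fixed placeholder.
inline std::string applyRedactions(const PolicySketch& policy, std::string text) {
    for (const std::string& pattern : policy.redactions) {
        assert(!pattern.empty());  // an empty pattern would match everywhere
        text = std::regex_replace(text, std::regex(pattern), "[REDACTED]");
    }
    return text;
}
```

Running redaction in the client, before the provider sees the text, keeps the behaviour identical across backends.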


6) Embeddings (for RAG / memory)

class KIEmbedOptions {
    Q_GADGET
    Q_PROPERTY(QString model MEMBER model)
    Q_PROPERTY(QString normalize MEMBER normalize) // "l2"|"none"
  public:
    QString model = "text-embed-local";
    QString normalize = "l2";
};

class KIEmbeddingResult {
    Q_GADGET
  public:
    QVector<QVector<float>> vectors;
    QString model;
};

Why: unify vector generation; Kompanion memory can plug this directly.
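
For reference, the "l2" option corresponds to scaling each vector to unit length, after which a plain dot product equals cosine similarity. A minimal standard-library sketch follows; `l2Normalize` and `dotProduct` are illustrative names, not proposed API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scale v in place to unit Euclidean length; a zero vector is left untouched.
inline void l2Normalize(std::vector<float>& v) {
    double sumSq = 0.0;
    for (float x : v) sumSq += static_cast<double>(x) * x;
    if (sumSq == 0.0) return;
    const double norm = std::sqrt(sumSq);
    for (float& x : v) x = static_cast<float>(x / norm);
}

// Dot product; for two L2-normalized vectors this is their cosine similarity.
inline float dotProduct(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double dot = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) dot += static_cast<double>(a[i]) * b[i];
    return static_cast<float>(dot);
}
```

Normalizing once at embedding time means the memory store can rank neighbours with dot products alone.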


7) Agent Loop Conveniences (optional helpers)

Provide “batteries included” patterns outside the Provider:

class KIAgent : public QObject {
    Q_OBJECT
  public:
    // Plan → (approve) → Execute, with tool calling enabled
    Q_SIGNAL void planReady(const QString& plan);
    Q_SIGNAL void patchReady(const QString& unifiedDiff);
    Q_SIGNAL void needToolResult(const KIToolCall& call);
    Q_SIGNAL void log(const QString& msg);

    void runTask(const QString& naturalInstruction,
                 const KIThread& prior,
                 const QList<KIToolSpec>& tools,
                 const KIChatOptions& opts);
};

This helper emits the plan first, then diff/patch proposals, and integrates with your ACF and the KTextEditor/KDevelop diff panes.
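
The plan → approve → execute ordering can be viewed as a small state machine. The sketch below is one possible reading in plain C++; the state names and the `AgentSketch` methods are assumptions, with `proposePlan` and `proposePatch` standing where planReady() and patchReady() would be emitted.

```cpp
#include <cassert>
#include <string>

// Hypothetical states for a single agent task.
enum class AgentState { Idle, AwaitingApproval, Executing, Done, Rejected };

class AgentSketch {
public:
    AgentState state() const { return state_; }
    const std::string& plan() const { return plan_; }
    const std::string& patch() const { return patch_; }

    // A plan exists but nothing has been executed yet (cf. planReady()).
    void proposePlan(const std::string& plan) {
        assert(state_ == AgentState::Idle);
        plan_ = plan;
        state_ = AgentState::AwaitingApproval;
    }

    // The user confirms the plan; only now may tools run.
    void approve() {
        assert(state_ == AgentState::AwaitingApproval);
        state_ = AgentState::Executing;
    }

    // The user declines; the task ends without side effects.
    void reject() {
        assert(state_ == AgentState::AwaitingApproval);
        state_ = AgentState::Rejected;
    }

    // Execution produced a diff for review (cf. patchReady()).
    void proposePatch(const std::string& unifiedDiff) {
        assert(state_ == AgentState::Executing);
        patch_ = unifiedDiff;
        state_ = AgentState::Done;
    }

private:
    AgentState state_ = AgentState::Idle;
    std::string plan_;
    std::string patch_;
};
```

The asserts encode the main guarantee of the pattern: no patch can be produced from a plan that was never approved.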


8) Error Model & Cancellation

  • Introduce KIError{ code: enum, httpStatus, message, retryAfter }.
  • KIClient::cancel(requestId) cancels in-flight work.
  • Timeouts & retry policy configurable in KIChatOptions.
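
A typed error makes retry decisions mechanical. The sketch below shows one possible policy in plain C++17: honour retryAfter when the provider supplies it, otherwise back off exponentially, and never retry cancelled or invalid requests. The enum values, the 500 ms base and the 8 s cap are assumptions, not part of the proposal.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <optional>
#include <string>

// Hypothetical error codes for KIError::code.
enum class ErrorCode { Network, Timeout, RateLimited, Cancelled, InvalidRequest };

// Stand-in for KIError{ code, httpStatus, message, retryAfter }.
struct ErrorSketch {
    ErrorCode code;
    int httpStatus = 0;
    std::string message;
    std::optional<std::chrono::milliseconds> retryAfter;
};

// Returns how long to wait before the next attempt, or nullopt to give up.
// `attempt` is the 1-based number of the attempt that just failed.
inline std::optional<std::chrono::milliseconds> retryDelay(const ErrorSketch& e,
                                                           int attempt,
                                                           int maxAttempts) {
    assert(attempt >= 1);
    if (attempt >= maxAttempts) return std::nullopt;
    switch (e.code) {
    case ErrorCode::Cancelled:
    case ErrorCode::InvalidRequest:
        return std::nullopt;  // retrying cannot help
    case ErrorCode::RateLimited:
        if (e.retryAfter) return *e.retryAfter;  // trust the server's hint
        [[fallthrough]];
    case ErrorCode::Network:
    case ErrorCode::Timeout: {
        // 500 ms, 1 s, 2 s, 4 s, then capped at 8 s.
        const int shift = std::min(attempt - 1, 4);
        return std::chrono::milliseconds(500LL << shift);
    }
    }
    return std::nullopt;
}
```

Cancellation maps to a code of its own here so that a user-initiated cancel is never mistaken for a transient failure and retried.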

9) QML exposure

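Register the QObject types with qmlRegisterType<…>("Kompanion.AI", 1, 0, "KIClient") etc. The Q_GADGET value types (KIMessage, KIThread, KIToolSpec) are not QObjects, so they cannot go through qmlRegisterType; expose them as value types instead (for example with QML_VALUE_TYPE in Qt 6) or as properties and arguments of the registered QObjects. With KIClient and KIAgent registered, Plasma applets and Kirigami UIs can wire flows quickly.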


Part C — Migration from KLLM to KompanionAI

  • Mapping

    • KLLMInterface::getCompletion() → KIClient::chat(thread, opts).
    • KLLMRequest{message, model, context} → KIThread{ messages=[system?, user?], … }, KIChatOptions{model}.
    • KLLMReply → KIReply (adds tool call signals, token deltas, errors).
    • systemPrompt → first system KIMessage.
    • models() → KIProvider::models().
  • Providers

    • Implement OllamaProvider first (parity with current).
    • Add OpenAIProvider (JSON-mode/function calling), LocalProvider (llama.cpp/candle/etc.).
  • Binary/Source compatibility

    • You can keep thin wrappers named KLLM* forwarding into KompanionAI during transition.
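
The request mapping above could be implemented by a thin adapter along these lines. This is a sketch in plain C++ with hypothetical stand-in types. It deliberately leaves out the legacy context field, because how a KLLMContext should translate into prior messages is not specified here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-ins for the legacy and v2 types (assumption: simplified to strings).
struct LegacyRequest {
    std::string message;
    std::string model;
};

struct ChatMessage {
    std::string role;
    std::string text;
};

struct ChatThread {
    std::vector<ChatMessage> messages;
};

struct ChatOptionsSketch {
    std::string model;
};

struct MigratedCall {
    ChatThread thread;
    ChatOptionsSketch options;
};

// systemPrompt becomes the first "system" message (if non-empty), the legacy
// message becomes a "user" message, and the model moves into the options.
inline MigratedCall migrate(const std::string& systemPrompt, const LegacyRequest& req) {
    assert(!req.message.empty());
    MigratedCall out;
    if (!systemPrompt.empty()) out.thread.messages.push_back({"system", systemPrompt});
    out.thread.messages.push_back({"user", req.message});
    out.options.model = req.model;
    return out;
}
```

A KLLM* wrapper kept for source compatibility would call something like this and forward the result to KIClient::chat().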

Part D — Minimal Examples

1) Simple chat with streaming and tool calling

KIClient client;
client.setProvider(new OllamaProvider(QUrl("http://localhost:11434")));
client.setDefaultModel("llama3.1:8b-instruct");

KIThread t;
t.messages << KIMessage{ .role="system", .parts={ { "text/plain","You are Kompanion inside KDE." } } }
           << KIMessage{ .role="user",   .parts={ { "text/plain","Generate a CSV → exam report plan." } } };

KIToolSpec csvSpec;
csvSpec.name = "parse_csv_schema";
csvSpec.description = "Inspect a CSV sample path and return column info.";
csvSpec.params = { { "path","string", true, {} } };

KIChatOptions opts;
opts.tools = { csvSpec };
opts.stream = true;

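// result() blocks until the provider has handed back the reply object;
// in GUI code, wait with QFutureWatcher instead of blocking the event loop.
auto *reply = client.chat(t, opts).result();
QObject::connect(reply, &KIReply::tokensAdded, reply, [](const QString& d){ qDebug() << d; });
QObject::connect(reply, &KIReply::toolCallProposed, reply, [&client, reply](const KIToolCall& call){
  if (call.name == "parse_csv_schema") {
    QVariantMap out; out["columns"] = QStringList{ "Name","Grade","Subject" };
    // Feed the result back. returnToolResult() is not declared in the KIClient
    // sketch in section 3; it stands for whichever provider-specific or
    // KIClient API ends up carrying tool results.
    client.returnToolResult(*reply, KIToolResult{ call.name, out });
  }
});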

2) Embeddings for memory

auto emb = client.embed({ "RAII pattern", "SFINAE", "Type erasure" }, KIEmbedOptions{}).result();
qDebug() << emb.vectors.size(); // 3

Part E — Optional Extensions (for Gemini to consider)

  • Structured outputs: JSON schema validation for function outputs (reject invalid JSON, request fix).
  • Safety hooks: pre-execution validators for tool calls (e.g. forbid dangerous shell).
  • Observability: OpenTelemetry spans over request lifecycle and tool calls.
  • Rate limiting: token budgeters per provider.
  • Offline mode: allowNetwork=false forces model to abstain from external lookups.
  • Crash handler integration: a helper that consumes backtraces and emits a KIThread pre-filled with stack/context (pairs naturally with an ACF tool to fetch symbols).
  • CSV app generator: a thin template tool that scaffolds a Kirigami app, fed by CSV schema tool—end-to-end demo of agentic coding.
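
As an illustration of the rate-limiting item, a per-provider token budgeter can be very small. The sketch below is plain C++ with hypothetical names (`TokenBudgeter`, `setBudget`, `tryConsume`) and ignores time-based refill.

```cpp
#include <cassert>
#include <map>
#include <string>

// Tracks a remaining token allowance per provider name.
class TokenBudgeter {
public:
    void setBudget(const std::string& provider, long long tokens) {
        assert(tokens >= 0);
        remaining_[provider] = tokens;
    }

    // Reserves `tokens` for a request. Returns false, and reserves nothing,
    // if the provider is unknown or the request would exceed its budget.
    bool tryConsume(const std::string& provider, long long tokens) {
        assert(tokens >= 0);
        auto it = remaining_.find(provider);
        if (it == remaining_.end() || it->second < tokens) return false;
        it->second -= tokens;
        return true;
    }

    // Remaining allowance; 0 for providers without a budget.
    long long remaining(const std::string& provider) const {
        auto it = remaining_.find(provider);
        return it == remaining_.end() ? 0 : it->second;
    }

private:
    std::map<std::string, long long> remaining_;
};
```

A client could call tryConsume() with maxTokens before dispatching a request and surface a typed error when it returns false.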

TL;DR

  • Keep Qt idioms; elevate to KompanionAI with roles, tools, providers, and policies.
  • Make tool calling first-class with observable events.
  • Decouple backend specifics via KIProvider.
  • Add embeddings & JSON-mode for RAG + structured tasks.
  • Provide agent loop helpers (plan→diff→apply) outside the provider.
  • Expose everything to QML for KDE-native UIs.

This gives you a future-proof client SDK that plugs directly into Kontact/Konsole/KDevelop/Plasma/NeoChat and supports your ACF/MCP agent flows without locking into any single vendor.