Inference Passthrough

Every Kernle user with a working AI application already has a model. Inference passthrough eliminates the need to configure a separate model binding for Kernle — your existing model “just works.”

How It Works

Kernle supports two integration patterns, each with its own inference source:
| Pattern | Inference Source | Setup Required |
| --- | --- | --- |
| MCP server | Host agent's model via MCP sampling | None (automatic) |
| Library embedding | Your existing `generate()` function | One line |

Model Binding Priority

When a tool is called, Kernle binds a model in this order:
  1. Explicit model — if you’ve already called k.entity.set_model(), that model is used
  2. Persisted config — if you ran kernle model set, that config is loaded
  3. MCP sampling — if the MCP client supports sampling, the host agent’s model is used
  4. Capture-only — no model available; memory capture works, inference-dependent features are skipped
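The binding order above amounts to a first-match fallback. Here is a minimal sketch of that resolution logic (the function and parameter names are hypothetical, for illustration; this is not Kernle's actual implementation):

```python
def resolve_model(explicit=None, persisted=None, mcp_sampling=None):
    # First non-None source wins, mirroring the priority order above.
    # (Hypothetical helper, not part of the kernle API.)
    for candidate in (explicit, persisted, mcp_sampling):
        if candidate is not None:
            return candidate
    return None  # capture-only mode: no model bound


# A persisted config wins when no explicit model was set:
resolve_model(persisted="cli-config", mcp_sampling="host-model")  # → "cli-config"
```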

MCP Integration (Automatic)

If your MCP client supports the sampling capability (Claude Code, Claude Desktop), Kernle automatically uses the host agent’s model for inference. No configuration needed.
# Just start the server — no model config required
kernle -s my-agent mcp
The MCP server detects sampling support at the first tool call and binds it transparently. You’ll see this in logs:
INFO: Bound MCP sampling model (via host agent)
If the client doesn’t support sampling, Kernle operates in capture-only mode — capture-tier operations (episodes, notes, raw) still work, but identity-tier operations (beliefs, values, goals, drives, relationships) raise InferenceRequiredError, and inference-dependent features like emotion detection and contradiction finding are skipped.

Library Embedding (CallableModelAdapter)

For Python library users, wrap any (prompt, system) -> str callable:
from kernle import Kernle, CallableModelAdapter

# Your existing generate function
def my_generate(prompt: str, system: str | None) -> str:
    return my_llm_client.complete(prompt, system_prompt=system)

# One line to connect it
k = Kernle(stack_id="my-agent")
k.entity.set_model(
    CallableModelAdapter(my_generate, model_id="gpt-4o", provider="openai")
)

# Now all Kernle features use your model
k.raw("I learned something important today")

Message Flattening

CallableModelAdapter flattens the message list into a single prompt string:
USER: first message

ASSISTANT: response

USER: follow-up
This is intentionally simple. If you need full message-structure fidelity (tool calls, structured content), implement ModelProtocol directly instead.
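A minimal sketch of this flattening behavior, assuming messages are role/content pairs (the adapter's actual internals may differ):

```python
def flatten_messages(messages):
    # Join role-tagged messages into one prompt string, matching the
    # USER:/ASSISTANT: format shown above (illustrative sketch only).
    return "\n\n".join(
        f"{m['role'].upper()}: {m['content']}" for m in messages
    )


prompt = flatten_messages([
    {"role": "user", "content": "first message"},
    {"role": "assistant", "content": "response"},
    {"role": "user", "content": "follow-up"},
])
# prompt == "USER: first message\n\nASSISTANT: response\n\nUSER: follow-up"
```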

No Model? Capture Still Works

Kernle operates in two tiers when no model is bound:
  • Capture tier: raw(), episode(), note() always work without a model
  • Identity tier: belief(), value(), goal(), drive(), relationship() require a bound inference model and raise InferenceRequiredError without one
Behavior by feature when no model is bound:

| Feature | Without Model |
| --- | --- |
| raw() | Works normally |
| episode() | Works normally |
| note() | Works normally |
| belief(), value(), goal(), drive(), relationship() | Raises InferenceRequiredError |
| Emotion detection | Returns neutral (valence=0, arousal=0) |
| Contradiction detection | Returns empty list |
| Suggestion extraction | Returns empty list |
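The tier gating can be pictured as a simple check before each write. The following is a self-contained sketch with a stand-in exception class and a hypothetical helper, not Kernle's implementation:

```python
CAPTURE_TIER = {"raw", "episode", "note"}
IDENTITY_TIER = {"belief", "value", "goal", "drive", "relationship"}


class InferenceRequiredError(RuntimeError):
    """Stand-in for kernle's exception, for illustration only."""


def write_memory(store, kind, text, model=None):
    # Identity-tier writes need a bound model; capture-tier writes never do.
    if kind in IDENTITY_TIER and model is None:
        raise InferenceRequiredError(f"{kind}() requires a bound inference model")
    store.append((kind, text))


store = []
write_memory(store, "raw", "I learned something important today")  # works
try:
    write_memory(store, "belief", "honesty matters")  # no model bound
except InferenceRequiredError:
    pass  # capture data intact; identity write rejected
```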

Exports

CallableModelAdapter is available at the top level:
from kernle import CallableModelAdapter
SamplingModelAdapter is internal to the MCP server and not exported at the package top level. The server instantiates it automatically when sampling is available.