# ModelProtocol — The Thinking Engine
The model is the entity’s thinking engine. It generates text, embeds vectors, and drives behavior. Swap the model and the entity thinks differently — that’s a feature, not a limitation.

## What Models Do
A model provider wraps an external API or local inference engine behind a standard interface. The rest of the system never talks to Anthropic or Ollama directly — it talks to `ModelProtocol`, and the model handles the translation.
The Entity creates an `InferenceService` wrapper around the model and passes it down to stack components. Components never see `ModelProtocol` directly — they get the narrow `InferenceService` interface.
## ModelProtocol Interface
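The interface listing did not survive in this section. As a hedged sketch, a provider protocol with this surface might look like the following; the method names (`generate`, `generate_stream`, `embed`) are assumptions inferred from the surrounding text, not kernle's actual signatures:

```python
from typing import Iterator, Protocol, Sequence, runtime_checkable

@runtime_checkable
class ModelProtocol(Protocol):
    """Sketch of the provider interface; member names are illustrative."""

    @property
    def capabilities(self) -> "ModelCapabilities": ...

    def generate(self, messages: Sequence["ModelMessage"]) -> "ModelResponse":
        """Run one completion and return the full response."""
        ...

    def generate_stream(self, messages: Sequence["ModelMessage"]) -> Iterator["ModelChunk"]:
        """Yield partial ModelChunks; the final chunk carries usage stats."""
        ...

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        """Optional vector embedding; the HashEmbedder is the fallback."""
        ...
```

Any class that implements these members satisfies the protocol structurally, so new providers need no inheritance from a base class.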
### Supporting Types
| Type | Purpose |
|---|---|
| `ModelCapabilities` | What the model supports: context window, max tokens, tool use, vision, streaming |
| `ModelMessage` | A conversation message with role, content, and optional tool calls |
| `ModelResponse` | Complete response: content, tool calls, usage stats, stop reason |
| `ModelChunk` | A streaming chunk: partial content, `is_final` flag, usage on the final chunk |
| `ToolDefinition` | A tool the model can call: name, description, JSON schema, handler |
## InferenceService
Stack components don’t need the full `ModelProtocol`. They need two things: generate text and embed vectors. `InferenceService` is that narrow interface.
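A sketch of that two-method surface; the `infer`/`embed` names come from the text below, while the parameter shapes are assumptions:

```python
from typing import Protocol

class InferenceService(Protocol):
    """The narrow interface stack components receive (sketch)."""

    def infer(self, prompt: str) -> str:
        """Generate text; routed to the model's generate() under the hood."""
        ...

    def embed(self, text: str) -> list[float]:
        """Embed text; falls back to the local HashEmbedder when the model
        has no embedding support."""
        ...
```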
### How InferenceService Works
When you call `entity.set_model(model)`, the Entity:

- Stores the model reference
- Creates an `_InferenceServiceImpl` wrapping it
- Calls `stack.on_model_changed(inference)` on the active stack
- The stack propagates to all components via `component.set_inference(inference)`
`_InferenceServiceImpl` routes `infer()` to `model.generate()` and, by default, uses a local `HashEmbedder` for `embed()`.
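A minimal sketch of that routing, using stand-in model and embedder objects; the real `_InferenceServiceImpl` lives inside the Entity, and this class shape is illustrative:

```python
class _InferenceServiceImpl:
    """Sketch: adapts a full model to the narrow InferenceService surface."""

    def __init__(self, model, embedder):
        self._model = model
        self._embedder = embedder        # HashEmbedder by default

    def infer(self, prompt: str) -> str:
        # Text generation delegates to the wrapped model.
        return self._model.generate(prompt)

    def embed(self, text: str) -> list[float]:
        # Embedding uses the local fallback; no network call needed.
        return self._embedder.embed(text)

class _EchoModel:                        # stand-in for a real provider
    def generate(self, prompt):
        return f"echo: {prompt}"

class _ZeroEmbedder:                     # stand-in for HashEmbedder
    def embed(self, text):
        return [0.0, 0.0]

service = _InferenceServiceImpl(_EchoModel(), _ZeroEmbedder())
```

Because components only ever hold this wrapper, swapping the underlying model never requires touching component code.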
### HashEmbedder Fallback
When no model-level embedding is available, the system uses a local hash-based n-gram embedder. This provides functional (if lower-quality) semantic search without any external API:

- Provider ID: `ngram-v1`
- No network calls required
- Deterministic (the same input always produces the same vector)
- Lower quality than model-based embeddings, but always available
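Kernle's exact algorithm is not reproduced here, but a deterministic character n-gram hash embedder can be sketched like this; the 256-dimension and trigram choices are assumptions:

```python
import hashlib
import math

def hash_embed(text: str, dim: int = 256, n: int = 3) -> list[float]:
    """Deterministic n-gram hash embedding: no model, no network calls."""
    vec = [0.0] * dim
    padded = f" {text.lower()} "
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        # Hash each n-gram to a stable bucket; same input, same vector.
        bucket = int(hashlib.md5(gram.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    # L2-normalize so dot products behave like cosine similarity.
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
```

Quality is far below model embeddings, but identical inputs always map to identical vectors, which is enough for basic semantic lookup.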
## Built-in Implementations
### AnthropicModel
Wraps the Anthropic Python SDK for Claude models.

- API key resolution order: explicit `api_key=` parameter, then the `CLAUDE_API_KEY` environment variable, then the `ANTHROPIC_API_KEY` environment variable
- 200k context window
- Tool use, vision, streaming
- System messages extracted to the top-level `system` parameter (Anthropic API convention)
Requires: `pip install anthropic` or `pip install kernle[anthropic]`
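The key-resolution order and system-message extraction described above can be sketched as follows (an illustration of those two behaviors, not kernle's actual `AnthropicModel` code):

```python
import os

class AnthropicModelSketch:
    """Illustrates key resolution and system-message handling (assumed shape)."""

    def __init__(self, api_key=None):
        # Resolution order: explicit arg, CLAUDE_API_KEY, ANTHROPIC_API_KEY.
        self.api_key = (api_key
                        or os.environ.get("CLAUDE_API_KEY")
                        or os.environ.get("ANTHROPIC_API_KEY"))

    @staticmethod
    def split_system(messages):
        """Anthropic's API takes system text as a top-level parameter,
        so system messages are pulled out of the message list."""
        system = "\n".join(m["content"] for m in messages if m["role"] == "system")
        rest = [m for m in messages if m["role"] != "system"]
        return system, rest
```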
### OllamaModel
Connects to a local Ollama instance via its HTTP REST API.

- Configurable context window (default 8192)
- Streaming support
- No tool use or vision (model-dependent)
Requires: `pip install requests` and a running Ollama server
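A sketch of such a wrapper against Ollama's `/api/generate` endpoint; the class shape and defaults here are illustrative, while the endpoint path and payload keys follow Ollama's public REST API:

```python
class OllamaModelSketch:
    """Illustrative wrapper around a local Ollama server's REST API."""

    def __init__(self, model_name="llama3",
                 base_url="http://localhost:11434",
                 context_window=8192):
        self.model_name = model_name
        self.base_url = base_url
        self.context_window = context_window

    def _payload(self, prompt, stream=False):
        # Ollama's /api/generate accepts model, prompt, stream, and options.
        return {"model": self.model_name, "prompt": prompt, "stream": stream,
                "options": {"num_ctx": self.context_window}}

    def generate(self, prompt):
        import requests  # optional dependency, imported only at call time
        resp = requests.post(f"{self.base_url}/api/generate",
                             json=self._payload(prompt), timeout=120)
        resp.raise_for_status()
        return resp.json()["response"]
```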
## Entry Point Registration
Model implementations register in the `kernle.models` entry point group. Kernle discovers registered providers via `discover_models()` and can instantiate them on demand.
## Building a Custom Model Provider
To integrate a new model provider, implement the `ModelProtocol` interface.
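The original listing is not included here. As a hedged starting point, a custom provider might look like this skeleton; the member names mirror the sketch conventions used above and are assumptions, not kernle's real signatures:

```python
class MyCustomModel:
    """Skeleton for a new provider (illustrative shape)."""

    def __init__(self, endpoint: str, model_name: str):
        self.endpoint = endpoint
        self.model_name = model_name

    @property
    def capabilities(self):
        # Advertise what this backend supports.
        return {"context_window": 8192, "tools": False,
                "vision": False, "streaming": False}

    def generate(self, prompt: str) -> str:
        # Translate the request into your backend's API call here.
        raise NotImplementedError("call your inference backend")

    def embed(self, text: str):
        # Optional: raise or omit to fall back to the HashEmbedder.
        raise NotImplementedError("optional; HashEmbedder covers the fallback")
```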
## Auto-Configuration
For CLI usage, Kernle can auto-configure a model from environment variables. The `process run` and `process exhaust` commands do this automatically. Detection priority:
- `KERNLE_MODEL_PROVIDER` env var (forces a specific provider)
- `CLAUDE_API_KEY` or `ANTHROPIC_API_KEY` → Anthropic
- `OPENAI_API_KEY` → OpenAI
- No key → graceful degradation (no model)
`KERNLE_MODEL`:
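The detection priority above can be sketched as a simple environment check (the function name and return strings are illustrative):

```python
import os

def detect_provider(env=None):
    """Sketch of auto-configuration's detection priority."""
    env = os.environ if env is None else env
    if env.get("KERNLE_MODEL_PROVIDER"):
        return env["KERNLE_MODEL_PROVIDER"]      # explicit override wins
    if env.get("CLAUDE_API_KEY") or env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    return None                                   # graceful degradation: no model
```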