Most turns need 2–3 tools. You're sending 24.
selection observability drift + retrain evals guardrails

Large tool catalogs increase
routing ambiguity.

Every additional tool increases prompt cost and reasoning overhead.

Kerf scores your catalog per turn and reduces it to a small per-turn working set, before the call, in under 10ms, with no external cloud dependency in the selection path.

Train locallyExport the selectorRun in-process
// TRACE // end-to-end, one turn

Every decision ships with a readable trace.

Score, threshold, rationale, outcome. Pipe it to your logger, your dashboard, or your stdout.

~/agents/support-bot/ $ kerf trace --tail --pretty
trace #4821streaming
14:02:11.142 INFO kerf.runtime init project="proj_x7k_support" tools=24
14:02:11.142 DEBUG "my order #4821 hasn't shipped — cancel it before it goes out."
14:02:11.143 DEBUG kerf.intent detected intent=order_management conf=0.96
14:02:11.144 DEBUG kerf.signal verb_cue "cancel" → promote=cancel_order
14:02:11.145 DEBUG kerf.signal entity ref=order#4821 → prereq=get_order_status
14:02:11.146 DEBUG kerf.signal no_match "support page is gone" → surface_to=analytics
14:02:11.147 INFO kerf.score threshold=0.72 scored=24 selected=2 dropped=22
14:02:11.148 DEBUG kerf.score get_order_status=0.94 cancel_order=0.87
14:02:11.148 DEBUG kerf.score refund_order=0.61 (below threshold, dropped)
14:02:11.149 DEBUG kerf.guard cancel_order → policy=allow scope=write
14:02:11.149 DEBUG kerf.guard get_order_status → policy=allow scope=read
14:02:11.150 INFO kerf.emit otlp_span=sel_4821 duration=8.2ms
14:02:11.150 DEBUG kerf.payload tools=[get_order_status, cancel_order]
14:02:11.151 RESULT kerf.return tools=2 tokens=~2.1k (illustrative) took=~9 ms

Illustrative output. Actual results vary by catalog size and query complexity.

// BENCHMARKS // τ-bench adapted workflows

Reproducible evals. Every commit.

Methodology: adapted τ-bench workflows using custom tool catalogs with production-style agent policies. Each row represents a distinct workflow scenario with routing instrumentation enabled.

BenchmarkModelToolsAvg SelectedInvalid Calls ↓Task Success ↑Token Reduction ↓Latency
τ-bench* · airline workflowsgpt-4o142.4−41%+3 pts−38%8ms
τ-bench* · retail workflowsclaude-sonnet-4212.8−57%+4 pts−44%9ms
τ-bench* · support operationsgpt-4o-mini242.6−68%+6 pts−59%9ms
τ-bench* · developer agentclaude-opus-41184.9−61%+3 pts−48%14ms
τ-bench* · ops runbooksgpt-4o3126.7−54%+2 pts−37%21ms

*Results are representative examples for illustration purposes, not official τ-bench scores. We expect 40–60% token reduction in catalog-heavy workflows. We're looking for design partners to validate these results.

Kerf/Playground
User intent
U

"Where is my order #4821? Also, can you cancel it?"

All tools · 24
list_orders
get_order_status
cancel_order
refund_order
search_kb
get_customer
create_ticket
escalate
list_shipments
track_package
update_address
+ 13 more
kerf
~10ms
Selected tools · 2
get_order_status
Fetch order #4821 details
cancel_order
Cancel order if eligible
Dropped · 22
refund_order
search_kb
get_customer
create_ticket
escalate
list_shipments
track_package
update_address
list_orders
+ 13 more
demo
12:04:21→ 3 tools9ms12:04:19→ 2 tools11ms12:04:18→ 4 tools13ms12:04:16→ 2 tools8ms12:04:14→ 3 tools10ms

OpenAI-spec compatible · runs in your VPC · no data leaves your infra.

// CAPABILITIES // what kerf actually does

Six layers, one install.

01 · SELECTION

Large tool catalogs reduced to a small per-turn working set.

A selection model scores your full catalog and returns only the highest-scoring tools for the current turn. In-process, single-digit to low-double-digit ms latency. Kerf minimizes irrelevant tool exposure per turn.

full catalogper-turn working setin ~10ms
02 · OBSERVABILITY

Every selection decision is traceable.

Structured reasoning, scores, and an OTLP span per decision. Stream to Datadog, Honeycomb, Grafana — or kerf trace --tail in any shell.

03 · DRIFT + RETRAIN

Catch drift before your users do.

Continuous monitoring of selection confidence across production traffic. When drift crosses your threshold, kerf flags it — designed for fast retraining cycles.

04 · EVALS

Reproducible, on every change.

Planned eval suite: τ-bench workflows, custom scenarios, and regression checks. Every change runs against held-out traces and regression scenarios. CI integration planned.

05 · RUNTIME

Optional runtime orchestration for agent execution.

Optional kerf.run() drives the full loop — planning, tool calls, retries, fallback. Or stay surgical and only use the selection call.

06 · GUARDRAILS

Enforce policy before execution.

Per-tool, per-route policy. Designed to support PII scrubbing, argument validation, allow-lists, and rate limits. Designed to surface failures in the same trace.

// DESIGN GOALS // what we cared about

Engineered for production. Ships as a dependency, not a service

01 // RUNTIME

Target: ≤ 10 ms p50 under production workloads.

If kerf adds more latency than a single token costs, it isn't worth running. Selection runs on a lightweight CPU-bound reranker with no external inference calls.

02 // PRIVACY

Your data stays put.

Bootstrapped from tool schemas and optionally refined with local runtime telemetry — never trained on inference traffic. No external inference calls in the hot path. SOC 2 Type II planned.

03 // INTERFACE

OpenAI-compatible tool interface.

Reads the tools[] array you have already written. Returns the same tool schema shape your model provider expects. Compatible with OpenAI-style, Anthropic, Gemini, and MCP tool interfaces.

04 // OBSERVABILITY

OTLP-compatible traces.

Every selection emits an OpenTelemetry span with scores, threshold, and reasoning. Ships to Datadog, Honeycomb, Grafana, or stdout.

05 // LEARNING

No manual annotation pipeline required.

No human labels, no annotation pipeline. The selector bootstraps from your tool schemas alone.

06 // FAILURE MODE

Configurable fallback behavior when confidence drops.

Fallback behavior is configurable per project and route — never silently drops a tool.

// SDK // drop-in, one call

One kerf.select() call.

Provides a drop-in replacement for custom tool routing. Returns the same tools JSON your model already expects. Bindings for Node, Python, and Go are in progress.

  • ·Zero-config. Reads your existing tool specs as-is.
  • ·In-process. No network hop. Designed to run locally with no data leaving your infra.
  • ·Observable. Trace, score, and explanation from the SDK.
  • ·Tested. Built test-first. Coverage reports available to design partners.
main.ts · saved 3s ago
1import { kerf } from "@kerf/sdk"
2import OpenAI from "openai"
3await kerf.init({ projectId: "proj_x7k" })
4const { tools, trace } = await kerf.select(message)
5const res = await openai.chat.completions.create({
6 model: "gpt-4o",
7 messages: message,
8 tools, // ← only what this turn needs
9})
10logger.debug({ trace })
// DEPLOYMENT // run kerf where it makes sense

Library mode ships now. Sidecar & Gateway on the roadmap.

01 · LIBRARY
recommended

In-process

Imports as a library. Runs locally with no additional infrastructure required.

your-service
kerf-sdk
openai-sdk
https
model provider
memory
≤ 64 MB
cold start
< 200 ms
add'l infra
none
isolation
process
02 · SIDECAR
v2k8s native

Sidecar / DaemonSet

Runs as a sidecar container next to your agent service. Same node, localhost only.

your-service
kerf-sidecar
https
model provider
memory
TBD
cold start
TBD
add'l infra
1 container
isolation
container
03 · GATEWAY
v2openai-compat

Drop-in proxy

Self-hosted gateway compatible with OpenAI-style APIs. Zero SDK changes.

any client SDK
kerf-gateway
https
model provider
memory
TBD
throughput
TBD
add'l infra
1 service
isolation
network
// COMPATIBILITY // works with your stack

Native bindings & tested adapters.

OpenAIOpenAInative
LangChainLangChainnative
+HTTP / customnative
MCP protocolMCP protocolnative
kerf

Get early access.

Kerf is in private beta. We're working with a small number of teams to validate selection quality on real catalogs. If this problem is live for you, we'd like to talk.

· No card required· We'll reach out to set up onboarding· Runs in your VPC