
Most turns need 2–3 tools. You're sending 24.

▸ selection ▸ observability ▸ drift + retrain ▸ evals ▸ guardrails

Large tool catalogs increase routing ambiguity.

Every additional tool increases prompt cost and reasoning overhead.

Kerf scores your catalog per turn and reduces it to a small per-turn working set, before the call, in under 10ms, with no external cloud dependency in the selection path.


Train locally → Export the selector → Run in-process

// TRACE // end-to-end, one turn

Every decision ships with a readable trace.

Score, threshold, rationale, outcome. Pipe it to your logger, your dashboard, or your stdout.

~/agents/support-bot/ $ kerf trace --tail --pretty

trace #4821 · streaming

14:02:11.142 INFO kerf.runtime init project="proj_x7k_support" tools=24

14:02:11.142 DEBUG "my order #4821 hasn't shipped — cancel it before it goes out."

14:02:11.143 DEBUG kerf.intent detected intent=order_management conf=0.96

14:02:11.144 DEBUG kerf.signal verb_cue "cancel" → promote=cancel_order

14:02:11.145 DEBUG kerf.signal entity ref=order#4821 → prereq=get_order_status

14:02:11.146 DEBUG kerf.signal no_match "support page is gone" → surface_to=analytics

14:02:11.147 INFO kerf.score threshold=0.72 scored=24 selected=2 dropped=22

14:02:11.148 DEBUG kerf.score get_order_status=0.94 cancel_order=0.87

14:02:11.148 DEBUG kerf.score refund_order=0.61 (below threshold, dropped)

14:02:11.149 DEBUG kerf.guard cancel_order → policy=allow scope=write

14:02:11.149 DEBUG kerf.guard get_order_status → policy=allow scope=read

14:02:11.150 INFO kerf.emit otlp_span=sel_4821 duration=8.2ms

14:02:11.150 DEBUG kerf.payload tools=[get_order_status, cancel_order]

14:02:11.151 RESULT kerf.return tools=2 tokens=~2.1k (illustrative) took=~9 ms

Illustrative output. Actual results vary by catalog size and query complexity.
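The scoring step in the trace above can be sketched as a plain threshold split. This is a minimal illustration under stated assumptions, not Kerf's implementation: the `ScoredTool` type and `splitByThreshold` function are hypothetical names, and the scores are copied from the illustrative trace.

```typescript
// Hypothetical types and names for illustration; not part of the Kerf SDK.
interface ScoredTool {
  name: string;
  score: number; // relevance score in [0, 1]
}

interface Selection {
  selected: string[];
  dropped: string[];
}

// Keep tools whose score meets the threshold, highest score first.
function splitByThreshold(tools: ScoredTool[], threshold: number): Selection {
  const ranked = [...tools].sort((a, b) => b.score - a.score);
  return {
    selected: ranked.filter((t) => t.score >= threshold).map((t) => t.name),
    dropped: ranked.filter((t) => t.score < threshold).map((t) => t.name),
  };
}

// Scores copied from the illustrative trace (threshold=0.72).
const traceScores: ScoredTool[] = [
  { name: "refund_order", score: 0.61 },
  { name: "cancel_order", score: 0.87 },
  { name: "get_order_status", score: 0.94 },
];

const traceSelection = splitByThreshold(traceScores, 0.72);
console.log(traceSelection.selected); // [ 'get_order_status', 'cancel_order' ]
console.log(traceSelection.dropped); // [ 'refund_order' ]
```

A real selector would likely combine the threshold with other signals (the trace shows verb cues and entity prerequisites), so treat the threshold as one step among several.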

// BENCHMARKS // τ-bench adapted workflows

Reproducible evals. Every commit.

Methodology: adapted τ-bench workflows using custom tool catalogs with production-style agent policies. Each row represents a distinct workflow scenario with routing instrumentation enabled.

Benchmark	Model	Tools	Avg Selected	Invalid Calls ↓	Task Success ↑	Token Reduction ↓	Latency
τ-bench* · airline workflows	gpt-4o	14	2.4	−41%	+3 pts	−38%	8ms
τ-bench* · retail workflows	claude-sonnet-4	21	2.8	−57%	+4 pts	−44%	9ms
τ-bench* · support operations	gpt-4o-mini	24	2.6	−68%	+6 pts	−59%	9ms
τ-bench* · developer agent	claude-opus-4	118	4.9	−61%	+3 pts	−48%	14ms
τ-bench* · ops runbooks	gpt-4o	312	6.7	−54%	+2 pts	−37%	21ms

*Results are representative examples for illustration purposes, not official τ-bench scores. We expect 40–60% token reduction in catalog-heavy workflows. We're looking for design partners to validate these results.

Kerf/Playground

Connected · support-agent

Project

proj_x7k_support

24 tools · 3 agents

User intent

"Where is my order #4821? Also, can you cancel it?"

All tools · 24

list_orders

get_order_status

cancel_order

refund_order

search_kb

get_customer

create_ticket

escalate

list_shipments

track_package

update_address

+ 13 more

kerf

~10ms

Selected tools · 2

get_order_status

Fetch order #4821 details

cancel_order

Cancel order if eligible

Dropped · 22

refund_order

search_kb

get_customer

create_ticket

escalate

list_shipments

track_package

update_address

list_orders

+ 13 more

demo

12:04:21 → 3 tools · 9ms
12:04:19 → 2 tools · 11ms
12:04:18 → 4 tools · 13ms
12:04:16 → 2 tools · 8ms
12:04:14 → 3 tools · 10ms

OpenAI-spec compatible · runs in your VPC · no data leaves your infra.

// CAPABILITIES // what kerf actually does

Six layers, one install.

01 · SELECTION

Large tool catalogs reduced to a small per-turn working set.

A selection model scores your full catalog and returns only the highest-scoring tools for the current turn. In-process, single-digit to low-double-digit ms latency. Kerf minimizes irrelevant tool exposure per turn.

full catalog → per-turn working set in ~10ms

02 · OBSERVABILITY

Every selection decision is traceable.

Structured reasoning, scores, and an OTLP span per decision. Stream to Datadog, Honeycomb, Grafana — or kerf trace --tail in any shell.
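OpenTelemetry span attributes are limited to primitive values and arrays of primitives, so a selection decision has to be flattened before it goes on a span. The sketch below shows one way to do that. The `Decision` shape, the `kerf.*` attribute keys, and the rationale string are assumptions made for illustration; Kerf's actual span schema may differ.

```typescript
// Hypothetical decision shape and attribute keys; not Kerf's real span schema.
interface Decision {
  threshold: number;
  scores: Record<string, number>;
  selected: string[];
  rationale: string;
}

// A subset of the value types OpenTelemetry allows on span attributes.
type AttributeValue = string | number | boolean | string[] | number[];

// Flatten a decision into span-style key/value attributes.
function toSpanAttributes(d: Decision): Record<string, AttributeValue> {
  const flat: Record<string, AttributeValue> = {
    "kerf.threshold": d.threshold,
    "kerf.selected": d.selected,
    "kerf.selected.count": d.selected.length,
    "kerf.rationale": d.rationale,
  };
  // One attribute per scored tool keeps scores filterable in a trace backend.
  for (const [tool, score] of Object.entries(d.scores)) {
    flat[`kerf.score.${tool}`] = score;
  }
  return flat;
}

const spanAttrs = toSpanAttributes({
  threshold: 0.72,
  scores: { get_order_status: 0.94, cancel_order: 0.87 },
  selected: ["get_order_status", "cancel_order"],
  rationale: "illustrative rationale text",
});
console.log(spanAttrs);
```

The resulting object can be passed to whatever tracing client you already use, or printed as-is for local debugging.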

03 · DRIFT + RETRAIN

Catch drift before your users do.

Continuous monitoring of selection confidence across production traffic. When drift crosses your threshold, kerf flags it — designed for fast retraining cycles.
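One simple way to picture a drift check is to compare the mean selection confidence of recent traffic against a baseline. The sketch below does that with a fixed window. The function names, the window size, and the `maxDrop` parameter are all hypothetical, and a production monitor would probably use a more robust statistic than a plain mean.

```typescript
// Hypothetical drift check; names and parameters are illustrative only.
function mean(xs: number[]): number {
  if (xs.length === 0) throw new Error("mean of empty array");
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

interface DriftReport {
  baseline: number;
  recent: number;
  drifted: boolean;
}

// Flags drift when the mean of the last `windowSize` confidences falls
// more than `maxDrop` below the baseline mean.
function checkDrift(
  confidences: number[],
  windowSize: number,
  baseline: number,
  maxDrop: number,
): DriftReport {
  const recent = mean(confidences.slice(-windowSize));
  return { baseline, recent, drifted: baseline - recent > maxDrop };
}

// Confidence falls off sharply in the last three turns.
const degraded = checkDrift([0.95, 0.9, 0.6, 0.5, 0.4], 3, 0.9, 0.1);
// Confidence stays close to the baseline.
const stable = checkDrift([0.92, 0.9, 0.91], 3, 0.9, 0.1);

console.log(degraded.drifted); // true
console.log(stable.drifted); // false
```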

04 · EVALS

Reproducible, on every change.

Planned eval suite: τ-bench workflows, custom scenarios, and regression checks. Each change will run against held-out traces and regression scenarios. CI integration is also planned.
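A regression check over held-out traces can be as small as a recall metric with a tolerance. The sketch below assumes each held-out case records the tools a turn actually needed and the tools the selector returned. The `EvalCase` shape, the metric choice, and the default tolerance are assumptions for illustration, not the planned suite's design.

```typescript
// Hypothetical eval-case shape; the planned suite may record different fields.
interface EvalCase {
  id: string;
  expected: string[]; // tools the turn actually needed
  selected: string[]; // tools the selector returned
}

// Fraction of expected tools that made it into the selected set.
function caseRecall(c: EvalCase): number {
  if (c.expected.length === 0) return 1;
  const chosen = new Set(c.selected);
  return c.expected.filter((t) => chosen.has(t)).length / c.expected.length;
}

function meanRecall(cases: EvalCase[]): number {
  if (cases.length === 0) throw new Error("no eval cases");
  return cases.reduce((sum, c) => sum + caseRecall(c), 0) / cases.length;
}

// A change regresses if recall drops below the baseline by more than the tolerance.
function isRegression(current: number, baseline: number, tolerance = 0.01): boolean {
  return current < baseline - tolerance;
}

const heldOut: EvalCase[] = [
  { id: "cancel", expected: ["get_order_status", "cancel_order"], selected: ["get_order_status", "cancel_order"] },
  { id: "refund", expected: ["refund_order"], selected: ["cancel_order"] },
];

const recall = meanRecall(heldOut);
console.log(recall); // 0.5
console.log(isRegression(recall, 0.9)); // true
```

Recall is used here because a missing tool blocks the task, while an extra tool only costs tokens; a fuller suite would likely track both.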

05 · RUNTIME

Optional runtime orchestration for agent execution.

Optional kerf.run() drives the full loop — planning, tool calls, retries, fallback. Or stay surgical and only use the selection call.

06 · GUARDRAILS

Enforce policy before execution.

Per-tool, per-route policy. Designed to support PII scrubbing, argument validation, allow-lists, and rate limits. Designed to surface failures in the same trace.
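To make the per-tool policy idea concrete, here is a minimal sketch of a pre-execution check that combines an allow flag with required-argument validation. The `ToolPolicy` shape, the `checkCall` function, and the `order_id` argument are hypothetical; the scopes mirror the `kerf.guard` lines in the illustrative trace.

```typescript
// Hypothetical policy shape; not Kerf's configuration format.
type Scope = "read" | "write";

interface ToolPolicy {
  scope: Scope;
  allow: boolean;
  requiredArgs: string[];
}

interface GuardResult {
  allowed: boolean;
  reason: string;
}

// Deny by default: a tool with no policy entry is never executed.
function checkCall(
  policies: Record<string, ToolPolicy>,
  tool: string,
  args: Record<string, unknown>,
): GuardResult {
  const policy = policies[tool];
  if (policy === undefined) return { allowed: false, reason: "no policy for tool" };
  if (!policy.allow) return { allowed: false, reason: "tool denied by policy" };
  const missing = policy.requiredArgs.filter((key) => args[key] === undefined);
  if (missing.length > 0) {
    return { allowed: false, reason: `missing args: ${missing.join(", ")}` };
  }
  return { allowed: true, reason: "ok" };
}

const policies: Record<string, ToolPolicy> = {
  get_order_status: { scope: "read", allow: true, requiredArgs: ["order_id"] },
  cancel_order: { scope: "write", allow: true, requiredArgs: ["order_id"] },
};

console.log(checkCall(policies, "cancel_order", { order_id: "4821" }));
console.log(checkCall(policies, "cancel_order", {}));
console.log(checkCall(policies, "delete_account", {}));
```

Returning a reason alongside the verdict is what lets a failure show up in the same trace as the selection.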

// DESIGN GOALS // what we cared about

Engineered for production. Ships as a dependency, not a service.

01 // RUNTIME

Target: ≤ 10 ms p50 under production workloads.

If kerf adds more latency than a single token costs, it isn't worth running. Selection runs on a lightweight CPU-bound reranker with no external inference calls.
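A p50 target is easy to check against your own measurements. The sketch below computes a nearest-rank percentile over the five selection latencies shown in the demo panel on this page and compares the median to the 10 ms target. The `percentile` helper is a generic illustration, not a Kerf API.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const value = sorted[rank - 1];
  if (value === undefined) throw new Error("rank out of range");
  return value;
}

// Selection latencies (ms) from the demo panel.
const latenciesMs = [9, 11, 13, 8, 10];
const TARGET_P50_MS = 10;

const p50 = percentile(latenciesMs, 50);
console.log(p50); // 10
console.log(p50 <= TARGET_P50_MS); // true
```

Five samples is far too few to say anything about production behavior; the point is only the shape of the check.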

02 // PRIVACY

Your data stays put.

Bootstrapped from tool schemas and optionally refined with local runtime telemetry — never trained on inference traffic. No external inference calls in the hot path. SOC 2 Type II planned.

03 // INTERFACE

OpenAI-compatible tool interface.

Reads the tools[] array you have already written. Returns the same tool schema shape your model provider expects. Compatible with OpenAI-style, Anthropic, Gemini, and MCP tool interfaces.
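Cross-provider compatibility mostly comes down to reshaping the same JSON Schema payload. As an illustration, the sketch below converts an OpenAI-style function tool into the shape Anthropic's Messages API expects, where the schema lives under `input_schema`. The `toAnthropicTool` helper is a hypothetical name, and for simplicity the sketch treats `parameters` as always present.

```typescript
// OpenAI-style tool definition (Chat Completions `tools[]` entry).
interface OpenAITool {
  type: "function";
  function: {
    name: string;
    description?: string;
    parameters: Record<string, unknown>; // JSON Schema; assumed present here
  };
}

// Anthropic-style tool definition.
interface AnthropicTool {
  name: string;
  description?: string;
  input_schema: Record<string, unknown>;
}

// Hypothetical helper: same name and schema, different envelope.
function toAnthropicTool(tool: OpenAITool): AnthropicTool {
  const { name, description, parameters } = tool.function;
  return {
    name,
    ...(description !== undefined ? { description } : {}),
    input_schema: parameters,
  };
}

const cancelOrder: OpenAITool = {
  type: "function",
  function: {
    name: "cancel_order",
    description: "Cancel order if eligible",
    parameters: {
      type: "object",
      properties: { order_id: { type: "string" } },
      required: ["order_id"],
    },
  },
};

const converted = toAnthropicTool(cancelOrder);
console.log(JSON.stringify(converted, null, 2));
```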

04 // OBSERVABILITY

OTLP-compatible traces.

Every selection emits an OpenTelemetry span with scores, threshold, and reasoning. Ships to Datadog, Honeycomb, Grafana, or stdout.

05 // LEARNING

No manual annotation pipeline required.

No human labels, no annotation pipeline. The selector bootstraps from your tool schemas alone.

06 // FAILURE MODE

Configurable fallback behavior when confidence drops.

Fallback behavior is configurable per project and route. Kerf never silently drops a tool.
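One possible fallback policy is sketched below: if even the best score is under a confidence floor, skip the threshold and return either the full catalog or the top k tools. The `FallbackConfig` fields and mode names are assumptions for illustration and do not describe Kerf's actual configuration options.

```typescript
// Hypothetical fallback configuration; not Kerf's real option names.
type FallbackMode = "full_catalog" | "top_k";

interface FallbackConfig {
  minConfidence: number; // floor for the best score before falling back
  mode: FallbackMode;
  k: number; // used only when mode is "top_k"
}

interface FallbackSelection {
  tools: string[];
  fallback: boolean; // surfaced so the trace records that fallback fired
}

function selectWithFallback(
  scores: Record<string, number>,
  threshold: number,
  cfg: FallbackConfig,
): FallbackSelection {
  const ranked = Object.entries(scores).sort((a, b) => b[1] - a[1]);
  const best = Math.max(0, ...ranked.map(([, score]) => score));
  if (best < cfg.minConfidence) {
    const names = ranked.map(([name]) => name);
    const tools = cfg.mode === "full_catalog" ? names : names.slice(0, cfg.k);
    return { tools, fallback: true };
  }
  const tools = ranked.filter(([, score]) => score >= threshold).map(([name]) => name);
  return { tools, fallback: false };
}

const cfg: FallbackConfig = { minConfidence: 0.5, mode: "full_catalog", k: 2 };

const confident = selectWithFallback(
  { get_order_status: 0.94, cancel_order: 0.87, refund_order: 0.61 },
  0.72,
  cfg,
);
const unsure = selectWithFallback({ search_kb: 0.4, escalate: 0.3, create_ticket: 0.2 }, 0.72, cfg);

console.log(confident); // two tools, fallback: false
console.log(unsure); // all three tools, fallback: true
```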

// SDK // drop-in, one call

One `kerf.select()` call.

Provides a drop-in replacement for custom tool routing. Returns the same tools JSON your model already expects. Bindings for Node, Python, and Go are in progress.

· Zero-config. Reads your existing tool specs as-is.
· In-process. No network hop. Designed to run locally with no data leaving your infra.
· Observable. Trace, score, and explanation from the SDK.
· Tested. Built test-first. Coverage reports available to design partners.

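main.ts

import { kerf } from "@kerf/sdk"
import OpenAI from "openai"

const openai = new OpenAI() // reads OPENAI_API_KEY from the environment
const message = "Where is my order #4821? Also, can you cancel it?"

await kerf.init({ projectId: "proj_x7k" })
// Score the catalog and keep only the tools this turn needs.
const { tools, trace } = await kerf.select(message)
const res = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: message }],
  tools, // ← only what this turn needs
})
console.debug({ trace })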

// DEPLOYMENT // run kerf where it makes sense

Library mode ships now. Sidecar & Gateway on the roadmap.

01 · LIBRARY

recommended

In-process

Imports as a library. Runs locally with no additional infrastructure required.

Diagram: your-service (kerf-sdk, openai-sdk) → https → model provider

memory: ≤ 64 MB
cold start: < 200 ms
add'l infra: none
isolation: process

02 · SIDECAR

v2 · k8s native

Sidecar / DaemonSet

Runs as a sidecar container next to your agent service. Same node, localhost only.

Diagram: your-service + kerf-sidecar → https → model provider

memory: TBD
cold start: TBD
add'l infra: 1 container
isolation: container

03 · GATEWAY

v2 · openai-compat

Drop-in proxy

Self-hosted gateway compatible with OpenAI-style APIs. Zero SDK changes.

Diagram: any client SDK → kerf-gateway → https → model provider

memory: TBD
throughput: TBD
add'l infra: 1 service
isolation: network

// COMPATIBILITY // works with your stack

Native bindings & tested adapters.

OpenAI · native

LangChain · native

HTTP / custom · native

MCP protocol · native

Get early access.

Kerf is in private beta. We're working with a small number of teams to validate selection quality on real catalogs. If this problem is live for you, we'd like to talk.

· No card required · We'll reach out to set up onboarding · Runs in your VPC
