v1.0 · now in early access

Observability
for AI Agents.

Production AI agents fail silently — infinite loops, runaway costs, broken tool calls. ProbeMetric catches every failure in real time, before your users do.

Start monitoring free → · View docs

<10ms SDK overhead

5 min setup time

100% OTel native

agent.ts · live

import { wrapAnthropic } from 'probemetric';
import Anthropic from '@anthropic-ai/sdk';

// drop-in wrap — all calls traced automatically
const client = wrapAnthropic(new Anthropic(), {
  apiKey: process.env.PROBEMETRIC_API_KEY,
});

trace_7a3f · spans: 5 · 4.21s · $0.41

Trusted by engineers shipping with

Anthropic · OpenAI · LangChain · CrewAI · Gemini · AutoGPT

Live trace

See every span, every failure.

Nested execution trees with timing, tokens, and tool payloads — rendered the moment your agent runs.

Agent Run · trace_7a3f live

⚠ CRITICAL · llm_call #2 returned 500 · cost $0.41 · ALRT-214

Total time: 4.21s
Tokens: 24.8k
Cost: $0.41

Span	Duration	Status
agent_run	4.21s	●
├─ llm_call	1.84s	✓
├─ tool: search	610ms	✓
├─ tool: read_file	290ms	✓
└─ llm_call #2	1.12s	✗ 500

How it works

Up and running in 5 minutes.

One wrapper. Full visibility. No infra to manage.

Step 01

Install the SDK

$ npm install probemetric

TypeScript / Node.js SDK. Supports OpenAI, Anthropic, and Gemini.

Step 02

Wrap your LLM client

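import { wrapAnthropic } from 'probemetric';
import Anthropic from '@anthropic-ai/sdk';

// Your ProbeMetric API key, read from the environment
const apiKey = process.env.PROBEMETRIC_API_KEY;

const client = wrapAnthropic(
  new Anthropic(), { apiKey }
);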

Drop-in wrapper — zero code changes to your agent logic.
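One plausible way a drop-in wrapper can trace calls without touching agent logic is a JavaScript `Proxy` that times every method call on the client. The sketch below is illustrative only: `wrapWithTracing`, `SpanRecord`, and the stand-in client are hypothetical names, not part of the probemetric SDK, and the real wrapper may work differently.

```typescript
// Hypothetical sketch; none of these names come from the probemetric SDK.
interface SpanRecord {
  name: string;
  durationMs: number;
  ok: boolean;
}

// Return a proxy of `target` whose method calls each report one span.
function wrapWithTracing<T extends object>(
  target: T,
  onSpan: (span: SpanRecord) => void,
): T {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver);
      if (typeof value !== 'function') return value;

      return (...args: unknown[]) => {
        const start = Date.now();
        const finish = (ok: boolean) =>
          onSpan({ name: String(prop), durationMs: Date.now() - start, ok });
        try {
          const result = value.apply(obj, args);
          if (result instanceof Promise) {
            // Async call: report when the promise settles, then pass it through.
            return result.then(
              (v) => { finish(true); return v; },
              (e) => { finish(false); throw e; },
            );
          }
          finish(true);
          return result;
        } catch (err) {
          finish(false);
          throw err;
        }
      };
    },
  });
}

// Usage with a stand-in client (not a real LLM SDK).
const recorded: SpanRecord[] = [];
const fakeClient = {
  complete(prompt: string): string {
    return `echo: ${prompt}`;
  },
  run(shouldFail: boolean): number {
    if (shouldFail) throw new Error('boom');
    return 1;
  },
};
const traced = wrapWithTracing(fakeClient, (s) => recorded.push(s));
const reply = traced.complete('hi');
```

The caller keeps using `traced` exactly as it used the original client; the only difference is that each call also emits a span, including calls that throw.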

Step 03

Inspect traces

const trace = await probemetric
  .getTrace(traceId);
console.log(trace.spans, trace.cost);

Nested execution trees with cost, latency, and token counts.
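To make "nested execution tree" concrete, here is a minimal sketch of how such a trace could be represented and summarized. The `Span` shape and the `total` and `render` helpers are assumptions for illustration, not the actual ProbeMetric trace schema; durations mirror the sample trace shown earlier, while the token counts are made up.

```typescript
// Hypothetical span shape; the real trace schema may differ.
interface Span {
  name: string;
  durationMs: number;
  tokens: number;
  children: Span[];
}

// Sum a metric over a span and all of its descendants.
function total(span: Span, pick: (s: Span) => number): number {
  return span.children.reduce((sum, c) => sum + total(c, pick), pick(span));
}

// Render the tree with box-drawing prefixes, one line per span.
function render(span: Span, prefix = '', isLast = true, isRoot = true): string[] {
  const branch = isRoot ? '' : isLast ? '└─ ' : '├─ ';
  const lines = [`${prefix}${branch}${span.name} (${span.durationMs}ms)`];
  const childPrefix = isRoot ? '' : prefix + (isLast ? '   ' : '│  ');
  span.children.forEach((child, i) => {
    lines.push(...render(child, childPrefix, i === span.children.length - 1, false));
  });
  return lines;
}

// Example data: token counts are illustrative, not taken from a real trace.
const exampleTrace: Span = {
  name: 'agent_run',
  durationMs: 4210,
  tokens: 0,
  children: [
    { name: 'llm_call', durationMs: 1840, tokens: 1200, children: [] },
    { name: 'tool: search', durationMs: 610, tokens: 0, children: [] },
    { name: 'tool: read_file', durationMs: 290, tokens: 0, children: [] },
    { name: 'llm_call #2', durationMs: 1120, tokens: 800, children: [] },
  ],
};

const treeLines = render(exampleTrace);
const totalTokens = total(exampleTrace, (s) => s.tokens); // 2000
```

Keeping children nested under their parent, rather than logging flat events, is what lets totals roll up per subtree.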

Capabilities

Instrument once. See everything.

Built for teams debugging production AI — not toy demos.

Trace viewer

Execution trees, not flat logs

14 spans · 2 failed tools · 842ms avg latency

Cost dashboard

Token tracking, model-attributed

Daily spend: $124.50 · 1.2M tokens/day
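Model-attributed cost tracking boils down to multiplying token counts by a per-model price and grouping the result. The sketch below assumes a single blended price per million tokens; `LlmCall`, `costByModel`, the model names, and the prices are all hypothetical, and real pricing usually distinguishes input from output tokens.

```typescript
// Hypothetical record of one LLM call; not the ProbeMetric event schema.
interface LlmCall {
  model: string;
  tokens: number;
}

// Group spend by model using a price table in USD per million tokens.
function costByModel(
  calls: LlmCall[],
  usdPerMillionTokens: Record<string, number>,
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const call of calls) {
    // Models missing from the price table are counted at zero cost in this sketch.
    const price = usdPerMillionTokens[call.model] ?? 0;
    const cost = (call.tokens / 1000000) * price;
    totals.set(call.model, (totals.get(call.model) ?? 0) + cost);
  }
  return totals;
}

// Illustrative data: model names and prices are placeholders.
const spend = costByModel(
  [
    { model: 'model-a', tokens: 1000000 },
    { model: 'model-a', tokens: 500000 },
    { model: 'model-b', tokens: 1000000 },
  ],
  { 'model-a': 3, 'model-b': 0.5 },
);
```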

Smart alerts

Threshold + anomaly detection

⚠ loop · planner_agent · now

cost spike · $4.82 · 2m

p95 latency · 1.4s · 8m

5 active rules · Slack · PagerDuty · Webhooks
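The two alert styles named above can be sketched in a few lines: a threshold rule compares a value to a fixed limit, while an anomaly rule compares it to recent history. This is a simplified illustration using a z-score; `exceedsThreshold`, `isAnomaly`, and the default of three standard deviations are assumptions, not ProbeMetric's actual detection logic.

```typescript
// Threshold rule: fire when a value exceeds a fixed limit.
function exceedsThreshold(value: number, limit: number): boolean {
  return value > limit;
}

// Anomaly rule: fire when a value sits more than `k` standard deviations
// above the mean of recent history.
function isAnomaly(history: number[], value: number, k = 3): boolean {
  if (history.length < 2) return false; // not enough data for a baseline
  const mean = history.reduce((a, b) => a + b, 0) / history.length;
  const variance =
    history.reduce((a, b) => a + (b - mean) * (b - mean), 0) / history.length;
  const std = Math.sqrt(variance);
  if (std === 0) return value > mean; // flat history: any increase stands out
  return (value - mean) / std > k;
}

// Illustrative per-run costs in USD, followed by a spike.
const recentCosts = [1.0, 1.2, 0.9, 1.1, 1.0];
const spikeIsAnomaly = isAnomaly(recentCosts, 4.82);
const normalIsAnomaly = isAnomaly(recentCosts, 1.1);
const overLimit = exceedsThreshold(4.82, 2.0);
```

A threshold catches known limits such as a cost ceiling, while the anomaly rule catches shifts that nobody set a limit for in advance.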

Why ProbeMetric

Built for the agentic era.

Traditional APM tools were built for microservices. ProbeMetric was built for autonomous reasoning.

Capability	Traditional APM	ProbeMetric
Trace visualization	✕ Flat logs, missing context	✓ Nested execution trees with tool payloads
Cost attribution	✕ Total monthly bill only	✓ Real-time cost per session & model
Agent reliability	✕ Detects crashes, not bad logic	✓ Detects runaway loops & hallucination patterns
Developer experience	✕ Manual instrumentation	✓ 1-line SDK wrap for any LLM logic

Pricing

Simple Pricing.

Start for free, scale with your production volume.

5K Events/mo

Free

For solo developers and early experiments.

5K traces/mo
14-day retention
1 project
Email alerts

Start Free

Unlimited Events

Starter

$29/mo

For small teams shipping to production.

100K traces/mo
30-day retention
3 projects
1 team seat
Evals support

Get Starter

Custom Retention · Most Popular

Pro

$99/mo

For growing teams with production workloads.

1M traces/mo
90-day retention
Unlimited projects
2 team seats
Export + webhooks
Priority email support

Get Pro

Enterprise Scale

Scale

$299/mo

For organizations with serious scale.

10M traces/mo
365-day retention
Priority support
3 team seats
Custom contracts

Get Scale

No credit card required

Cancel anytime

Production-ready from day one

Pricing questions

A trace is one complete agent execution — from the initial prompt through all tool calls and sub-steps. Batched operations count as individual traces.

You can upgrade or downgrade your plan at any time. We'll prorate the difference and apply it to your next billing cycle.

We'll notify you at 80% and 100% of your monthly trace quota. After that, traces are buffered for 24 hours so you never lose observability during spikes.

Live dashboard

Every agent. Every trace. One view.

Latency waterfalls, token spend, error rates, and loop detection — all in a single dashboard built for production AI workloads.

ProbeMetric / production · Live

Traces · Costs · Alerts · Evals

Tokens / min: 42.1k (+12%)
Active agents: 3 (● loop)
Error rate: 0.3% (-0.1%)
P95 latency: 842ms (+38ms)

Agent	Tokens	Duration	Cost	Status
research_agent	18.2k	1.2s	$0.024	✓ ok
code_reviewer	31.0k	3.1s	$0.041	✓ ok
vector_search	6.1k	0.5s	$0.008	✗ err
email_drafter	8.4k	0.8s	$0.011	✓ ok
planner_agent	142k	12.4s	$0.187	⟳ loop

⚠ CRITICAL · Loop detected in planner_agent: 142k tokens consumed, auto-kill triggered · ALRT-214

Enterprise-grade security

Zero-trust prompts.
Local PII redaction.

Your users' data security is paramount. ProbeMetric's SDK supports field-level, regex-based client-side redaction. API keys, credentials, and sensitive PII never leave your infrastructure — maintaining compliance with SOC 2, GDPR, and HIPAA.

SOC 2 Type II · GDPR · HIPAA · Self-hosted

Local only: Redaction is processed client-side. Sensitive data never reaches our servers.

Zero retention: Choose ephemeral mode, in which traces are evaluated in-flight and then discarded.
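Field-level, regex-based redaction on the client can be pictured as a recursive walk over the payload before anything is sent. The sketch below is an assumption about how such a step might look: `RedactionRules`, `redact`, the mask string, and the example patterns are hypothetical and are not the SDK's actual configuration format.

```typescript
// Hypothetical rule shape; not the probemetric configuration format.
interface RedactionRules {
  fields: string[];   // keys whose values are always masked
  patterns: RegExp[]; // regexes masked inside any string value
}

const MASK = '[REDACTED]';

// Walk the payload and return a redacted copy; the input is not mutated.
function redact(value: unknown, rules: RedactionRules): unknown {
  if (typeof value === 'string') {
    return rules.patterns.reduce((text, re) => text.replace(re, MASK), value);
  }
  if (Array.isArray(value)) {
    return value.map((v) => redact(v, rules));
  }
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, v] of Object.entries(value)) {
      out[key] = rules.fields.includes(key) ? MASK : redact(v, rules);
    }
    return out;
  }
  return value;
}

// Example rules: mask credential fields and anything that looks like an email.
const exampleRules: RedactionRules = {
  fields: ['apiKey', 'password'],
  patterns: [/[\w.+-]+@[\w-]+\.[\w.]+/g],
};

const safePayload = redact(
  {
    apiKey: 'sk-123',
    messages: [{ role: 'user', content: 'Mail me at jane@example.com' }],
  },
  exampleRules,
);
```

Because the function runs before serialization, the unredacted values only ever exist in the caller's process.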

Use cases

Wherever your agents go off-script.

Autonomous agents

Catch infinite loops before they bankrupt you

Detect runaway planners and auto-kill agents that exceed cost or step budgets.
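A cost or step budget with loop detection can be as simple as a counter checked before every agent step. The following sketch shows the idea under stated assumptions: `createRunGuard`, `Budget`, and the "same call repeated N times in a row" loop heuristic are hypothetical and much cruder than a production detector.

```typescript
// Hypothetical budget shape for one agent run.
interface Budget {
  maxSteps: number;
  maxCostUsd: number;
  maxRepeats: number; // identical consecutive calls that count as a loop
}

type Verdict = 'ok' | 'step_budget' | 'cost_budget' | 'loop';

// Returns a checker to call once per step; any verdict other than 'ok'
// means the run should be stopped.
function createRunGuard(budget: Budget) {
  let steps = 0;
  let costUsd = 0;
  let lastCall = '';
  let repeats = 0;

  return (callSignature: string, stepCostUsd: number): Verdict => {
    steps += 1;
    costUsd += stepCostUsd;
    repeats = callSignature === lastCall ? repeats + 1 : 1;
    lastCall = callSignature;

    if (steps > budget.maxSteps) return 'step_budget';
    if (costUsd > budget.maxCostUsd) return 'cost_budget';
    if (repeats >= budget.maxRepeats) return 'loop';
    return 'ok';
  };
}

// Usage: the third identical call in a row is treated as a loop.
const guard = createRunGuard({ maxSteps: 10, maxCostUsd: 1, maxRepeats: 3 });
const verdicts = [
  guard('search:foo', 0.1),
  guard('search:foo', 0.1),
  guard('search:foo', 0.1),
];
```

The agent loop would stop calling tools as soon as the guard returns anything other than `'ok'`.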

RAG pipelines

Pinpoint slow retrieval and bad context

Span-level visibility into vector queries, rerankers, and prompt assembly.

Multi-tool workflows

Debug tool calls without printf-debugging

Inspect every tool input, output, and error payload across nested chains.

Evals & QA

Replay production traces in your eval suite

Export failing traces directly into your regression and eval pipelines.

Integrations

Works with your stack.

Framework-agnostic. Drop into raw SDK calls or your favorite agent library.

OpenAI

Anthropic

Gemini

Mistral

LangChain

LlamaIndex

CrewAI

AutoGPT

Vercel AI

OTel

Slack

PagerDuty

Datadog

Webhooks

Grafana

Sentry

2,841 traces/sec ingested across production workloads

37% avg reduction in monthly LLM spend after week one

<10ms overhead: p99 SDK latency on every wrapped call

FAQ

Common questions.

LangSmith is great if you're all-in on LangChain. ProbeMetric is framework-agnostic — it works with raw OpenAI calls, LangChain, CrewAI, AutoGPT, or any custom agent architecture. You get the same depth of tracing without locking into one ecosystem.

Our OpenAI wrappers cover the Assistants, Responses, and Chat Completions APIs, including streaming, tool calls, and parallel runs.

Supported out of the box: OpenAI, Anthropic, Gemini, Mistral, plus LangChain, LlamaIndex, CrewAI, AutoGPT, and the Vercel AI SDK. Anything OTel-compatible just works.

Pricing is per event (span), with a generous free tier. No per-seat lock-in. Volume tiers scale linearly so you always know what next month looks like.

Alert thresholds can be set per agent, per model, or globally. Route alerts to Slack, PagerDuty, or any webhook.

The SDK adds less than 10ms of p99 overhead. Spans are batched and shipped asynchronously off the request path.

Security features include field-level client-side redaction and zero-data-retention options. ProbeMetric is SOC 2 Type II, GDPR, and HIPAA compliant.

On-prem and VPC deployments are supported on enterprise plans.
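The low-overhead answer above rests on batching spans and shipping them off the request path. A minimal sketch of that pattern follows; `createBatcher`, `SpanEvent`, and the injected `send` function are hypothetical, and a real SDK would typically also flush on a timer and retry failed sends.

```typescript
// Hypothetical span event and transport signature.
interface SpanEvent {
  name: string;
  durationMs: number;
}
type Send = (batch: SpanEvent[]) => Promise<void>;

// Buffer spans in memory and hand them to `send` in batches.
function createBatcher(send: Send, maxBatch: number) {
  let buffer: SpanEvent[] = [];

  function flush(): void {
    if (buffer.length === 0) return;
    const batch = buffer;
    buffer = [];
    // Fire and forget: the caller never awaits the network round trip.
    send(batch).catch(() => {
      /* this sketch drops failed batches; a real SDK would retry */
    });
  }

  function record(span: SpanEvent): void {
    buffer.push(span); // cheap, synchronous work on the request path
    if (buffer.length >= maxBatch) flush();
  }

  return { record, flush, pending: () => buffer.length };
}

// Usage with an in-memory transport standing in for the network.
const sentBatches: SpanEvent[][] = [];
const batcher = createBatcher(async (batch) => {
  sentBatches.push(batch);
}, 2);

batcher.record({ name: 'llm_call', durationMs: 1840 });
batcher.record({ name: 'tool: search', durationMs: 610 });
batcher.record({ name: 'tool: read_file', durationMs: 290 });
```

The wrapped call only pays for an array push; serialization and network I/O happen later, outside the caller's critical path.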

Ready when you are

Start monitoring
your agents today.

5-minute setup. No infra to manage. No credit card required.

Get started for free · Sign in to dashboard
