ProbeMetric
v1.0 · now in early access

Observability for AI Agents.

Production AI agents fail silently: infinite loops, runaway costs, broken tool calls. ProbeMetric flags these failures in real time, before your users hit them.

<10ms SDK overhead · 5 min setup time · 100% OTel native
agent.ts
import { wrapAnthropic } from 'probemetric';
import Anthropic from '@anthropic-ai/sdk';

// drop-in wrap — all calls traced automatically
const client = wrapAnthropic(new Anthropic(), {
  apiKey: process.env.PROBEMETRIC_API_KEY,
});
trace_7a3f · spans: 5 · 4.21s · $0.41
Trusted by engineers shipping with
Anthropic · OpenAI · LangChain · CrewAI · Gemini · AutoGPT
Live trace

See every span, every failure.

Nested execution trees with timing, tokens, and tool payloads — rendered the moment your agent runs.

Agent Run · trace_7a3f · live
⚠ CRITICAL: llm_call #2 returned 500 · cost $0.41 · ALRT-214
Total time: 4.21s · Tokens: 24.8k · Cost: $0.41 · Spans: 5
agent_run            4.21s  ●
├─ llm_call          1.84s  ✓
├─ tool: search      610ms  ✓
├─ tool: read_file   290ms  ✓
└─ llm_call #2       1.12s  ✗ 500
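The trace view is a recursive span tree. A minimal sketch of modeling and rendering it, assuming a hypothetical `Span` shape (not ProbeMetric's published types) and the same `├─`/`└─` connectors:

```typescript
// Hypothetical span shape for illustration; not ProbeMetric's published types.
interface Span {
  name: string;
  durationMs: number;
  status: "ok" | "error" | "running";
  httpStatus?: number;
  children: Span[];
}

// Show sub-second durations in ms and longer ones in seconds.
function formatDuration(ms: number): string {
  return ms < 1000 ? `${ms}ms` : `${(ms / 1000).toFixed(2)}s`;
}

function statusMark(span: Span): string {
  if (span.status === "ok") return "✓";
  if (span.status === "running") return "●";
  return span.httpStatus !== undefined ? `✗ ${span.httpStatus}` : "✗";
}

// Depth-first walk; the last child at each level gets "└─", the others "├─".
function renderTree(root: Span): string[] {
  const lines: string[] = [`${root.name}  ${formatDuration(root.durationMs)}  ${statusMark(root)}`];
  const walk = (span: Span, prefix: string): void => {
    span.children.forEach((child, i) => {
      const last = i === span.children.length - 1;
      lines.push(`${prefix}${last ? "└─ " : "├─ "}${child.name}  ${formatDuration(child.durationMs)}  ${statusMark(child)}`);
      walk(child, prefix + (last ? "   " : "│  "));
    });
  };
  walk(root, "");
  return lines;
}

// The run from the trace view, as illustrative data.
const run: Span = {
  name: "agent_run", durationMs: 4210, status: "running", children: [
    { name: "llm_call", durationMs: 1840, status: "ok", children: [] },
    { name: "tool: search", durationMs: 610, status: "ok", children: [] },
    { name: "tool: read_file", durationMs: 290, status: "ok", children: [] },
    { name: "llm_call #2", durationMs: 1120, status: "error", httpStatus: 500, children: [] },
  ],
};

console.log(renderTree(run).join("\n"));
```

Nested children are handled by extending the prefix with `│  ` or blank padding, so deeper trees render with the same connectors.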
How it works

Up and running in 5 minutes.

One wrapper. Full visibility. No infra to manage.

Step 01

Install the SDK

$ npm install probemetric

TypeScript / Node.js SDK. Supports OpenAI, Anthropic, and Gemini.

Step 02

Wrap your LLM client

import Anthropic from '@anthropic-ai/sdk';
import { wrapAnthropic } from 'probemetric';

const client = wrapAnthropic(
  new Anthropic(), { apiKey: process.env.PROBEMETRIC_API_KEY }
);

Drop-in wrapper — zero code changes to your agent logic.

Step 03

Inspect traces

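// `probemetric` is assumed to be an initialized ProbeMetric client,
// and `traceId` the ID of a completed agent run.
const trace = await probemetric
  .getTrace(traceId);
console.log(trace.spans, trace.cost);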

Nested execution trees with cost, latency, and token counts.
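Trace-level cost, latency, and token counts are roll-ups over the spans. A minimal sketch of that roll-up, assuming a hypothetical `SpanRecord` with `costUsd` and `tokens` fields (not ProbeMetric's actual `trace` object):

```typescript
// Hypothetical span record; field names are assumptions for illustration.
interface SpanRecord {
  name: string;
  startMs: number; // offset from trace start
  endMs: number;
  tokens: number;
  costUsd: number;
  error?: string;
}

interface TraceSummary {
  wallTimeMs: number;
  tokens: number;
  costUsd: number;
  failedSpans: string[];
}

function summarize(spans: SpanRecord[]): TraceSummary {
  if (spans.length === 0) return { wallTimeMs: 0, tokens: 0, costUsd: 0, failedSpans: [] };
  // Wall time runs from earliest start to latest end; summing durations would
  // double-count spans that overlap, such as parallel tool calls.
  const start = Math.min(...spans.map((s) => s.startMs));
  const end = Math.max(...spans.map((s) => s.endMs));
  const rawCost = spans.reduce((sum, s) => sum + s.costUsd, 0);
  return {
    wallTimeMs: end - start,
    tokens: spans.reduce((sum, s) => sum + s.tokens, 0),
    costUsd: Math.round(rawCost * 10000) / 10000, // 4 decimals hides float noise
    failedSpans: spans.filter((s) => s.error !== undefined).map((s) => s.name),
  };
}

// Illustrative spans only.
const spans: SpanRecord[] = [
  { name: "llm_call", startMs: 0, endMs: 1840, tokens: 12000, costUsd: 0.2 },
  { name: "tool: search", startMs: 1840, endMs: 2450, tokens: 0, costUsd: 0 },
  { name: "llm_call #2", startMs: 2450, endMs: 3570, tokens: 8000, costUsd: 0.15, error: "HTTP 500" },
];
const summary = summarize(spans);
console.log(summary);
```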

Capabilities

Instrument once. See everything.

Built for teams debugging production AI — not toy demos.

Trace viewer

Execution trees, not flat logs

14 spans · 2 failed tools · 842ms avg latency

Cost dashboard

Token tracking, model-attributed

Daily spend: $124.50 · 1.2M tokens/day

Smart alerts

Threshold + anomaly detection

⚠ loop · planner_agent · now
cost spike · $4.82 · 2m
p95 latency · 1.4s · 8m

5 active rules · Slack · PagerDuty · Webhooks
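Threshold rules fire on a fixed limit; anomaly rules fire when a value departs from its own recent baseline. The sketch below pairs a static threshold with a z-score check. The z-score model and the `ThresholdRule` shape are assumptions for illustration; the page does not say which anomaly model ProbeMetric uses.

```typescript
// Illustrative alert checks; the rule shape and the z-score model are assumptions.
interface ThresholdRule {
  metric: string;
  max: number;
}

// Static threshold: fire whenever the observed value exceeds the configured limit.
function breachesThreshold(rule: ThresholdRule, value: number): boolean {
  return value > rule.max;
}

// Anomaly check: fire when `value` lies more than `zLimit` sample standard
// deviations from the mean of the recent history.
function isAnomalous(history: number[], value: number, zLimit = 3): boolean {
  if (history.length < 2) return false; // too little data for a baseline
  const mean = history.reduce((a, b) => a + b, 0) / history.length;
  const variance = history.reduce((a, b) => a + (b - mean) ** 2, 0) / (history.length - 1);
  const std = Math.sqrt(variance);
  if (std === 0) return value !== mean; // flat history: any change is unusual
  return Math.abs(value - mean) / std > zLimit;
}

// Illustrative per-run costs in USD for one agent.
const costHistory = [0.9, 1.1, 1.0, 0.95, 1.05];
console.log(isAnomalous(costHistory, 4.82));
console.log(breachesThreshold({ metric: "p95_latency_ms", max: 1200 }, 1400));
```

The threshold needs a hand-picked limit; the z-score check adapts to each agent's normal range, which suits cost spikes that are only unusual relative to that agent's history.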

Why ProbeMetric

Built for the agentic era.

Traditional APM tools were built for microservices. ProbeMetric was built for autonomous reasoning.

Capability | Traditional APM | ProbeMetric
Trace visualization | ✕ Flat logs, missing context | ✓ Nested execution trees with tool payloads
Cost attribution | ✕ Total monthly bill only | ✓ Real-time cost per session and model
Agent reliability | ✕ Detects crashes, not bad logic | ✓ Detects runaway loops and hallucination patterns
Developer experience | ✕ Manual instrumentation | ✓ 1-line SDK wrap around your LLM client
Pricing

Simple pricing.

Start for free, scale with your production volume.

5K Events/mo

Free

$0

For solo developers and early experiments.

  • 5K traces/mo
  • 14-day retention
  • 1 project
  • Email alerts
Unlimited Events

Starter

$29/mo

For small teams shipping to production.

  • 100K traces/mo
  • 30-day retention
  • 3 projects
  • 1 team seat
  • Evals support
Custom Retention · Most Popular

Pro

$99/mo

For growing teams with production workloads.

  • 1M traces/mo
  • 90-day retention
  • Unlimited projects
  • 2 team seats
  • Export + webhooks
  • Priority email support
Enterprise Scale

Scale

$299/mo

For organizations with serious scale.

  • 10M traces/mo
  • 365-day retention
  • Priority support
  • 3 team seats
  • Custom contracts
No credit card required
Cancel anytime
Production-ready from day one

Pricing questions

What counts as a trace?

A trace is one complete agent execution, from the initial prompt through all tool calls and sub-steps. In a batched operation, each item counts as its own trace.

Can I change plans later?

Absolutely. Upgrade or downgrade at any time. We'll prorate the difference and apply it to your next billing cycle.

What happens if I exceed my plan's limit?

We'll notify you at 80% and 100% usage. After that, traces are buffered for 24 hours so you never lose observability during spikes.
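The 80% and 100% notification points reduce to simple quota arithmetic. A sketch, assuming each notification fires once, when a usage update first crosses that fraction of the monthly trace allowance; `crossedThresholds` is a hypothetical helper, not part of ProbeMetric:

```typescript
// Hypothetical helper; the 80% / 100% points come from the answer above.
const NOTIFY_AT: number[] = [0.8, 1.0];

// Return the usage fractions crossed when usage moves from `before` to `after`.
function crossedThresholds(limit: number, before: number, after: number): number[] {
  return NOTIFY_AT.filter((t) => before < t * limit && after >= t * limit);
}

// Example against the Starter plan's 100K traces/mo.
console.log(crossedThresholds(100000, 79500, 80200));
console.log(crossedThresholds(100000, 99000, 101000));
```

Comparing `before` and `after` rather than checking `after` alone keeps a notification from repeating on every update past the threshold.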

Live dashboard

Every agent. Every trace. One view.

Latency waterfalls, token spend, error rates, and loop detection — all in a single dashboard built for production AI workloads.

ProbeMetric / production · Live
Traces · Costs · Alerts · Evals
Tokens/min: 42.1k (+12%)
Active agents: 12 (3 looping)
Error rate: 0.3% (-0.1%)
P95 latency: 842ms (+38ms)
Agent | Tokens | Duration | Cost | Status
research_agent | 18.2k | 1.2s | $0.024 | ✓ ok
code_reviewer | 31.0k | 3.1s | $0.041 | ✓ ok
vector_search | 6.1k | 500ms | $0.008 | ✗ err
email_drafter | 8.4k | 0.8s | $0.011 | ✓ ok
planner_agent | 142k | 12.4s | $0.187 | ⟳ loop
⚠ CRITICAL · Loop detected in planner_agent (142k tokens consumed); auto-kill triggered · ALRT-214
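One simple loop heuristic is to flag an agent that repeats the same call (tool name plus arguments) several times within a short window of steps. This sketch shows that heuristic only; the page does not describe how ProbeMetric's detector works, and `LoopDetector` is a hypothetical name.

```typescript
// Hypothetical repeated-action detector; one heuristic among many, not
// ProbeMetric's algorithm.
class LoopDetector {
  private readonly recent: string[] = [];
  private readonly windowSize: number;
  private readonly maxRepeats: number;

  constructor(windowSize = 10, maxRepeats = 3) {
    this.windowSize = windowSize;
    this.maxRepeats = maxRepeats;
  }

  // Record one agent step. Returns true once the identical call has appeared
  // `maxRepeats` times within the last `windowSize` steps.
  // JSON.stringify is key-order sensitive, so pass args in a stable shape.
  record(tool: string, args: unknown): boolean {
    const key = `${tool}:${JSON.stringify(args)}`;
    this.recent.push(key);
    if (this.recent.length > this.windowSize) this.recent.shift();
    return this.recent.filter((k) => k === key).length >= this.maxRepeats;
  }
}

// A stuck planner that keeps issuing the same call; stop ("auto-kill") when flagged.
const detector = new LoopDetector();
let killed = false;
let steps = 0;
while (!killed && steps < 20) {
  steps += 1;
  killed = detector.record("plan_next_step", { goal: "summarize inbox" });
}
console.log(`killed=${killed} after ${steps} steps`);
```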
Enterprise-grade security

Zero-trust prompts.
Local PII redaction.

Your users' data security is paramount. ProbeMetric's SDK supports field-level, regex-based client-side redaction, so API keys, credentials, and other sensitive PII you configure for redaction are stripped before they leave your infrastructure. This helps you meet SOC 2, GDPR, and HIPAA requirements.
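Field-level, regex-based redaction amounts to rewriting named fields and matching substrings before a payload is serialized. The sketch below is generic TypeScript, not ProbeMetric's configuration API; the patterns, the field list, and the `redact` helper are illustrative assumptions, and real PII detection needs patterns tuned to your own data.

```typescript
// Illustrative patterns only; tune and test them against your own data.
const PATTERNS: [string, RegExp][] = [
  ["EMAIL", /[\w.+-]+@[\w-]+\.[\w.-]+/g],
  ["API_KEY", /\bsk-[A-Za-z0-9_-]{16,}\b/g],
];

// Field names whose values are replaced entirely, whatever they contain.
const SENSITIVE_FIELDS = new Set(["password", "authorization", "ssn"]);

// Recursively copy a payload, masking sensitive fields and regex matches.
function redact(value: unknown): unknown {
  if (typeof value === "string") {
    return PATTERNS.reduce((s, [label, re]) => s.replace(re, `[${label}]`), value);
  }
  if (Array.isArray(value)) return value.map((v) => redact(v));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) {
      out[k] = SENSITIVE_FIELDS.has(k.toLowerCase()) ? "[REDACTED]" : redact(v);
    }
    return out;
  }
  return value;
}

const payload = {
  prompt: "Email jane.doe@example.com about the renewal",
  headers: { Authorization: "Bearer abc123" },
};
console.log(JSON.stringify(redact(payload)));
```

Because the copy is made before serialization, only the redacted version would ever be handed to an exporter.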

SOC 2 Type II · GDPR · HIPAA · Self-hosted
Local only

Redaction is processed client-side. Sensitive data never reaches our servers.

Zero retention

Choose ephemeral mode — traces evaluated in-flight, then discarded.

Use cases

Wherever your agents go off-script.

Autonomous agents

Catch infinite loops before they bankrupt you

Detect runaway planners and auto-kill agents that exceed cost or step budgets.
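A cost or step budget can be enforced by updating running totals after every step and stopping the run once either limit is exceeded. A minimal sketch with hypothetical names (`BudgetGuard`, `BudgetExceededError`) and illustrative limits:

```typescript
// Hypothetical budget guard; names and limits are illustrative.
class BudgetExceededError extends Error {}

class BudgetGuard {
  private steps = 0;
  private costUsd = 0;
  private readonly maxSteps: number;
  private readonly maxCostUsd: number;

  constructor(maxSteps: number, maxCostUsd: number) {
    this.maxSteps = maxSteps;
    this.maxCostUsd = maxCostUsd;
  }

  // Call once per agent step with that step's cost; throws when either budget is exceeded.
  charge(stepCostUsd: number): void {
    this.steps += 1;
    this.costUsd += stepCostUsd;
    if (this.steps > this.maxSteps) {
      throw new BudgetExceededError(`step budget exceeded: ${this.steps} > ${this.maxSteps}`);
    }
    if (this.costUsd > this.maxCostUsd) {
      throw new BudgetExceededError(
        `cost budget exceeded: $${this.costUsd.toFixed(2)} > $${this.maxCostUsd.toFixed(2)}`,
      );
    }
  }
}

// 25 steps or $1.00, whichever is hit first.
const guard = new BudgetGuard(25, 1.0);
try {
  for (const cost of [0.3, 0.3, 0.3, 0.3]) guard.charge(cost);
} catch (err) {
  // This is where an agent loop would stop the run.
  console.log((err as Error).message);
}
```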

RAG pipelines

Pinpoint slow retrieval and bad context

Span-level visibility into vector queries, rerankers, and prompt assembly.
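Span-level visibility comes down to timing each stage and recording it under its parent. A minimal sketch of a nested span recorder with hypothetical names (`Tracer`, `span`); it handles only synchronous stages to stay short, whereas real retrieval calls are async and an SDK would also export the spans.

```typescript
// Hypothetical minimal tracer; synchronous only, for illustration.
interface RecordedSpan {
  name: string;
  durationMs: number;
  children: RecordedSpan[];
  error?: string;
}

class Tracer {
  readonly roots: RecordedSpan[] = [];
  private stack: RecordedSpan[] = [];

  // Run `fn` inside a named span nested under whichever span is currently open.
  span<T>(name: string, fn: () => T): T {
    const rec: RecordedSpan = { name, durationMs: 0, children: [] };
    const parent = this.stack[this.stack.length - 1];
    (parent ? parent.children : this.roots).push(rec);
    this.stack.push(rec);
    const start = Date.now();
    try {
      return fn();
    } catch (err) {
      rec.error = err instanceof Error ? err.message : String(err);
      throw err;
    } finally {
      rec.durationMs = Date.now() - start;
      this.stack.pop();
    }
  }
}

// Stub stages standing in for a vector query, a reranker, and prompt assembly.
const tracer = new Tracer();
const prompt = tracer.span("rag_query", () => {
  const docs = tracer.span("vector_search", () => ["doc-a", "doc-b", "doc-c"]);
  const top = tracer.span("rerank", () => docs.slice(0, 2));
  return tracer.span("prompt_assembly", () => `Context:\n${top.join("\n")}`);
});
console.log(JSON.stringify(tracer.roots, null, 2));
```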

Multi-tool workflows

Debug tool calls without printf-debugging

Inspect every tool input, output, and error payload across nested chains.
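Capturing tool payloads means recording the input and either the output or the error for every call. A sketch of a higher-order wrapper that does this for any synchronous tool function; `instrumentTool` and `ToolEvent` are hypothetical names, not part of the ProbeMetric SDK.

```typescript
// Hypothetical event record for one tool call.
interface ToolEvent {
  tool: string;
  input: unknown;
  output?: unknown;
  error?: string;
}

// Wrap a tool so every call appends an event with its input and outcome.
function instrumentTool<I, O>(name: string, fn: (input: I) => O, log: ToolEvent[]): (input: I) => O {
  return (input: I): O => {
    const event: ToolEvent = { tool: name, input };
    log.push(event);
    try {
      const output = fn(input);
      event.output = output;
      return output;
    } catch (err) {
      event.error = err instanceof Error ? err.message : String(err);
      throw err; // rethrow so the agent's own error handling is unchanged
    }
  };
}

// A stub tool that rejects non-Markdown paths.
const events: ToolEvent[] = [];
const readFile = instrumentTool("read_file", (path: string) => {
  if (!path.endsWith(".md")) throw new Error(`unsupported file: ${path}`);
  return `# contents of ${path}`;
}, events);

readFile("notes.md");
try {
  readFile("image.png");
} catch {
  // the agent would handle the failure here
}
console.log(events);
```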

Evals & QA

Replay production traces in your eval suite

Export failing traces directly into your regression and eval pipelines.
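Exporting failing traces into an eval suite usually means filtering by status and writing one JSON record per line (JSONL), a common interchange format for eval datasets. A sketch with a hypothetical `ExportedTrace` shape; the field names are assumptions, not ProbeMetric's export schema.

```typescript
// Hypothetical export record; field names are assumptions, not ProbeMetric's schema.
interface ExportedTrace {
  traceId: string;
  status: "ok" | "error" | "loop";
  input: string;
  output: string | null;
}

// Keep failed or looping runs and emit one JSON object per line (JSONL).
function toRegressionCases(traces: ExportedTrace[]): string {
  return traces
    .filter((t) => t.status !== "ok")
    .map((t) => JSON.stringify({ id: t.traceId, input: t.input, observed: t.output, status: t.status }))
    .join("\n");
}

// Illustrative traces only.
const jsonl = toRegressionCases([
  { traceId: "trace_a", status: "ok", input: "Summarize the Q3 report", output: "stub answer" },
  { traceId: "trace_b", status: "error", input: "Search the docs for rate limits", output: null },
  { traceId: "trace_c", status: "loop", input: "Plan a migration", output: null },
]);
console.log(jsonl);
```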

Integrations

Works with your stack.

Framework-agnostic. Drop into raw SDK calls or your favorite agent library.

OpenAI
Anthropic
Gemini
Mistral
LangChain
LlamaIndex
CrewAI
AutoGPT
Vercel AI
OTel
Slack
PagerDuty
Datadog
Webhooks
Grafana
Sentry
2,841 traces/sec ingested across production workloads

37% average reduction in monthly LLM spend after week one

<10ms p99 SDK overhead on every wrapped call

FAQ

Common questions.

How is ProbeMetric different from LangSmith?

LangSmith is great if you're all-in on LangChain. ProbeMetric is framework-agnostic: it works with raw OpenAI calls, LangChain, CrewAI, AutoGPT, or any custom agent architecture. You get the same depth of tracing without locking into one ecosystem.

Does ProbeMetric support OpenAI's Assistants and Responses APIs?

Yes. Our OpenAI wrappers cover the Assistants, Responses, and Chat Completions APIs, including streaming, tool calls, and parallel runs.

Which providers and frameworks are supported?

OpenAI, Anthropic, Gemini, Mistral, plus LangChain, LlamaIndex, CrewAI, AutoGPT, and Vercel AI SDK out of the box. Anything OTel-compatible just works.

How is usage priced?

Per event (span), with a generous free tier. No per-seat lock-in. Volume tiers are published up front, so you always know what next month looks like.

Can I set custom alert thresholds?

Yes. Set thresholds per agent, per model, or globally, and route alerts to Slack, PagerDuty, or any webhook.

How much overhead does the SDK add?

Less than 10ms at p99. Spans are batched and shipped asynchronously, off the request path.
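Batching keeps network I/O out of the wrapped call: the hot path only appends to an in-memory buffer, and full batches are handed off without being awaited. A sketch with a hypothetical `BatchExporter` and an injected `send` function standing in for the network call; the batch size is illustrative, and a real exporter would also flush on a timer and at shutdown.

```typescript
// Hypothetical exporter; `send` stands in for the real network call.
class BatchExporter<T> {
  private buffer: T[] = [];
  private readonly maxBatch: number;
  private readonly send: (batch: T[]) => Promise<void>;

  constructor(send: (batch: T[]) => Promise<void>, maxBatch = 100) {
    this.send = send;
    this.maxBatch = maxBatch;
  }

  // Hot path: a cheap in-memory append, no I/O until the batch is full.
  record(item: T): void {
    this.buffer.push(item);
    if (this.buffer.length >= this.maxBatch) this.flush();
  }

  // Hand the current batch to `send` without awaiting it; failures are
  // swallowed so telemetry errors never surface in the caller's request.
  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.send(batch).catch(() => {
      // a real exporter would retry or count dropped spans here
    });
  }
}

const sent: number[][] = [];
const exporter = new BatchExporter<number>((batch) => {
  sent.push(batch);
  return Promise.resolve();
}, 2);
for (let i = 0; i < 5; i++) exporter.record(i);
exporter.flush(); // e.g. at shutdown, to ship the partial batch
console.log(sent);
```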

How is sensitive data handled?

With field-level client-side redaction and zero-data-retention options. ProbeMetric is SOC 2 Type II, GDPR, and HIPAA compliant.

Can I self-host ProbeMetric?

Yes. On-prem and VPC deployments are supported on enterprise plans.

Ready when you are

Start monitoring your agents today.

5-minute setup. No infra to manage. No credit card required.

ProbeMetric

Observability built for autonomous reasoning. Catch every failure before your users do.

© 2026 ProbeMetric Inc. Built for the agentic era. v1.0.0 · made for engineers