Documentation

Comprehensive guide to integrating LLM client telemetry, configuring runaway alerts, managing PII redaction, and calling the APIs.

Introduction & Architecture

ProbeMetric is a high-performance LLM telemetry and observability platform designed specifically for agentic AI workloads. It tracks, indexes, and monitors LLM calls, prompts, responses, costs, token counts, and latencies in real time.

Ingestion Architecture & Queueing Model

ProbeMetric is engineered to ingest high volumes of traces without adding latency to the user application's request path:

1. Non-blocking SDK Wrappers: The client SDK records prompt inputs, model parameters, and response options, offloading network requests to non-blocking background promises.
2. SQS Ingestion Buffer: Inbound trace payloads sent to the API Gateway are pushed directly to an AWS SQS queue. This ensures sub-millisecond gateway response times and guards against high-volume traffic spikes.
3. Payload Offloading (S3): Payloads larger than 128 KB are offloaded directly to encrypted AWS S3 buckets, with the SQS message containing only the S3 reference path.
4. Direct DB Fallback (Dev Mode): In local development, or when no SQS queue is configured, ProbeMetric automatically routes trace events directly into the database so developers see telemetry immediately.
5. Background Processors: A dedicated consumer daemon pulls events from SQS, evaluates tokens against token-pricing matrices, logs metrics, and checks configuration limits.
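
The routing decision described in steps 2 to 4 can be sketched as a small pure function. This is an illustrative sketch only, not ProbeMetric's actual gateway code: the names `routeTrace`, `GatewayConfig`, and `Route`, the S3 key layout, and the reading of "128 KB" as 128 × 1024 bytes are all assumptions made for the example. No AWS calls are made; the function only decides where a payload would go.

```typescript
// Hypothetical sketch: none of these names come from the ProbeMetric API.

// Assumption: "128 KB" is interpreted as 128 * 1024 bytes of UTF-8 JSON.
const OFFLOAD_THRESHOLD_BYTES = 128 * 1024;

interface GatewayConfig {
  sqsQueueUrl?: string; // left unset in local development (dev mode)
  s3Bucket: string;     // bucket used for oversized payloads
}

type Route =
  | { kind: "db-direct" }                                   // step 4: no queue configured
  | { kind: "sqs-inline"; queueUrl: string; body: string }  // step 2: payload travels in the message
  | { kind: "sqs-s3-ref"; queueUrl: string; body: string; s3Key: string }; // step 3: message holds a reference

function routeTrace(traceId: string, payload: unknown, config: GatewayConfig): Route {
  // Dev-mode fallback: without a queue, write straight to the database.
  if (!config.sqsQueueUrl) {
    return { kind: "db-direct" };
  }

  const body = JSON.stringify(payload);
  const sizeBytes = new TextEncoder().encode(body).length;

  // Payloads strictly larger than the threshold are offloaded to S3;
  // the queue message then carries only the S3 reference path.
  if (sizeBytes > OFFLOAD_THRESHOLD_BYTES) {
    const s3Key = `traces/${traceId}.json`; // hypothetical key layout
    const reference = JSON.stringify({
      traceId,
      s3Path: `s3://${config.s3Bucket}/${s3Key}`,
    });
    return { kind: "sqs-s3-ref", queueUrl: config.sqsQueueUrl, body: reference, s3Key };
  }

  return { kind: "sqs-inline", queueUrl: config.sqsQueueUrl, body };
}

// Usage with placeholder values:
const exampleConfig: GatewayConfig = {
  sqsQueueUrl: "https://sqs.example.invalid/trace-queue",
  s3Bucket: "example-trace-bucket",
};
const inlineRoute = routeTrace("trace-1", { prompt: "hello" }, exampleConfig);
const offloadedRoute = routeTrace("trace-2", { prompt: "x".repeat(200000) }, exampleConfig);
const devRoute = routeTrace("trace-3", { prompt: "hello" }, { s3Bucket: "example-trace-bucket" });
```

Keeping the decision separate from the actual SQS and S3 calls makes the size threshold and the dev-mode fallback easy to unit test without network access.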

Note

All traces are encrypted at rest using industry-standard AES-256 keys, and transport routes require TLS (HTTPS) connections.
