Agent Ops Monitor

Operational cockpit for monitoring autonomous agent infrastructure health, service metrics, and incident history.

Features

  • Signal Feed: Real-time alerts with escalation + acknowledgement controls
  • Service Metrics: Keep thresholds and trends visible next to active incidents
  • Incident Timeline: Show latest actions for quick war-room context
  • Environment Aware: Swap context (production vs staging) via props

Examples

Production Overview

Production infrastructure

Operations command center

Uptime 99.97%
3 signals

Live signals

Agent routed

Latency spike in us-east

Critical

P95 latency increased from 410ms → 930ms over the last 5 minutes

Owner · Agent Atlas2m ago

Drift detected in embedding model

Warning

Sentiment accuracy decreased by 4.8 pts vs. reference dataset

Owner · Agent Mira8m ago

Backup pipeline

Healthy

Nightly backups completed · 0 errors

Owner · Agent NovaCompleted

Service metrics

Targets
P95 latencyTarget < 550ms
930ms
Error rateTarget < 1%
0.8%
Agent availabilityTarget > 99.9%
99.7%
Latest incidents
Failover to eu-west orchestrator06:42

Action · Monitor

Customer impact flagged for tier-1 accounts06:55

Action · CS notified

Incident Response

Production infrastructure

Operations command center

Uptime 99.92%
3 signals

Live signals

Agent routed

Vector db replica lag

Critical

Replica lag is 28s (> 10s threshold). Agents throttled to 60 req/min

Owner · Agent Atlas1m ago

API 500s

Warning

Tool execution route returning 2.6% 500 errors for tier-1 customers

Owner · Agent Pulse3m ago

Backup pipeline

Healthy

Nightly snapshot completed. Verifying restore checksums

Owner · Agent NovaCompleted

Service metrics

Targets
P95 latencyTarget < 550ms
780ms
Error rateTarget < 1%
1.2%
Agent availabilityTarget > 99.9%
99.3%
Latest incidents
Triggered read-only mode06:30

Action · Monitor lag

Routed backups to secondary region06:42

Action · Confirm ETL status

Staging Sandbox

Staging infrastructure

Operations command center

Uptime 99.5%
2 signals

Live signals

Agent routed

Staging data refresh

Warning

Refresh pending security approval. Tokens expire in 4h

Owner · Agent RelayPending

Preview environment

Healthy

Preview deployment succeeded · build #812

Owner · Agent ForgeNow

Service metrics

Targets
Deploy successTarget > 95%
97.1%
Preview latencyTarget < 600ms
420ms
QA coverageTarget > 80%
84%
Latest incidents

No recent incidents.

Installation

npx shadcn add "https://agents-ui-kit.com/c/agent-ops-monitor.json"

API Reference

AgentOpsMonitor

PropTypeDefaultDescription
environmentstring"Production"Environment name in header
uptimestring"99.97%"Uptime badge
signalsOpsSignal[]Sample signalsLive alerts feed
metricsOpsServiceMetric[]Sample metricsService metrics cards
incidentsIncidentEvent[]Sample incidentsRecent incident timeline
onAcknowledge(signal) => void-Called when acknowledging a signal
onEscalate(signal) => void-Called when escalating
onExportReport() => void-Fired by export button
classNamestring-Tailwind overrides

Types

interface OpsSignal {
  id: string
  title: string
  status: "healthy" | "warning" | "critical"
  detail: string
  owner?: string
  lastUpdated?: string
}

interface OpsServiceMetric {
  label: string
  value: string
  threshold: string
  trend: "up" | "down" | "stable"
}

interface IncidentEvent {
  id: string
  timestamp: string
  summary: string
  actionNeeded?: string
}

Design Notes

  • Signal list wraps in ScrollArea for consistent height in docs
  • Buttons follow h-7 sizing for secondary actions
  • Status badges reuse blue/emerald/amber/red palette established across the kit