Telemetry User Manual
The cloudmesh.ai.common.telemetry module provides a standardized way to record and emit performance metrics, system state, and event data during the execution of AI workloads. This data is essential for benchmarking, debugging, and monitoring the efficiency of AI models.
Overview
The Telemetry class acts as the primary interface for recording data. It supports emitting metrics to various backends (such as JSONL files or SQLite databases), allowing for both real-time monitoring and post-hoc analysis.
Key Features
- Flexible Metric Emission: Record scalar values, dictionaries of metrics, and status updates.
- Automatic Context Capture: Automatically captures system information (CPU, GPU, RAM) and user context when emitting telemetry.
- Pluggable Backends: Support for multiple storage formats to balance between performance (JSONL) and queryability (SQLite).
- Async Support: Designed to work within asynchronous workflows common in AI service implementations.
Usage Guide
Basic Telemetry Emission
To record a metric, instantiate the Telemetry class and use the emit method.
from cloudmesh.ai.common.telemetry import Telemetry
t = Telemetry()
# Emit a simple metric
t.emit(metric="inference_latency", value=0.125, status="completed")
# Emit multiple metrics at once
t.emit(metrics={"tokens_per_sec": 45.2, "gpu_util": 88}, status="completed")
Recording Events
Events are used to mark specific milestones in a process (e.g., "model_loaded", "request_received").
from cloudmesh.ai.common.telemetry import Telemetry
t = Telemetry()
t.emit(event="model_load_start", status="started")
# ... load model ...
t.emit(event="model_load_end", status="completed")
Configuring the Backend
You can specify where the telemetry data should be stored.
from cloudmesh.ai.common.telemetry import Telemetry
# Store telemetry in a SQLite database for easier querying
t = Telemetry(backend="sqlite", path="telemetry.db")
t.emit(metric="accuracy", value=0.92)
API Reference
Refer to the auto-generated API documentation for detailed method signatures and backend configuration options.