OpenTelemetry Ultimate Guide: With Python Demo Examples
Complete guide to implementing OpenTelemetry for tracing, monitoring, and logging in cloud-native applications with practical Python examples
Quick Navigation
Difficulty: 🟡 Intermediate
Estimated Time: 45-60 minutes
Prerequisites: Basic Python knowledge, Understanding of web APIs, Familiarity with Docker concepts
What You'll Learn
This tutorial covers essential OpenTelemetry concepts and tools:
- OpenTelemetry Fundamentals - Understanding traces, spans, and context propagation
- Python Implementation - Step-by-step FastAPI integration with OpenTelemetry
- Distributed Tracing - How to track requests across multiple services
- Metrics and Logging - Comprehensive observability implementation
- Production Deployment - Best practices for production environments
- Backend Integration - Connecting to various observability backends
- Performance Optimization - Efficient telemetry collection and processing
Prerequisites
- Basic Python knowledge and development experience
- Understanding of web APIs and HTTP concepts
- Familiarity with Docker concepts and containerization
- Basic understanding of distributed systems and microservices
Related Tutorials
- Docker Compose for Development - Multi-container development setup
- Kubernetes HPA Autoscaling - Kubernetes autoscaling with metrics
- Main Tutorials Hub - Step-by-step implementation guides
Introduction
In today's fast-paced world of microservices, cloud-native apps, and serverless architectures, observability is not a luxury — it's a necessity.
Say hello to OpenTelemetry (OTel) — your open-source, vendor-agnostic framework for collecting traces, metrics, and logs. Whether you're a developer, DevOps engineer, or SRE, this guide will walk you through OpenTelemetry's core features with clear Python examples and help you instrument your system like a pro.
What is OpenTelemetry?
OpenTelemetry is not a monitoring platform itself — it's a framework to generate telemetry data and send it to any backend (like Prometheus, Jaeger, Grafana, etc.).
Key Capabilities
- Metrics - Performance and business metrics
- Traces - Request flow across services
- Logs - Application and system logs
Notable Advantages
- No vendor lock-in - Send data anywhere
- Unified APIs & SDKs across languages
- Consistent context propagation & semantic conventions
OpenTelemetry Architecture
OpenTelemetry's architecture is modular, flexible, and designed to support both developers and operators:
Instrumentation Layer
This is where telemetry data is generated.
- Use OpenTelemetry SDKs or auto-instrumentation agents
- Generates traces, metrics, and logs from applications
Context Propagation
Ensures that trace context (trace ID, span ID) travels across services.
- Based on the W3C TraceContext standard
OpenTelemetry Collector
A vendor-neutral service that processes, transforms, and exports telemetry.
Components:
- Receivers: Collect data (e.g., OTLP, Jaeger, Zipkin)
- Processors: Batch, sample, or transform data
- Exporters: Send data to Prometheus, Jaeger, etc.
Deployment Modes:
- Agent Mode: Sidecar to your app
- Gateway Mode: Centralized collection point
Backends
The final destination for your telemetry.
You can use dedicated backends for each signal type depending on your use case and tooling preferences:
Traces
- Jaeger — Distributed tracing UI
- Zipkin — Lightweight trace visualization
- Grafana Tempo — Trace storage with high scalability
Metrics
- Prometheus — Most common time-series database for metrics
- Graphite — Simpler, older alternative
- InfluxDB — High-performance metrics storage with query support
Logs
- ElasticSearch + Kibana (ELK Stack) — Full-text search and log visualization
- Loki (by Grafana) — Logs with Prometheus-style labels
- Fluentd / Fluent Bit — Log routers to various sinks
Unified Backends (All-in-One)
Some platforms support logs, metrics, and traces in a single integrated stack:
- Datadog — Unified observability with excellent correlation features
- New Relic — Full-stack monitoring for all three signal types
- Grafana Cloud — Combines Prometheus (metrics), Loki (logs), and Tempo (traces)
- Elastic Observability — Centralized observability built on the Elastic Stack
These all-in-one solutions offer convenience, correlation, and cost management benefits by reducing the need to manage separate tools for each signal type.
Zero-code vs Code-based Instrumentation
Zero-code Instrumentation
Zero-code instrumentation allows you to monitor applications without modifying the source code.
Useful for:
- Developers who want to avoid touching production code
- Fast onboarding with minimal config
Works by attaching an agent or using framework-specific hooks for:
- HTTP libraries
- Database clients
- Messaging systems
How it works:
- Automatically injects tracing/metrics logic into supported libraries
- Controlled via environment variables (e.g. OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT)
- Supported Languages: Java, Python, JavaScript, .NET, Go, PHP
Ideal for getting started or observing third-party applications.
Code-based Instrumentation
Code-based instrumentation gives you fine-grained control over what and how telemetry is collected.
Useful for:
- Custom business logic tracking
- Advanced correlation between telemetry signals
How it works:
- Developers manually use OpenTelemetry SDKs to define spans, metrics, and logs
Key OpenTelemetry Concepts
- Tracer / Meter Provider: Configures exporters & instruments
- Context Propagation: Keeps trace & span IDs flowing across services
- Semantic Conventions: Consistent naming like http.method, db.system, user_agent
- Sampling: Reduce telemetry volume with head/tail sampling
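As a small illustration of the sampling point above, the snippet below configures head sampling in the Python SDK; the 10% ratio is an arbitrary value chosen for this sketch, not a recommendation.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Head sampling: keep roughly 10% of new traces, but follow the parent's
# decision for spans that join an existing trace (0.1 is just an example value)
sampler = ParentBased(root=TraceIdRatioBased(0.1))
provider = TracerProvider(sampler=sampler)
trace.set_tracer_provider(provider)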
What is a Trace?
A trace is a tree-like structure that tracks a single request as it moves across services. Each unit of work is a span.
What is a Span?
A span represents a single operation (e.g., an HTTP request, DB query). A trace is made up of multiple spans.
Each span has:
- Trace ID — Unique per request (shared across all spans in that request)
- Span ID — Unique per span (operation)
- Parent Span ID — Links child spans to their parent
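A quick way to see these fields is to read a span's context in code. This is a minimal sketch that assumes a real tracer provider has been configured (as in the FastAPI example below); with the default no-op tracer the IDs would all be zero.

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("demo-operation") as span:
    ctx = span.get_span_context()
    # IDs are stored as integers and usually displayed as hex strings
    print("trace_id:", format(ctx.trace_id, "032x"))
    print("span_id:", format(ctx.span_id, "016x"))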
Python Example: FastAPI with OpenTelemetry
We'll use a clean, production-grade REST API project layout:
fastapi_user_manager/
├── main.py
├── app/
│ ├── api/ # Routes, dependencies
│ ├── core/ # Config, security, tracing
│ ├── crud/ # DB logic
│ ├── db/ # SQLAlchemy session & base
│ ├── models/ # SQLAlchemy models
│ ├── schemas/ # Pydantic validation
│ ├── services/ # Business logic
│ └── utils/ # Helpers, logging
Install OpenTelemetry Libraries
pip install \
"opentelemetry-distro[otlp]" \
opentelemetry-exporter-otlp \
opentelemetry-instrumentation-fastapi \
opentelemetry-instrumentation-sqlalchemy
Configure Tracer in core/otel.py
# app/core/otel.py
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from app.db.session import engine
def setup_tracer(app):
    provider = TracerProvider(
        resource=Resource.create({SERVICE_NAME: "fastapi-user-service"})
    )
    trace.set_tracer_provider(provider)

    processor = BatchSpanProcessor(OTLPSpanExporter())
    provider.add_span_processor(processor)

    FastAPIInstrumentor.instrument_app(app)
    SQLAlchemyInstrumentor().instrument(engine=engine)
What this does:
- Creates a global tracer with service name
- Sends spans to an OTLP-compatible collector (e.g., Jaeger)
- Auto-instruments FastAPI and SQLAlchemy
Enable Tracer in main.py
# main.py
from fastapi import FastAPI
from app.api.routes import router as api_router
from app.core.otel import setup_tracer
app = FastAPI(title="User Manager API")
setup_tracer(app) # Tracing starts here
app.include_router(api_router)
Every HTTP request now gets a root span, for example:
- Span Name: "POST /users/"
- Span ID: a1b2c3
- Trace ID: 1234abcd
Add Manual Spans for Business Logic
user_service.py
from sqlalchemy.orm import Session
from app.schemas.user import UserCreate
from app.crud.user import create_user
from opentelemetry import trace
tracer = trace.get_tracer(__name__)
def register_user(db: Session, user: UserCreate):
    with tracer.start_as_current_span("service.register_user") as span:
        span.set_attribute("user.email", user.email)
        return create_user(db, user)
user.py (CRUD)
from sqlalchemy.orm import Session
from app.models.user import User
from app.schemas.user import UserCreate
from app.core.security import get_password_hash
from opentelemetry import trace
tracer = trace.get_tracer(__name__)
def create_user(db: Session, user: UserCreate) -> User:
    with tracer.start_as_current_span("crud.create_user") as span:
        span.set_attribute("db.user.email", user.email)
        db_user = User(
            email=user.email,
            hashed_password=get_password_hash(user.password)
        )
        db.add(db_user)
        db.commit()
        db.refresh(db_user)
        return db_user
How Parent-Child Span Relationship Works
Trace Context = Propagated Metadata
Each span contains metadata:
- trace_id: ID for the entire request
- span_id: ID of this specific span
- parent_span_id: ID of the parent span
The parent-child link is created by propagating this context — in memory (in-process) or across services (via HTTP headers).
In-Process Example (FastAPI)
from opentelemetry import trace
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("parent_span") as parent:
with tracer.start_as_current_span("child_span") as child:
# ... your code here
What Happens:
- child_span automatically inherits the current context, which includes the parent span
- OpenTelemetry stores the current span in thread-local storage (or async-local context)
- No need to manually pass span IDs
Cross-Service (Distributed) Example
In microservices, the trace context is injected into HTTP headers using W3C Trace Context format:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
When another service receives this HTTP request:
- It extracts the context from headers
- It sets this context as current
- New spans will automatically inherit the trace_id and set the parent_span_id accordingly
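As a sketch of both sides of that handoff, the example below uses the OpenTelemetry propagation API to copy the current context into outgoing headers and to rebuild it from incoming ones. The requests library and the user-service URL are placeholders for illustration only.

import requests

from opentelemetry import trace
from opentelemetry.propagate import extract, inject

tracer = trace.get_tracer(__name__)

# Caller: inject the active trace context into the outgoing HTTP headers
with tracer.start_as_current_span("call-user-service"):
    headers = {}
    inject(headers)  # adds the traceparent header shown above
    requests.get("http://user-service:8000/users/", headers=headers)

# Callee (conceptual): rebuild the context from the incoming headers
def handle_request(incoming_headers: dict):
    ctx = extract(incoming_headers)
    with tracer.start_as_current_span("handle-users-request", context=ctx):
        ...  # spans created here share the caller's trace_id

In the FastAPI service itself this extraction is handled automatically by FastAPIInstrumentor; the manual form is shown only to make the mechanism visible.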
FastAPI + OpenTelemetry Example
# routes.py
@router.post("/users/")
def create_user(user: UserCreate):
    # Auto-created parent span: "POST /users/"
    # (db is the SQLAlchemy Session, provided by a FastAPI dependency omitted here)
    return register_user(db, user)

# user_service.py
with tracer.start_as_current_span("service.register_user") as span:
    # This span gets its parent from the current context set by FastAPIInstrumentor
    ...
Here, service.register_user becomes a child of the HTTP span (POST /users/), because the tracer reads the current active span from the context.
Summary: How a Span Gets Its Parent
| Situation | Parent Assignment Method |
|---|---|
| In same service/thread | Implicit from current context (start_as_current_span) |
| Across services (HTTP) | Extracted from incoming headers |
| Manually created | Pass an explicit context, e.g. context=trace.set_span_in_context(parent_span) |
Manual Parent Assignment (optional)
with tracer.start_span("child", context=trace.set_span_in_context(parent_span)):
    ...  # your code here
This is useful when:
- You receive span context manually
- You're not using the default async/thread context propagation
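When the parent arrives as raw IDs rather than as a live span object (for example, pulled out of a message payload), you can rebuild a parent context by hand. This sketch is illustrative only; the IDs reuse the traceparent values shown earlier.

from opentelemetry import trace
from opentelemetry.trace import NonRecordingSpan, SpanContext, TraceFlags

tracer = trace.get_tracer(__name__)

# IDs received out-of-band (values taken from the traceparent example above)
remote_ctx = SpanContext(
    trace_id=0x4BF92F3577B34DA6A3CE929D0E0E4736,
    span_id=0x00F067AA0BA902B7,
    is_remote=True,
    trace_flags=TraceFlags(TraceFlags.SAMPLED),
)

parent = trace.set_span_in_context(NonRecordingSpan(remote_ctx))
with tracer.start_as_current_span("process-message", context=parent):
    ...  # this span joins the remote trace as a child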
Real Trace Tree
Trace ID: 123456abcd
└── Span: POST /users/ [FastAPI route] (root)
├── Span: service.register_user
│ └── Span: crud.create_user
│ └── Span: SQL INSERT INTO users
This lets you see exactly where a bottleneck or error occurred.
Why This Setup is Production-Ready
- Spans are exported through a BatchSpanProcessor, so telemetry is batched and sent off the request path instead of blocking each call
- The OTLP exporter keeps the service vendor-neutral: point it at any collector or backend (Jaeger, Tempo, etc.) without code changes
- Auto-instrumentation (FastAPI, SQLAlchemy) is combined with manual spans around business logic, so traces cover both framework plumbing and domain code
Python Metrics Example
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server
import time

# Expose a /metrics endpoint that Prometheus can scrape
reader = PrometheusMetricReader()
provider = MeterProvider(metric_readers=[reader])
meter = provider.get_meter("order-service")

order_counter = meter.create_counter(
    name="orders_placed",
    unit="1",
    description="Number of orders placed"
)

start_http_server(port=8000)

while True:
    order_counter.add(1, {"service": "checkout"})
    print("Order placed")
    time.sleep(5)
Visit http://localhost:8000/metrics to see your Prometheus metrics.
Python Logging Example
import logging
from opentelemetry.instrumentation.logging import LoggingInstrumentor
LoggingInstrumentor().instrument(set_logging_format=True)
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
logger.info("Order checkout started")
logger.warning("Inventory low")
logger.error("Payment declined")
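With the instrumentation above, log lines emitted while a span is active also carry the trace and span IDs in their format, which lets you jump from a log line to the matching trace. A minimal sketch, assuming a tracer provider is configured as in the earlier FastAPI example:

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("checkout"):
    # Emitted inside an active span, so the trace/span ID fields injected
    # into the log format are populated instead of showing zero values
    logger.info("Order checkout started")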
Conclusion
OpenTelemetry is the universal toolkit for observability across any language, platform, or service. With full support for tracing, metrics, and logs, it brings consistency, visibility, and power to your debugging and monitoring workflows.
Don't wait until production fires — instrument your services today and gain the clarity your systems deserve!
Key Takeaways
- Universal Framework - Vendor-agnostic observability across all platforms
- Comprehensive Coverage - Traces, metrics, and logs in one toolkit
- Easy Integration - Simple setup with FastAPI and other frameworks
- Production Ready - Built for enterprise-scale deployments
- Future Proof - Industry standard with growing ecosystem
Next Steps
- Instrument your FastAPI services with OpenTelemetry
- Set up a collector to process and export telemetry data
- Connect to backends like Jaeger, Prometheus, or Grafana
- Implement distributed tracing across your microservices
- Add custom metrics and logs for business observability
Tags: #OpenTelemetry #Observability #PythonTracing #CloudNative #DistributedSystems #DevOps