OpenTelemetry Ultimate Guide: With Python Demo Examples
Complete guide to implementing OpenTelemetry for tracing, monitoring, and logging in cloud-native applications with practical Python examples
Quick Navigation
Difficulty: 🟡 Intermediate
Estimated Time: 45-60 minutes
Prerequisites: Basic Python knowledge, Understanding of web APIs, Familiarity with Docker concepts
What You'll Learn
This tutorial covers essential OpenTelemetry concepts and tools:
- OpenTelemetry Fundamentals - Understanding traces, spans, and context propagation
- Python Implementation - Step-by-step FastAPI integration with OpenTelemetry
- Distributed Tracing - How to track requests across multiple services
- Metrics and Logging - Comprehensive observability implementation
- Production Deployment - Best practices for production environments
- Backend Integration - Connecting to various observability backends
- Performance Optimization - Efficient telemetry collection and processing
Prerequisites
- Basic Python knowledge and development experience
- Understanding of web APIs and HTTP concepts
- Familiarity with Docker concepts and containerization
- Basic understanding of distributed systems and microservices
Related Tutorials
- Docker Compose for Development - Multi-container development setup
- Kubernetes HPA Autoscaling - Kubernetes autoscaling with metrics
- Main Tutorials Hub - Step-by-step implementation guides
Introduction
In today's fast-paced world of microservices, cloud-native apps, and serverless architectures, observability is not a luxury — it's a necessity.
Say hello to OpenTelemetry (OTel) — your open-source, vendor-agnostic framework for collecting traces, metrics, and logs. Whether you're a developer, DevOps engineer, or SRE, this guide will walk you through OpenTelemetry's core features with clear Python examples and help you instrument your system like a pro.
What is OpenTelemetry?
OpenTelemetry is not a monitoring platform itself — it's a framework to generate telemetry data and send it to any backend (like Prometheus, Jaeger, Grafana, etc.).
Key Capabilities
- Metrics - Performance and business metrics
- Traces - Request flow across services
- Logs - Application and system logs
Notable Advantages
- No vendor lock-in - Send data anywhere
- Unified APIs & SDKs across languages
- Consistent context propagation & semantic conventions
OpenTelemetry Architecture
OpenTelemetry's architecture is modular, flexible, and designed to support both developers and operators:
Instrumentation Layer
This is where telemetry data is generated.
- Use OpenTelemetry SDKs or auto-instrumentation agents
- Generates traces, metrics, and logs from applications
Context Propagation
Ensures that trace context (trace ID, span ID) travels across services.
- Based on the W3C TraceContext standard
OpenTelemetry Collector
A vendor-neutral service that processes, transforms, and exports telemetry.
Components:
- Receivers: Collect data (e.g., OTLP, Jaeger, Zipkin)
- Processors: Batch, sample, or transform data
- Exporters: Send data to Prometheus, Jaeger, etc.
Deployment Modes:
- Agent Mode: Sidecar to your app
- Gateway Mode: Centralized collection point
Backends
The final destination for your telemetry.
You can use dedicated backends for each signal type depending on your use case and tooling preferences:
Traces
- Jaeger — Distributed tracing UI
- Zipkin — Lightweight trace visualization
- Grafana Tempo — Trace storage with high scalability
Metrics
- Prometheus — Most common time-series database for metrics
- Graphite — Simpler, older alternative
- InfluxDB — High-performance metrics storage with query support
Logs
- ElasticSearch + Kibana (ELK Stack) — Full-text search and log visualization
- Loki (by Grafana) — Logs with Prometheus-style labels
- Fluentd / Fluent Bit — Log routers to various sinks
Unified Backends (All-in-One)
Some platforms support logs, metrics, and traces in a single integrated stack:
- Datadog — Unified observability with excellent correlation features
- New Relic — Full-stack monitoring for all three signal types
- Grafana Cloud — Combines Prometheus (metrics), Loki (logs), and Tempo (traces)
- Elastic Observability — Centralized observability built on the Elastic Stack
These all-in-one solutions offer convenience, correlation, and cost management benefits by reducing the need to manage separate tools for each signal type.
Zero-code vs Code-based Instrumentation
Zero-code Instrumentation
Zero-code instrumentation allows you to monitor applications without modifying the source code.
Useful for:
- Developers who want to avoid touching production code
- Fast onboarding with minimal config
Works by attaching an agent or using framework-specific hooks for:
- HTTP libraries
- Database clients
- Messaging systems
How it works:
- Automatically injects tracing/metrics logic into supported libraries
- Controlled via environment variables (e.g. OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT)
- Supported Languages: Java, Python, JavaScript, .NET, Go, PHP
Ideal for getting started or observing third-party applications.
Code-based Instrumentation
Code-based instrumentation gives you fine-grained control over what and how telemetry is collected.
Useful for:
- Custom business logic tracking
- Advanced correlation between telemetry signals
How it works:
- Developers manually use OpenTelemetry SDKs to define spans, metrics, and logs
Key OpenTelemetry Concepts
- Tracer / Meter Provider: Configures exporters & instruments
- Context Propagation: Keeps trace & span IDs flowing across services
- Semantic Conventions: Consistent naming like http.method, db.system, user_agent
- Sampling: Reduce telemetry volume with head/tail sampling
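As a small illustration of the sampling point above, the snippet below configures head sampling in the Python SDK; the 10% ratio is an arbitrary value chosen for this sketch, not a recommendation.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Head sampling: keep roughly 10% of new traces, but follow the parent's
# decision for spans that join an existing trace (0.1 is just an example value)
sampler = ParentBased(root=TraceIdRatioBased(0.1))
provider = TracerProvider(sampler=sampler)
trace.set_tracer_provider(provider)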
What is a Trace?
A trace is a tree-like structure that tracks a single request as it moves across services. Each unit of work is a span.
What is a Span?
A span represents a single operation (e.g., an HTTP request, DB query). A trace is made up of multiple spans.
Each span has:
- Trace ID — Unique per request (shared across all spans in that request)
- Span ID — Unique per span (operation)
- Parent Span ID — Links child spans to their parent
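A quick way to see these fields is to read a span's context in code. This is a minimal sketch that assumes a real tracer provider has been configured (as in the FastAPI example below); with the default no-op tracer the IDs would all be zero.

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("demo-operation") as span:
    ctx = span.get_span_context()
    # IDs are stored as integers and usually displayed as hex strings
    print("trace_id:", format(ctx.trace_id, "032x"))
    print("span_id:", format(ctx.span_id, "016x"))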
Python Example: FastAPI with OpenTelemetry
We'll use a clean, production-grade REST API project layout:
fastapi_user_manager/
├── main.py
├── app/
│ ├── api/ # Routes, dependencies
│ ├── core/ # Config, security, tracing
│ ├── crud/ # DB logic
│ ├── db/ # SQLAlchemy session & base
│ ├── models/ # SQLAlchemy models
│ ├── schemas/ # Pydantic validation
│ ├── services/ # Business logic
│ └── utils/ # Helpers, logging
Install OpenTelemetry Libraries
pip install \
"opentelemetry-distro[otlp]" \
opentelemetry-exporter-otlp \
opentelemetry-instrumentation-fastapi \
opentelemetry-instrumentation-sqlalchemy
Configure Tracer in core/otel.py
# app/core/otel.py
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from app.db.session import engine
def setup_tracer(app):
    provider = TracerProvider(
        resource=Resource.create({SERVICE_NAME: "fastapi-user-service"})
    )
    trace.set_tracer_provider(provider)

    processor = BatchSpanProcessor(OTLPSpanExporter())
    provider.add_span_processor(processor)

    FastAPIInstrumentor.instrument_app(app)
    SQLAlchemyInstrumentor().instrument(engine=engine)
What this does:
- Creates a global tracer with service name
- Sends spans to an OTLP-compatible collector (e.g., Jaeger)
- Auto-instruments FastAPI and SQLAlchemy
Enable Tracer in main.py
# main.py
from fastapi import FastAPI
from app.api.routes import router as api_router
from app.core.otel import setup_tracer
app = FastAPI(title="User Manager API")
setup_tracer(app) # Tracing starts here
app.include_router(api_router)
Every HTTP request now gets a root span, for example:
- Span Name: "POST /users/"
- Span ID: a1b2c3
- Trace ID: 1234abcd
Add Manual Spans for Business Logic
user_service.py
from sqlalchemy.orm import Session
from app.schemas.user import UserCreate
from app.crud.user import create_user
from opentelemetry import trace
tracer = trace.get_tracer(__name__)
def register_user(db: Session, user: UserCreate):
    with tracer.start_as_current_span("service.register_user") as span:
        span.set_attribute("user.email", user.email)
        return create_user(db, user)
user.py (CRUD)
from sqlalchemy.orm import Session
from app.models.user import User
from app.schemas.user import UserCreate
from app.core.security import get_password_hash
from opentelemetry import trace
tracer = trace.get_tracer(__name__)
def create_user(db: Session, user: UserCreate) -> User:
    with tracer.start_as_current_span("crud.create_user") as span:
        span.set_attribute("db.user.email", user.email)
        db_user = User(
            email=user.email,
            hashed_password=get_password_hash(user.password)
        )
        db.add(db_user)
        db.commit()
        db.refresh(db_user)
        return db_user
How Parent-Child Span Relationship Works
Trace Context = Propagated Metadata
Each span contains metadata:
- trace_id: ID for the entire request
- span_id: ID of this specific span
- parent_span_id: ID of the parent span
The parent-child link is created by propagating this context — in memory (in-process) or across services (via HTTP headers).
In-Process Example (FastAPI)
from opentelemetry import trace
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("parent_span") as parent:
with tracer.start_as_current_span("child_span") as child:
# ... your code here
What Happens:
- child_span automatically inherits the current context, which includes the parent span
- OpenTelemetry stores the current span in thread-local storage (or async-local context)
- No need to manually pass span IDs
Cross-Service (Distributed) Example
In microservices, the trace context is injected into HTTP headers using W3C Trace Context format:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
When another service receives this HTTP request:
- It extracts the context from headers
- It sets this context as current
- New spans will automatically inherit the trace_id and set the parent_span_id accordingly
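As a sketch of both sides of that handoff, the example below uses the OpenTelemetry propagation API to copy the current context into outgoing headers and to rebuild it from incoming ones. The requests library and the user-service URL are placeholders for illustration only.

import requests

from opentelemetry import trace
from opentelemetry.propagate import extract, inject

tracer = trace.get_tracer(__name__)

# Caller: inject the active trace context into the outgoing HTTP headers
with tracer.start_as_current_span("call-user-service"):
    headers = {}
    inject(headers)  # adds the traceparent header shown above
    requests.get("http://user-service:8000/users/", headers=headers)

# Callee (conceptual): rebuild the context from the incoming headers
def handle_request(incoming_headers: dict):
    ctx = extract(incoming_headers)
    with tracer.start_as_current_span("handle-users-request", context=ctx):
        ...  # spans created here share the caller's trace_id

In the FastAPI service itself this extraction is handled automatically by FastAPIInstrumentor; the manual form is shown only to make the mechanism visible.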
FastAPI + OpenTelemetry Example
# routes.py
@router.post("/users/")
def create_user(user: UserCreate):
    # Auto-created parent span: "POST /users/"
    # (db is the SQLAlchemy Session, provided by a FastAPI dependency omitted here)
    return register_user(db, user)

# user_service.py
with tracer.start_as_current_span("service.register_user") as span:
    # This span gets its parent from the current context set by FastAPIInstrumentor
    ...
Here, service.register_user becomes a child of the HTTP span (POST /users/), because the tracer reads the current active span from the context.
Summary: How a Span Gets Its Parent
| Situation | Parent Assignment Method |
|---|---|
| In same service/thread | Implicit from current context (start_as_current_span) |
| Across services (HTTP) | Extracted from incoming headers |
| Manually created | Pass an explicit context, e.g. context=trace.set_span_in_context(parent_span) |
Manual Parent Assignment (optional)
with tracer.start_span("child", context=trace.set_span_in_context(parent_span)):
    ...  # your code here
This is useful when:
- You receive span context manually
- You're not using the default async/thread context propagation
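When the parent arrives as raw IDs rather than as a live span object (for example, pulled out of a message payload), you can rebuild a parent context by hand. This sketch is illustrative only; the IDs reuse the traceparent values shown earlier.

from opentelemetry import trace
from opentelemetry.trace import NonRecordingSpan, SpanContext, TraceFlags

tracer = trace.get_tracer(__name__)

# IDs received out-of-band (values taken from the traceparent example above)
remote_ctx = SpanContext(
    trace_id=0x4BF92F3577B34DA6A3CE929D0E0E4736,
    span_id=0x00F067AA0BA902B7,
    is_remote=True,
    trace_flags=TraceFlags(TraceFlags.SAMPLED),
)

parent = trace.set_span_in_context(NonRecordingSpan(remote_ctx))
with tracer.start_as_current_span("process-message", context=parent):
    ...  # this span joins the remote trace as a child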
Real Trace Tree
Trace ID: 123456abcd
└── Span: POST /users/ [FastAPI route] (root)
├── Span: service.register_user
│ └── Span: crud.create_user
│ └── Span: SQL INSERT INTO users
This lets you see exactly where a bottleneck or error occurred.
Why This Setup is Production-Ready
- Spans are exported through a BatchSpanProcessor, so telemetry is batched and sent off the request path instead of blocking each call
- The OTLP exporter keeps the service vendor-neutral: point it at any collector or backend (Jaeger, Tempo, etc.) without code changes
- Auto-instrumentation (FastAPI, SQLAlchemy) is combined with manual spans around business logic, so traces cover both framework plumbing and domain code
Python Metrics Example
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server
import time

# Expose a /metrics endpoint that Prometheus can scrape
reader = PrometheusMetricReader()
provider = MeterProvider(metric_readers=[reader])
meter = provider.get_meter("order-service")

order_counter = meter.create_counter(
    name="orders_placed",
    unit="1",
    description="Number of orders placed"
)

start_http_server(port=8000)

while True:
    order_counter.add(1, {"service": "checkout"})
    print("Order placed")
    time.sleep(5)
Visit http://localhost:8000/metrics to see your Prometheus metrics.
Python Logging Example
import logging
from opentelemetry.instrumentation.logging import LoggingInstrumentor
LoggingInstrumentor().instrument(set_logging_format=True)
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
logger.info("Order checkout started")
logger.warning("Inventory low")
logger.error("Payment declined")
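With the instrumentation above, log lines emitted while a span is active also carry the trace and span IDs in their format, which lets you jump from a log line to the matching trace. A minimal sketch, assuming a tracer provider is configured as in the earlier FastAPI example:

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("checkout"):
    # Emitted inside an active span, so the trace/span ID fields injected
    # into the log format are populated instead of showing zero values
    logger.info("Order checkout started")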
Conclusion
OpenTelemetry is the universal toolkit for observability across any language, platform, or service. With full support for tracing, metrics, and logs, it brings consistency, visibility, and power to your debugging and monitoring workflows.
Don't wait until production fires — instrument your services today and gain the clarity your systems deserve!
Key Takeaways
- Universal Framework - Vendor-agnostic observability across all platforms
- Comprehensive Coverage - Traces, metrics, and logs in one toolkit
- Easy Integration - Simple setup with FastAPI and other frameworks
- Production Ready - Built for enterprise-scale deployments
- Future Proof - Industry standard with growing ecosystem
Next Steps
- Instrument your FastAPI services with OpenTelemetry
- Set up a collector to process and export telemetry data
- Connect to backends like Jaeger, Prometheus, or Grafana
- Implement distributed tracing across your microservices
- Add custom metrics and logs for business observability
Tags: #OpenTelemetry #Observability #PythonTracing #CloudNative #DistributedSystems #DevOps