OpenTelemetry Ultimate Guide: With Python Demo Examples

Complete guide to implementing OpenTelemetry for tracing, monitoring, and logging in cloud-native applications with practical Python examples

Difficulty: 🟡 Intermediate
Estimated Time: 45-60 minutes
Prerequisites: Basic Python knowledge, Understanding of web APIs, Familiarity with Docker concepts

What You'll Learn

This tutorial covers essential OpenTelemetry concepts and tools:

  • OpenTelemetry Fundamentals - Understanding traces, spans, and context propagation
  • Python Implementation - Step-by-step FastAPI integration with OpenTelemetry
  • Distributed Tracing - How to track requests across multiple services
  • Metrics and Logging - Comprehensive observability implementation
  • Production Deployment - Best practices for production environments
  • Backend Integration - Connecting to various observability backends
  • Performance Optimization - Efficient telemetry collection and processing

Prerequisites

  • Basic Python knowledge and development experience
  • Understanding of web APIs and HTTP concepts
  • Familiarity with Docker concepts and containerization
  • Basic understanding of distributed systems and microservices

Introduction

In today's fast-paced world of microservices, cloud-native apps, and serverless architectures, observability is not a luxury — it's a necessity.

Say hello to OpenTelemetry (OTel) — your open-source, vendor-agnostic framework for collecting traces, metrics, and logs. Whether you're a developer, DevOps engineer, or SRE, this guide will walk you through OpenTelemetry's core features with clear Python examples and help you instrument your system like a pro.

What is OpenTelemetry?

OpenTelemetry is not a monitoring platform itself — it's a framework to generate telemetry data and send it to any backend (like Prometheus, Jaeger, Grafana, etc.).

Key Capabilities

  • Metrics - Performance and business metrics
  • Traces - Request flow across services
  • Logs - Application and system logs

Notable Advantages

  • No vendor lock-in - Send data anywhere
  • Unified APIs & SDKs across languages
  • Consistent context propagation & semantic conventions

OpenTelemetry Architecture

OpenTelemetry's architecture is modular, flexible, and designed to support both developers and operators:

Instrumentation Layer

This is where telemetry data is generated.

  • Use OpenTelemetry SDKs or auto-instrumentation agents
  • Generates traces, metrics, and logs from applications

Context Propagation

Ensures that trace context (trace ID, span ID) travels across services.

  • Based on the W3C TraceContext standard

OpenTelemetry Collector

A vendor-neutral service that processes, transforms, and exports telemetry.

Components:

  • Receivers: Collect data (e.g., OTLP, Jaeger, Zipkin)
  • Processors: Batch, sample, or transform data
  • Exporters: Send data to Prometheus, Jaeger, etc.

Deployment Modes:

  • Agent Mode: Runs alongside your app, as a sidecar or a per-host daemon
  • Gateway Mode: Centralized collection point

Backends

The final destination for your telemetry.

You can use dedicated backends for each signal type depending on your use case and tooling preferences:

Traces

  • Jaeger — Distributed tracing UI
  • Zipkin — Lightweight trace visualization
  • Grafana Tempo — Trace storage with high scalability

Metrics

  • Prometheus — Most common time-series database for metrics
  • Graphite — Simpler, older alternative
  • InfluxDB — High-performance metrics storage with query support

Logs

  • ElasticSearch + Kibana (ELK Stack) — Full-text search and log visualization
  • Loki (by Grafana) — Logs with Prometheus-style labels
  • Fluentd / Fluent Bit — Log routers to various sinks

Unified Backends (All-in-One)

Some platforms support logs, metrics, and traces in a single integrated stack:

  • Datadog — Unified observability with excellent correlation features
  • New Relic — Full-stack monitoring for all three signal types
  • Grafana Cloud — Combines Prometheus (metrics), Loki (logs), and Tempo (traces)
  • Elastic Observability — Centralized observability built on the Elastic Stack

These all-in-one solutions offer convenience, correlation, and cost management benefits by reducing the need to manage separate tools for each signal type.

Zero-code vs Code-based Instrumentation

Zero-code Instrumentation

Zero-code instrumentation allows you to monitor applications without modifying the source code.

Useful for:

  • Developers who want to avoid touching production code
  • Fast onboarding with minimal config

Works by attaching an agent or using framework-specific hooks for:

  • HTTP libraries
  • Database clients
  • Messaging systems

How it works:

  • Automatically injects tracing/metrics logic into supported libraries
  • Controlled via environment variables (e.g. OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT)
  • Supported Languages: Java, Python, JavaScript, .NET, Go, PHP

Zero-code instrumentation is ideal for getting started or for observing third-party applications.

Code-based Instrumentation

Code-based instrumentation gives you fine-grained control over what and how telemetry is collected.

Useful for:

  • Custom business logic tracking
  • Advanced correlation between telemetry signals

How it works:

  • Developers manually use OpenTelemetry SDKs to define spans, metrics, and logs

Key OpenTelemetry Concepts

  • Tracer / Meter Provider: Configures exporters & instruments
  • Context Propagation: Keeps trace & span IDs flowing across services
  • Semantic Conventions: Consistent attribute names like http.request.method (http.method in older releases), db.system, user_agent.original
  • Sampling: Reduce telemetry volume with head/tail sampling
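Head sampling decides at the start of a trace whether to keep it, while tail sampling decides after the trace has finished (typically in the Collector). The stdlib-only sketch below illustrates one common head-sampling idea, comparing the low bits of the trace ID against a ratio so that every service reaches the same decision for the same trace. It is a conceptual illustration, not the SDK's code; in a real setup you would pass a sampler such as TraceIdRatioBased from opentelemetry.sdk.trace.sampling to the TracerProvider. The function name and the 10% ratio are assumptions made for the example.

```python
import random

TRACE_ID_LIMIT = (1 << 64) - 1  # only the low 64 bits of the trace ID are compared


def should_sample(trace_id: int, ratio: float) -> bool:
    """Head-sampling decision that is stable for a given trace ID."""
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


# Simulate 10,000 random 128-bit trace IDs and keep roughly 10% of them.
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(t, 0.1) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace ID, a trace is either kept in full or dropped in full, rather than losing random spans from the middle.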

What is a Trace?

A trace is a tree-like structure that tracks a single request as it moves across services. Each unit of work is a span.

What is a Span?

A span represents a single operation (e.g., an HTTP request, DB query). A trace is made up of multiple spans.

Each span has:

  • Trace ID — Unique per request (shared across all spans in that request)
  • Span ID — Unique per span (operation)
  • Parent Span ID — Links child spans to their parent

Python Example: FastAPI with OpenTelemetry

The example is a REST API with a layered project structure:

fastapi_user_manager/
├── main.py
├── app/
│   ├── api/           # Routes, dependencies
│   ├── core/          # Config, security, tracing
│   ├── crud/          # DB logic
│   ├── db/            # SQLAlchemy session & base
│   ├── models/        # SQLAlchemy models
│   ├── schemas/       # Pydantic validation
│   ├── services/      # Business logic
│   └── utils/         # Helpers, logging

Install OpenTelemetry Libraries

pip install \
    "opentelemetry-distro[otlp]" \
    opentelemetry-exporter-otlp \
    opentelemetry-instrumentation-fastapi \
    opentelemetry-instrumentation-sqlalchemy

Configure Tracer in core/otel.py

# app/core/otel.py
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from app.db.session import engine

def setup_tracer(app):
    provider = TracerProvider(
        resource=Resource.create({SERVICE_NAME: "fastapi-user-service"})
    )
    trace.set_tracer_provider(provider)
    processor = BatchSpanProcessor(OTLPSpanExporter())
    provider.add_span_processor(processor)
    FastAPIInstrumentor.instrument_app(app)
    SQLAlchemyInstrumentor().instrument(engine=engine)

What this does:

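  • Registers a global tracer provider tagged with the service name
  • Batches spans and sends them over OTLP/HTTP (by default to http://localhost:4318/v1/traces) to a compatible endpoint such as an OpenTelemetry Collector or Jaeger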
  • Auto-instruments FastAPI and SQLAlchemy

Enable Tracer in main.py

# main.py
from fastapi import FastAPI
from app.api.routes import router as api_router
from app.core.otel import setup_tracer

app = FastAPI(title="User Manager API")
setup_tracer(app)  # Tracing starts here
app.include_router(api_router)

Every HTTP request now gets a root span like this (IDs shortened for readability):

  • Span Name: "POST /users/"
  • Span ID: a1b2c3
  • Trace ID: 1234abcd

Add Manual Spans for Business Logic

user_service.py

from sqlalchemy.orm import Session
from app.schemas.user import UserCreate
from app.crud.user import create_user
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def register_user(db: Session, user: UserCreate):
    with tracer.start_as_current_span("service.register_user") as span:
        span.set_attribute("user.email", user.email)
        return create_user(db, user)

user.py (CRUD)

from sqlalchemy.orm import Session
from app.models.user import User
from app.schemas.user import UserCreate
from app.core.security import get_password_hash
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def create_user(db: Session, user: UserCreate) -> User:
    with tracer.start_as_current_span("crud.create_user") as span:
        span.set_attribute("db.user.email", user.email)
        db_user = User(
            email=user.email, 
            hashed_password=get_password_hash(user.password)
        )
        db.add(db_user)
        db.commit()
        db.refresh(db_user)
        return db_user

How Parent-Child Span Relationship Works

Trace Context = Propagated Metadata

Each span contains metadata:

  • trace_id: ID for the entire request
  • span_id: ID of this specific span
  • parent_span_id: ID of the parent span

The parent-child link is created by propagating this context — in memory (in-process) or across services (via HTTP headers).

In-Process Example (FastAPI)

from opentelemetry import trace
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("parent_span") as parent:
    with tracer.start_as_current_span("child_span") as child:
        pass  # ... your code here

What Happens:

  • child_span automatically inherits the current context, which includes the parent span
  • In Python, OpenTelemetry stores the current span in a contextvars-based context, so it works for both threads and asyncio tasks
  • No need to manually pass span IDs

Cross-Service (Distributed) Example

In microservices, the trace context is injected into HTTP headers using W3C Trace Context format:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
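The header has four dash-separated hex fields: version, trace ID, parent (span) ID, and trace flags. A small stdlib parser makes the layout explicit. This is a simplified sketch for the four-field layout shown above, and the names TraceParent and parse_traceparent are invented for the example; in real services the OpenTelemetry propagators do this parsing for you.

```python
import re
from typing import NamedTuple, Optional


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed fields, or None if the header is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid per the W3C specification.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # The lowest bit of the flags byte is the "sampled" flag.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


parsed = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
print(parsed)
```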

When another service receives this HTTP request:

  • It extracts the context from headers
  • It sets this context as current
  • New spans will automatically inherit the trace_id and set parent_span_id accordingly

FastAPI + OpenTelemetry Example

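# routes.py (excerpt)
from fastapi import APIRouter, Depends
from sqlalchemy.orm import Session
from app.api.deps import get_db  # assumed location of the session dependency
from app.schemas.user import UserCreate
from app.services.user_service import register_user

router = APIRouter()

@router.post("/users/")
def create_user(user: UserCreate, db: Session = Depends(get_db)):
    # Auto-created parent span: "POST /users/"
    return register_user(db, user)

# user_service.py (excerpt; imports as shown earlier)
def register_user(db: Session, user: UserCreate):
    with tracer.start_as_current_span("service.register_user") as span:
        # This span gets its parent from the current context set by FastAPIInstrumentor
        span.set_attribute("user.email", user.email)
        return create_user(db, user)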

Here, service.register_user becomes a child of the HTTP span (POST /users/), because the tracer reads the current active span from the context.

Summary: How a Span Gets Its Parent

Situation                 Parent Assignment Method
------------------------  ---------------------------------------------------------------
In same service/thread    Implicit from current context (start_as_current_span)
Across services (HTTP)    Extracted from incoming headers
Manually created          Pass context=trace.set_span_in_context(parent_span) explicitly

Manual Parent Assignment (optional)

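# parent_span is a Span you already hold (for example, one handed over from another thread)
# start_span does not make the new span current; it only sets its parent
with tracer.start_span("child", context=trace.set_span_in_context(parent_span)) as child:
    pass  # ... your code here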

This is useful when:

  • You receive span context manually
  • You're not using the default async/thread context propagation

Real Trace Tree

Trace ID: 123456abcd
└── Span: POST /users/            [FastAPI route] (root)
    ├── Span: service.register_user
    │   └── Span: crud.create_user
    │       └── Span: SQL INSERT INTO users

This lets you see exactly where a bottleneck or error occurred.

Why This Setup is Production-Ready

Python Metrics Example

# Requires: pip install opentelemetry-exporter-prometheus
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server
import time

# The reader bridges OpenTelemetry metrics into the Prometheus client registry
reader = PrometheusMetricReader()
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
meter = metrics.get_meter("order-service")

order_counter = meter.create_counter(
    name="orders_placed",
    unit="1",
    description="Number of orders placed"
)

# Serve /metrics for Prometheus to scrape
start_http_server(port=8000)

# Simulate an order every 5 seconds
while True:
    order_counter.add(1, {"service": "checkout"})
    print("Order placed")
    time.sleep(5)

Visit http://localhost:8000/metrics to see your Prometheus metrics.

Python Logging Example

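# Requires: pip install opentelemetry-instrumentation-logging
import logging
from opentelemetry.instrumentation.logging import LoggingInstrumentor

# Adds otelTraceID / otelSpanID / otelServiceName to every log record and
# calls logging.basicConfig with a format that prints them
LoggingInstrumentor().instrument(set_logging_format=True, log_level=logging.INFO)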
logger = logging.getLogger(__name__)

logger.info("Order checkout started")
logger.warning("Inventory low")
logger.error("Payment declined")

Conclusion

OpenTelemetry is the universal toolkit for observability across any language, platform, or service. With full support for tracing, metrics, and logs, it brings consistency, visibility, and power to your debugging and monitoring workflows.

Don't wait until production fires — instrument your services today and gain the clarity your systems deserve!

Key Takeaways

  • Universal Framework - Vendor-agnostic observability across all platforms
  • Comprehensive Coverage - Traces, metrics, and logs in one toolkit
  • Easy Integration - Simple setup with FastAPI and other frameworks
  • Production Ready - Built for enterprise-scale deployments
  • Future Proof - Industry standard with growing ecosystem

Next Steps

  1. Instrument your FastAPI services with OpenTelemetry
  2. Set up a collector to process and export telemetry data
  3. Connect to backends like Jaeger, Prometheus, or Grafana
  4. Implement distributed tracing across your microservices
  5. Add custom metrics and logs for business observability

Tags: #OpenTelemetry #Observability #PythonTracing #CloudNative #DistributedSystems #DevOps