AI Application Observability
Introduction
With the rise of AI applications, it’s important to have observability in place, especially when the control flow is not as clear as in traditional applications. This post covers basic logs, metrics, and traces with OpenTelemetry, continuous profiling with Pyroscope, and LLM observability with LangSmith.
OpenTelemetry
OpenTelemetry is a collection of APIs, SDKs, and tools. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior.
The usual flow is to add the OpenTelemetry SDK to the application, which can then export telemetry data to various backends, like Prometheus, Tempo, or an OTel collector.
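For context, here is roughly what the SDK setup looks like when done by hand; a minimal sketch, assuming an OTLP-capable backend (such as an OTel collector or Tempo) listening on localhost:4317, with illustrative service and span names:

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Identify the service and wire spans to an OTLP exporter.
provider = TracerProvider(resource=Resource.create({"service.name": "chat-search"}))
exporter = OTLPSpanExporter(endpoint="http://localhost:4317")  # OTLP/gRPC endpoint
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("startup-check"):
    pass  # spans created here are batched and exported in the background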
Some SDKs support automatic instrumentation. In Python, for example, install
pip install 'opentelemetry-distro[otlp]' opentelemetry-instrumentation
Then run
opentelemetry-bootstrap -a requirements
It will list all the packages needed to instrument your application. Note that some instrumentations may crash the app due to bugs like this, so cherry-pick the ones you need.
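For example, if the app is a FastAPI service backed by Redis (as in the chat search example below), you might pick just those two instrumentations:

pip install opentelemetry-instrumentation-fastapi opentelemetry-instrumentation-redis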
All options can then be configured via environment variables; see the official doc.
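For instance, the service name and exporter endpoint can be set like this (values are illustrative; point them at your own collector or backend):

export OTEL_SERVICE_NAME=chat-search
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
export OTEL_TRACES_EXPORTER=otlp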
Taking the previous chat search example, the entrypoint command changes to
opentelemetry-instrument uvicorn app.server:app --host 0.0.0.0 --port 8000
Start local development with
docker compose up --build
Then ask a question in the chat playground and explore Tempo traces in Grafana. We can inspect the Redis query and the request to the LLM.
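If the automatic instrumentation misses a step you care about, custom spans and attributes can be added alongside it; a minimal sketch, assuming the auto-instrumented tracer provider is already in place (the span and attribute names are illustrative):

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def search_chats(query: str):
    # Nested under the auto-instrumented request span.
    with tracer.start_as_current_span("chat-search") as span:
        span.set_attribute("search.query", query)
        ...  # run the Redis query and LLM call here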
If exporting to Grafana Cloud, there is a default application dashboard to view them all.
Pyroscope
Sometimes the performance issue is not in the network or external services, but in the code itself. During development we can use local profiling tools to diagnose it, but in production it’s not easy to attach a profiler to a running process, so we need continuous profiling.
Pyroscope is a continuous profiling tool. It can be used to locate performance issues down to the line of code. For Python, some configuration in code is needed:
pip install pyroscope-io
import pyroscope
pyroscope.configure(
    application_name="app_name",  # name shown in the Pyroscope UI
    server_address="http://pyroscope:4040",  # address of the Pyroscope server
)
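Optionally, hot paths can be tagged so the flame graph can be sliced by label; a short sketch using the library's tag_wrapper context manager (the tag and the function being profiled are illustrative):

with pyroscope.tag_wrapper({"endpoint": "chat-search"}):
    run_search()  # hypothetical hot path to profile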
Then the flame graph can be viewed in the Pyroscope UI.
LangSmith
A unique part of AI application observability is LLM observability. In a LangChain chain, for example, the model's response can be used for many things: shown directly to the user, fed as input to the next step, or even used to decide the next step in the control flow. Monitoring the full lifecycle of a chain is therefore very important.
LangSmith is a platform for LLM observability. It has built-in support for LangChain and can be used to monitor all interactions with models.
To enable LangSmith, just set two environment variables:
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-api-key>
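LangChain runs are then traced automatically with no code changes; for code outside a chain, the langsmith SDK's traceable decorator can wrap any function (a minimal sketch; the function and run names are illustrative):

from langsmith import traceable

@traceable(name="retrieve-context")  # shows up as a run in the LangSmith UI
def retrieve_context(question: str) -> str:
    ...  # hypothetical retrieval step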
Then check the LangSmith UI for traces. Here is a public demo for my blog chatbot: click View LangSmith trace to see the trace.
Conclusion
AI application observability is not easy, but with tools like OpenTelemetry, Pyroscope, and LangSmith combined, we can get a multi-dimensional view of the application.