GenAI Observability · Reference

Four schemas, ~90% the same problem

Side-by-side comparison of OpenTelemetry Semantic Conventions, OpenInference, OpenLLMetry, and Langfuse — how each models LLM calls, agents, tools, and metrics, and where the ecosystem is converging.

OTel SemConv v1.41.0 · OpenInference (Arize) · OpenLLMetry (Traceloop) · Langfuse

Why four standards exist

Each project optimizes for a different audience. OTel optimizes for vendor neutrality, OpenInference for evaluation tooling, OpenLLMetry for fast pragmatic instrumentation, Langfuse for a single product UX. Convergence on OTel is now real but uneven.

OpenTelemetry SemConv

Backed by CNCF · v1.41.0 (Apr 2026)
The neutral standard. Committee-driven, so it trails the vendor schemas on new features, but every other project is now converging on it. Defines spans (chat, embeddings, invoke_agent, invoke_workflow, execute_tool), metrics, and events. Status: Development.
Strength: backend-agnostic. Instrument once, swap vendors freely.

OpenInference

Arize · OSS
Schema-first. The openinference.span.kind attribute is required on every span and drives a strict taxonomy: LLM, EMBEDDING, CHAIN, RETRIEVER, TOOL, AGENT, RERANKER, GUARDRAIL, EVALUATOR, PROMPT. Built around tracing for evaluation pipelines (Phoenix, Arize AX).
Strength: opinionated taxonomy makes evals & dashboards cleaner.

OpenLLMetry

Traceloop (acq. ServiceNow, Mar 2026) · OSS
OTel-native extension. Ships drop-in instrumentations for 30+ LLM providers, frameworks (LangChain, LlamaIndex, Haystack), and vector DBs (Pinecone, Qdrant, Weaviate, Chroma…). Adds attributes upstream OTel hasn't standardized yet, then deprecates them as OTel catches up.
Strength: fastest path from pip install to traces.

Langfuse

Langfuse · OSS + Cloud
Product-driven data model. Trace → Observation tree where each observation is a SPAN, GENERATION, or EVENT (plus extended types: Agent, Tool, Chain, Retriever, Evaluator, Embedding, Guardrail). Accepts OTel data and maps it onto this model for the Langfuse UI.
Strength: rich product UX (sessions, users, evals, prompt mgmt).

The 30-second mental model

If you only remember one thing per project:

OTel · operation-name driven
Span name = {operation} {target}. Example: chat gpt-4o, invoke_agent customer_support, execute_tool get_weather.
OpenInference · span-kind driven
Every span carries openinference.span.kind (LLM, AGENT, TOOL, RETRIEVER, …). The backend uses this to decide rendering and aggregation.
OpenLLMetry · OTel + provider extras
Mostly aligns with OTel, but adds Traceloop-specific attrs (traceloop.workflow.name, traceloop.entity.path) for nested orchestration.
Langfuse · observation-type driven
Each observation has a type field. The UI groups GENERATION observations into "model calls" and rolls up cost/tokens to the trace.
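That roll-up can be sketched as a tree walk. The dict shape used here (`type`, `usage`, `children`) is a hypothetical stand-in for Langfuse's stored observations. It is not its SDK or API. Only GENERATION observations contribute, matching the description above.

```python
from typing import Any


def rollup_usage(observation: dict[str, Any]) -> dict[str, int]:
    """Sum token usage over every GENERATION in a (hypothetical) observation tree."""
    totals = {"input": 0, "output": 0, "total": 0}
    stack = [observation]
    while stack:
        obs = stack.pop()
        if obs.get("type") == "GENERATION":
            usage = obs.get("usage", {})
            for key in totals:
                totals[key] += usage.get(key, 0)
        # Non-GENERATION nodes (SPAN, TOOL, ...) are walked but not counted.
        stack.extend(obs.get("children", []))
    return totals


trace = {
    "type": "SPAN",
    "children": [
        {"type": "GENERATION", "usage": {"input": 412, "output": 128, "total": 540}},
        {"type": "TOOL", "children": [
            {"type": "GENERATION", "usage": {"input": 100, "output": 20, "total": 120}},
        ]},
    ],
}
print(rollup_usage(trace))  # {'input': 512, 'output': 148, 'total': 660}
```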

Span types & operation names

The biggest divergence between schemas is how spans are named and categorized. OTel uses an operation-name attribute, OpenInference uses a required span-kind, Langfuse uses observation types, OpenLLMetry follows OTel.

OpenTelemetry — gen_ai.operation.name

Span kind: CLIENT (external) or INTERNAL (in-process)
Operations: chat, text_completion, embeddings, generate_content, create_agent, invoke_agent, invoke_workflow, execute_tool, retrieval

Span name format: {operation_name} {model_or_target} — e.g., chat gpt-4o, invoke_agent triage_agent.
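The following Python sketch shows that naming rule. `otel_span_name` is a hypothetical helper and is not part of any OpenTelemetry SDK. Falling back to the bare operation name when no model or target is known is our assumption about how to handle the missing case.

```python
from typing import Optional


def otel_span_name(operation: str, target: Optional[str] = None) -> str:
    """Format a span name as '{operation_name} {model_or_target}'.

    Assumption: with no model or target available, use the operation name alone.
    """
    return f"{operation} {target}" if target else operation


print(otel_span_name("chat", "gpt-4o"))                # chat gpt-4o
print(otel_span_name("invoke_agent", "triage_agent"))  # invoke_agent triage_agent
```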

OpenInference — openinference.span.kind (required)

Strict taxonomy, backend-aware
Kinds: LLM, EMBEDDING, CHAIN, RETRIEVER, TOOL, AGENT, RERANKER, GUARDRAIL, EVALUATOR, PROMPT

CHAIN is unique to OpenInference: it represents glue code linking other spans (the "request → retrieve → LLM → respond" wrapper). PROMPT captures prompt template rendering.
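The two taxonomies overlap heavily, so a thin lookup can derive an OpenInference kind from an OTel operation name. This is a sketch. The pairings are our own editorial mapping, not one published by either project, and defaulting unknown operations to CHAIN (glue code) is an assumption.

```python
# Editorial mapping (assumption): neither project publishes this exact table.
OTEL_OP_TO_OPENINFERENCE_KIND = {
    "chat": "LLM",
    "text_completion": "LLM",
    "generate_content": "LLM",
    "embeddings": "EMBEDDING",
    "retrieval": "RETRIEVER",
    "execute_tool": "TOOL",
    "invoke_agent": "AGENT",
    "invoke_workflow": "CHAIN",
}


def openinference_kind(operation_name: str) -> str:
    """Map an OTel gen_ai.operation.name to an OpenInference span kind.

    Unrecognized operations default to CHAIN (glue code), an assumption.
    """
    return OTEL_OP_TO_OPENINFERENCE_KIND.get(operation_name, "CHAIN")


# Both attribute sets can live on the same span.
attrs = {"gen_ai.operation.name": "chat"}
attrs["openinference.span.kind"] = openinference_kind(attrs["gen_ai.operation.name"])
print(attrs)  # {'gen_ai.operation.name': 'chat', 'openinference.span.kind': 'LLM'}
```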

OpenLLMetry — follows OTel + Traceloop extensions

Built atop gen_ai.*, adds traceloop.*
Names: chat, embeddings, execute_tool, traceloop.workflow, traceloop.task, traceloop.agent, traceloop.tool, db.* (vector DBs)

Vector DB calls follow OTel db.* conventions plus Traceloop additions (db.vector.query.top_k, etc.).

Langfuse — observation type

Trace → tree of Observations
Types: SPAN, GENERATION, EVENT, AGENT*, TOOL*, CHAIN*, RETRIEVER*, EMBEDDING*, EVALUATOR*, GUARDRAIL*

* Extended types added Aug 2025 — semantically richer rendering, but the core data model is still the SPAN/GENERATION/EVENT triple.

Worked example: a RAG agent answering a question

Same workload, four different shapes.

OTel
# root
span: "invoke_agent rag_qa"
  gen_ai.operation.name=invoke_agent
  span.kind=CLIENT
  ├─ "embeddings text-embedding-3"
  │     gen_ai.operation.name=embeddings
  ├─ "db.search vector_index"  # OTel db.*
  ├─ "chat gpt-4o"
  │     gen_ai.operation.name=chat
  └─ "execute_tool get_doc"
        gen_ai.operation.name=execute_tool
OpenInference
# root
span: "agent.run"
  openinference.span.kind=AGENT
  ├─ kind=EMBEDDING
  │     embedding.embeddings.0.embedding.text=…
  ├─ kind=RETRIEVER
  │     retrieval.documents.0.document.id=…
  ├─ kind=LLM
  │     llm.input_messages.0.message.role=user
  └─ kind=TOOL
        tool.name=get_doc
OpenLLMetry
# root
span: "rag_qa.workflow"
  traceloop.workflow.name=rag_qa
  ├─ "openai.embeddings"
  │     gen_ai.system=openai
  ├─ "pinecone.query"
  │     db.system=pinecone
  ├─ "openai.chat"
  │     gen_ai.operation.name=chat
  └─ "tool.get_doc"
        traceloop.entity.name=get_doc
Langfuse
# trace (top-level)
trace: "rag_qa"  user_id=…  session_id=…
  ├─ obs type=EMBEDDING
  │     model=text-embedding-3
  ├─ obs type=RETRIEVER
  │     input="…question…"
  ├─ obs type=GENERATION
  │     model=gpt-4o
  │     usage.input/output/total
  └─ obs type=TOOL
        name=get_doc

Attribute reference

The same concept, four different attribute names. Attributes flagged v1.41 are new in OTel SemConv v1.41.0.

Concept | OTel SemConv | OpenInference | OpenLLMetry | Langfuse

Identity & routing
Model name | gen_ai.request.model | llm.model_name | gen_ai.request.model | model (on GENERATION)
Provider / system | gen_ai.provider.name (openai, anthropic, …; formerly gen_ai.system) | llm.provider | gen_ai.system | model.provider (mapped)
Operation | gen_ai.operation.name | openinference.span.kind | gen_ai.operation.name | observation type
Response model | gen_ai.response.model | llm.model_name | gen_ai.response.model | completion_start_time meta
Response ID | gen_ai.response.id | llm.output_messages.{i}.id | gen_ai.response.id | id on observation

Messages & content
System instructions | gen_ai.system_instructions (v1.41) | llm.input_messages.{i}.message.role=system | gen_ai.system_instructions | part of input
Input messages | gen_ai.input.messages (v1.41) | llm.input_messages.{i}.message.role/content | gen_ai.input.messages | input (JSON)
Output messages | gen_ai.output.messages (v1.41) | llm.output_messages.{i}.message.role/content | gen_ai.output.messages | output (JSON)
Reasoning content | message part type=reasoning (v1.41) | — | via OTel | in output JSON
Finish reason | gen_ai.response.finish_reasons | llm.output_messages.{i}.message.finish_reason | gen_ai.response.finish_reasons | in metadata

Token usage
Input (prompt) tokens | gen_ai.usage.input_tokens | llm.token_count.prompt | gen_ai.usage.input_tokens | usage.input
Output (completion) tokens | gen_ai.usage.output_tokens | llm.token_count.completion | gen_ai.usage.output_tokens | usage.output
Total tokens | derived | llm.token_count.total | llm.usage.total_tokens | usage.total
Cache-read input tokens | gen_ai.usage.cache_read.input_tokens (v1.41) | llm.token_count.prompt_details.cache_read | via OTel | mapped from OTel
Cache-write input tokens | gen_ai.usage.cache_creation.input_tokens (v1.41) | llm.token_count.prompt_details.cache_write | via OTel | mapped from OTel
Reasoning tokens | gen_ai.usage.reasoning.output_tokens (v1.41) | llm.token_count.completion_details.reasoning | via OTel | mapped from OTel

Request parameters
Temperature | gen_ai.request.temperature | llm.invocation_parameters (JSON) | gen_ai.request.temperature | model_parameters.temperature
Top-p | gen_ai.request.top_p | llm.invocation_parameters (JSON) | gen_ai.request.top_p | model_parameters.top_p
Max tokens | gen_ai.request.max_tokens | llm.invocation_parameters (JSON) | gen_ai.request.max_tokens | model_parameters.max_tokens
Stop sequences | gen_ai.request.stop_sequences | llm.invocation_parameters (JSON) | gen_ai.request.stop_sequences | model_parameters.stop

Tools
Tool name | gen_ai.tool.name | tool.name | gen_ai.tool.name | name on TOOL obs
Tool description | gen_ai.tool.description | tool.description | gen_ai.tool.description | in metadata
Tool call ID | gen_ai.tool.call.id | tool_call.{i}.tool_call.id | gen_ai.tool.call.id | in metadata
Tool args | in message part tool_call | tool.parameters / tool_call.{i}.tool_call.function.arguments | in messages | input

Agents & workflows
Agent name | gen_ai.agent.name | agent.name (with kind=AGENT) | traceloop.agent.name | name on AGENT obs
Agent ID | gen_ai.agent.id | agent.id | traceloop.agent.id | id
Agent description | gen_ai.agent.description | — | — | in metadata
Workflow / task | gen_ai.operation.name=invoke_workflow (v1.41) | kind=CHAIN | traceloop.workflow.name / .task.name | SPAN obs

Retrieval & embeddings
Embedding text | — (not standardized) | embedding.embeddings.{i}.embedding.text | in messages | input
Embedding vector | — (size only) | embedding.embeddings.{i}.embedding.vector | — (typically dropped) | in output
Embedding dimensions | gen_ai.embeddings.dimension.count (v1.41) | embedding.model_name (implicit) | via OTel | mapped
Retrieved doc ID | — (use db.*) | retrieval.documents.{i}.document.id | via vector DB instr. | in output
Retrieved doc score | — | retrieval.documents.{i}.document.score | — | in output
Retrieved doc content | — | retrieval.documents.{i}.document.content | — | in output

Trace context (user, session, etc.)
User ID | user.id (general) | user.id | traceloop.association.properties.user_id | user_id (trace + obs)
Session ID | session.id | session.id | traceloop.association.properties.session_id | session_id (trace + obs)
Tags | — (custom) | tag.tags | custom | tags (trace-level, propagated)
Metadata | — (custom) | metadata (JSON) | custom | metadata (trace + obs)

Errors & evaluation
API error | exception event with error.type | exception.* (OTel std) | OTel exception | level=ERROR + status_message
Rate limiting | error.type=_OTHER (provider-specific values not standardized) | — | OTel exception | level=WARNING
Evaluation result | — (not in spec) | kind=EVALUATOR span | — | scores via API
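The following sketch makes the token rows concrete. It re-keys OTel usage attributes into the OpenInference and Langfuse names from the table.

The sketch makes two assumptions:

- Total tokens equal input plus output, since OTel leaves the total derived.
- Cache and reasoning counters are already included in the input and output totals.

`translate_usage` is a hypothetical helper. Real exporters may apply different rules.

```python
def translate_usage(otel_attrs: dict) -> dict:
    """Re-key OTel gen_ai.usage.* attributes for OpenInference and Langfuse."""
    inp = otel_attrs.get("gen_ai.usage.input_tokens", 0)
    out = otel_attrs.get("gen_ai.usage.output_tokens", 0)
    total = inp + out  # assumption: OTel has no total attribute, so derive it

    openinference = {
        "llm.token_count.prompt": inp,
        "llm.token_count.completion": out,
        "llm.token_count.total": total,
    }
    # Detail counters are copied only when present on the span.
    optional = {
        "gen_ai.usage.cache_read.input_tokens": "llm.token_count.prompt_details.cache_read",
        "gen_ai.usage.cache_creation.input_tokens": "llm.token_count.prompt_details.cache_write",
        "gen_ai.usage.reasoning.output_tokens": "llm.token_count.completion_details.reasoning",
    }
    for otel_key, oi_key in optional.items():
        if otel_key in otel_attrs:
            openinference[oi_key] = otel_attrs[otel_key]

    langfuse = {"usage": {"input": inp, "output": out, "total": total}}
    return {"openinference": openinference, "langfuse": langfuse}


span = {
    "gen_ai.usage.input_tokens": 412,
    "gen_ai.usage.output_tokens": 128,
    "gen_ai.usage.cache_read.input_tokens": 300,
}
print(translate_usage(span)["langfuse"])  # {'usage': {'input': 412, 'output': 128, 'total': 540}}
```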

Metrics

Metrics are where OTel pulls clearly ahead. OpenInference and Langfuse focus on traces and compute roll-ups in the backend; OTel and OpenLLMetry emit native histograms you can scrape into Prometheus / Elastic / any TSDB.

OpenTelemetry — first-class GenAI metrics

Metrics: gen_ai.client.operation.duration (histogram), gen_ai.client.token.usage (histogram), gen_ai.client.operation.time_to_first_chunk (v1.41), gen_ai.client.operation.time_per_output_chunk (v1.41), gen_ai.server.request.duration, gen_ai.server.time_to_first_token

Streaming metrics are the standout in v1.41: time_to_first_chunk tells you when the user sees something, time_per_output_chunk tells you how snappy the stream feels.
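One way to derive both numbers from chunk arrival times is sketched below. It uses two definitions:

- **Time to first chunk:** the gap between sending the request and receiving the first chunk.
- **Time per output chunk:** the average gap between the chunks after the first.

The second definition is our assumption, modelled on how per-token server latency is usually computed. Check the spec text for the exact semantics before relying on it.

```python
from typing import Optional


def streaming_metrics(request_start: float, chunk_times: list[float]) -> tuple[float, Optional[float]]:
    """Return (time_to_first_chunk, time_per_output_chunk) in seconds.

    time_per_output_chunk is None when only one chunk arrived.
    """
    if not chunk_times:
        raise ValueError("stream produced no chunks")
    time_to_first_chunk = chunk_times[0] - request_start
    if len(chunk_times) < 2:
        return time_to_first_chunk, None
    # Average spacing of the chunks that followed the first one.
    time_per_output_chunk = (chunk_times[-1] - chunk_times[0]) / (len(chunk_times) - 1)
    return time_to_first_chunk, time_per_output_chunk


ttfc, tpoc = streaming_metrics(0.0, [0.5, 0.7, 0.9, 1.1])
print(round(ttfc, 3), round(tpoc, 3))  # 0.5 0.2
```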

OpenLLMetry — same as OTel + extras

Metrics: gen_ai.client.operation.duration, gen_ai.client.token.usage, llm.usage.total_tokens (histogram, per-call), db.client.operations.duration (vector DB)

OpenInference — no native metrics

OpenInference is trace-only by design. The Phoenix / Arize backend computes aggregations (latency p99, token throughput, cost) from spans. If you need Prometheus-style metrics, add an OTel SDK alongside.
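To show what such backend roll-ups involve, here is a nearest-rank p99 over span durations. This illustrates one common percentile definition. It is not Phoenix's or Arize's implementation, which may interpolate or use approximate data structures instead.

```python
import math


def percentile(durations_ms: list[float], p: float) -> float:
    """Nearest-rank percentile of span durations, for 0 < p <= 100."""
    if not durations_ms:
        raise ValueError("no spans to aggregate")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(durations_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank, always >= 1 here
    return ordered[rank - 1]


latencies = list(range(1, 101))  # 100 spans lasting 1..100 ms
print(percentile(latencies, 99))  # 99
```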

Langfuse — backend-computed dashboards

Langfuse derives metrics from observations on ingestion: cost (using its model price catalog), token rates, latency percentiles, score distributions, and per-user/session aggregates surface in the UI. No raw metric stream is emitted — this is a product, not a metrics pipeline.
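The cost step reduces to multiplying usage by a per-model price. In this sketch the catalog, model name, and prices are placeholders we made up. They are not Langfuse's catalog or any provider's list prices. Returning None for an unknown model is our choice.

```python
from typing import Optional

# Placeholder catalog: USD per one million tokens. Illustrative values only.
PRICE_CATALOG = {"example-model": {"input": 1.00, "output": 4.00}}


def estimate_cost(model: str, usage: dict[str, int]) -> Optional[float]:
    """Price one observation's usage; None if the model is not in the catalog."""
    price = PRICE_CATALOG.get(model)
    if price is None:
        return None
    return (usage["input"] * price["input"] + usage["output"] * price["output"]) / 1_000_000


print(estimate_cost("example-model", {"input": 412, "output": 128}))
```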

Practical implication: if you're standardizing on a single observability backend (Elastic, Datadog, Grafana, New Relic), OTel + OpenLLMetry gives you the cleanest metric story today. OpenInference and Langfuse are richer for trace-level analysis but require their backends (or a custom roll-up) for metrics.

Project drill-downs

Data model, key attributes, and example wire format for each project.

OpenTelemetry SemConv

Design philosophy

OTel SemConv defines a vendor-neutral wire format for GenAI telemetry — spans, metrics, and events. Every other project on this page eventually converges on it. The spec is in Development status and moving fast: v1.41.0 (April 28, 2026) added agent/workflow operation separation, streaming metrics, reasoning tokens, and a unified message model.

Span model

One gen_ai.operation.name attribute drives the taxonomy. Spans are CLIENT (external API call) or INTERNAL (in-process). Span name = {operation_name} {model_or_target}.

Key v1.41.0 changes

  • invoke_agent properly distinguishes external (CLIENT) from in-process (INTERNAL) agent calls, closing the gap for in-process frameworks such as LangChain and CrewAI
  • invoke_workflow is now a first-class operation (was previously implicit)
  • Streaming metrics: gen_ai.client.operation.time_to_first_chunk, gen_ai.client.operation.time_per_output_chunk
  • Cache token attributes split (read vs creation)
  • Per-message events (gen_ai.user.message, etc.) deprecated → use gen_ai.input.messages attribute
  • Reasoning tokens tracked via gen_ai.usage.reasoning.output_tokens

Example span

// span name
"chat gpt-4o"

// attributes
gen_ai.operation.name:        "chat"
gen_ai.provider.name:         "openai"
gen_ai.request.model:         "gpt-4o"
gen_ai.response.model:        "gpt-4o-2024-11-20"
gen_ai.request.temperature:   0.7
gen_ai.usage.input_tokens:    412
gen_ai.usage.output_tokens:   128
gen_ai.usage.cache_read.input_tokens:    300
gen_ai.usage.reasoning.output_tokens:    42
gen_ai.response.finish_reasons: ["stop"]
gen_ai.input.messages:        [/* JSON */]
gen_ai.output.messages:       [/* JSON */]
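The following standard-library sketch builds part of that attribute set in code, without the OTel SDK.

- `chat_span_attributes` is hypothetical.
- The parts-based message layout is our reading of the spec's JSON schema for gen_ai.input.messages. Verify it against the version you target.
- Messages are encoded as a JSON string because span attributes have traditionally been limited to primitives and arrays of primitives.

```python
import json


def chat_span_attributes(model: str, messages: list[dict], temperature: float) -> tuple[str, dict]:
    """Return (span_name, attributes) for a chat call, messages JSON-encoded."""
    attributes = {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": model,
        "gen_ai.request.temperature": temperature,
        # One attribute holds the whole message list as a JSON string.
        "gen_ai.input.messages": json.dumps(messages),
    }
    return f"chat {model}", attributes


msgs = [
    {"role": "system", "parts": [{"type": "text", "content": "You are…"}]},
    {"role": "user", "parts": [{"type": "text", "content": "Tell me…"}]},
]
name, attrs = chat_span_attributes("gpt-4o", msgs, 0.7)
print(name)  # chat gpt-4o
```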

At a glance

Status: Development
Latest: v1.41.0 (Apr 28, 2026)
Backed by: CNCF / OpenTelemetry
Span kinds: CLIENT, INTERNAL
Operations: chat, embeddings, text_completion, generate_content, invoke_agent, invoke_workflow, execute_tool, create_agent, retrieval
Metrics: Yes (histograms)
Best for: Vendor-neutral instrumentation, multi-backend portability

OpenInference

Design philosophy

OpenInference defines a strict span-kind taxonomy via the required openinference.span.kind attribute. The schema is opinionated: every span declares whether it's an LLM, EMBEDDING, RETRIEVER, TOOL, AGENT, CHAIN, RERANKER, GUARDRAIL, EVALUATOR, or PROMPT. This makes downstream evaluation and dashboard rendering deterministic, but it requires the instrumentation to decide which kind each span is.

Attribute style

Dot-namespaced with indexed flattening: llm.input_messages.0.message.role, llm.input_messages.1.message.content, etc. Every list-valued attribute is exploded this way (as opposed to OTel v1.41's JSON-on-attribute approach).
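The flattening rule is mechanical enough to sketch. `flatten_messages` is a hypothetical helper, not an OpenInference API. It handles only role and content, not tool calls or multimodal parts.

```python
def flatten_messages(prefix: str, messages: list[dict]) -> dict[str, str]:
    """Explode a message list into OpenInference-style indexed attributes."""
    attributes = {}
    for index, message in enumerate(messages):
        for field in ("role", "content"):
            if field in message:
                attributes[f"{prefix}.{index}.message.{field}"] = message[field]
    return attributes


flat = flatten_messages(
    "llm.input_messages",
    [{"role": "system", "content": "You are…"}, {"role": "user", "content": "Tell me…"}],
)
for key, value in flat.items():
    print(key, "=", value)
```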

Where it shines

RAG and retrieval: retrieval.documents.{i}.document.{id,content,score,metadata} is the cleanest schema for retrieval traces among the four. Phoenix / Arize AX render these directly into evaluation pipelines.

Example span (LLM kind)

openinference.span.kind:           "LLM"
llm.model_name:                    "gpt-4o"
llm.provider:                      "openai"
llm.input_messages.0.message.role: "system"
llm.input_messages.0.message.content: "You are…"
llm.input_messages.1.message.role: "user"
llm.input_messages.1.message.content: "Tell me…"
llm.output_messages.0.message.role: "assistant"
llm.token_count.prompt:            412
llm.token_count.completion:        128
llm.token_count.prompt_details.cache_read: 300
llm.invocation_parameters:         '{"temperature":0.7}'

At a glance

Status: Active
Backed by: Arize
Span kinds: LLM, EMBEDDING, CHAIN, RETRIEVER, TOOL, AGENT, RERANKER, GUARDRAIL, EVALUATOR, PROMPT
Attribute style: Dot-flattened with indices
Metrics: None (trace-only)
Best for: RAG, eval pipelines, opinionated dashboards

OpenLLMetry

Design philosophy

OpenLLMetry is the pragmatic upstream contributor. Traceloop (acquired by ServiceNow on March 2, 2026) ships drop-in instrumentations for 30+ LLM providers, frameworks (LangChain, LlamaIndex, Haystack, CrewAI), and vector DBs (Pinecone, Qdrant, Weaviate, Chroma, Milvus, …), and many of its conventions are fed back into OTel SemConv. When OTel adopts a convention, OpenLLMetry deprecates its own attribute and keeps it as an alias for the OTel one.

Where it differs from raw OTel

  • traceloop.workflow.name / traceloop.task.name for orchestration nesting (a workflow is a sequence of tasks)
  • traceloop.entity.name / traceloop.entity.path for dotted-path call hierarchy
  • traceloop.association.properties.* for attaching user/session/custom IDs
  • Vector DB instrumentations expose db.vector.query.top_k, db.vector.namespace, etc.
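The following sketch shows how those context attributes fit together, reproducing the traceloop.* attributes in this section's example span. Two parts are assumptions drawn from that example rather than from Traceloop's SDK:

- The helper name.
- The rule that the entity path is the dot-joined stack of entity names, workflow first.

```python
def openllmetry_context_attrs(entity_stack: list[str], association: dict[str, str]) -> dict[str, str]:
    """Build traceloop.* context attributes for the innermost entity (hypothetical helper)."""
    attributes = {
        "traceloop.workflow.name": entity_stack[0],   # outermost entry is the workflow
        "traceloop.entity.name": entity_stack[-1],    # innermost entry is this span
        "traceloop.entity.path": ".".join(entity_stack),
    }
    for key, value in association.items():
        attributes[f"traceloop.association.properties.{key}"] = value
    return attributes


attrs = openllmetry_context_attrs(["customer_qa", "retrieve", "answer_step"], {"user_id": "u-42"})
print(attrs["traceloop.entity.path"])  # customer_qa.retrieve.answer_step
```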

Migration trajectory

Older versions emitted gen_ai.prompt / gen_ai.completion; the project is migrating to OTel's gen_ai.input.messages / gen_ai.output.messages. Issue #3515 tracks this transition.
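For stored data or custom exporters caught mid-transition, the conversion can be sketched as follows. The sketch assumes:

- The legacy attributes use the indexed form gen_ai.prompt.{i}.role / .content, and the same for gen_ai.completion.
- The new attributes use a parts-based JSON layout, which is our reading of the OTel schema.

`migrate_legacy_messages` is hypothetical. Other legacy keys, such as tool calls, pass through untouched.

```python
import json
import re

# Matches legacy keys such as gen_ai.prompt.0.role or gen_ai.completion.1.content.
LEGACY_KEY = re.compile(r"^gen_ai\.(prompt|completion)\.(\d+)\.(role|content)$")
TARGETS = {"prompt": "gen_ai.input.messages", "completion": "gen_ai.output.messages"}


def migrate_legacy_messages(attrs: dict) -> dict:
    """Fold indexed legacy prompt/completion attributes into JSON message attributes."""
    buckets: dict[str, dict[int, dict]] = {"prompt": {}, "completion": {}}
    migrated = {}
    for key, value in attrs.items():
        match = LEGACY_KEY.match(key)
        if match is None:
            migrated[key] = value  # non-message attributes pass through unchanged
            continue
        kind, index, field = match.group(1), int(match.group(2)), match.group(3)
        buckets[kind].setdefault(index, {})[field] = value
    for kind, target in TARGETS.items():
        if buckets[kind]:
            # Sort numerically so index 10 follows index 9.
            messages = [
                {"role": msg.get("role"), "parts": [{"type": "text", "content": msg.get("content", "")}]}
                for _, msg in sorted(buckets[kind].items())
            ]
            migrated[target] = json.dumps(messages)
    return migrated


legacy = {
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.prompt.0.role": "user",
    "gen_ai.prompt.0.content": "Hi",
    "gen_ai.completion.0.role": "assistant",
    "gen_ai.completion.0.content": "Hello",
}
print(migrate_legacy_messages(legacy)["gen_ai.input.messages"])
```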

Example span (orchestrated workflow)

// span name
"openai.chat"

gen_ai.system:                "openai"
gen_ai.operation.name:        "chat"
gen_ai.request.model:         "gpt-4o"
gen_ai.usage.input_tokens:    412
gen_ai.usage.output_tokens:   128
traceloop.workflow.name:      "customer_qa"
traceloop.entity.name:        "answer_step"
traceloop.entity.path:        "customer_qa.retrieve.answer_step"
traceloop.association.properties.user_id: "u-42"

At a glance

Status: Active, fast-moving
Backed by: Traceloop / ServiceNow (acq. Mar 2026)
Span kinds: OTel CLIENT/INTERNAL + named entities
Coverage: 30+ LLM providers, frameworks, vector DBs
Metrics: OTel-native
Best for: Quick instrumentation, multi-provider apps, OTel-aligned vendors

Langfuse

Design philosophy

Langfuse models traces as a tree of Observations, with three core types: SPAN (generic unit of work), GENERATION (LLM call with prompt/output/usage), and EVENT (point-in-time log). Extended types (Agent, Tool, Chain, Retriever, Evaluator, Embedding, Guardrail) layer semantic meaning onto SPAN. Trace-level attributes (user_id, session_id, tags, metadata) are propagated to every observation for single-table queries.

OTel mapping

Langfuse accepts OTLP and maps incoming OTel attributes onto its data model:

  • spans with gen_ai.operation.name=chat → GENERATION observation
  • gen_ai.usage.input_tokens → usage.input
  • gen_ai.request.model → model (and matched against Langfuse's price catalog)
  • session.id → trace-level session_id
  • user.id → trace-level user_id; the OpenInference and OpenLLMetry equivalents are also recognized
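The mapping rules listed above can be approximated in a few lines. This is a hypothetical sketch of the idea, not Langfuse's ingestion code. It treats only chat as GENERATION, ignores the extended types, and returns the trace-level fields separately.

```python
def to_langfuse_like(span_name: str, attrs: dict) -> tuple[dict, dict]:
    """Return (observation, trace_fields) approximating the OTel -> Langfuse mapping."""
    is_generation = attrs.get("gen_ai.operation.name") == "chat"
    observation = {"name": span_name, "type": "GENERATION" if is_generation else "SPAN"}
    if "gen_ai.request.model" in attrs:
        observation["model"] = attrs["gen_ai.request.model"]
    usage = {}
    for otel_key, lf_key in (("gen_ai.usage.input_tokens", "input"),
                             ("gen_ai.usage.output_tokens", "output")):
        if otel_key in attrs:
            usage[lf_key] = attrs[otel_key]
    if usage:
        observation["usage"] = usage
    # session.id / user.id land on the trace, not just the observation.
    trace_fields = {}
    for otel_key, lf_key in (("session.id", "session_id"), ("user.id", "user_id")):
        if otel_key in attrs:
            trace_fields[lf_key] = attrs[otel_key]
    return observation, trace_fields


obs, trace_fields = to_langfuse_like("chat gpt-4o", {
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.usage.input_tokens": 412,
    "gen_ai.usage.output_tokens": 128,
    "session.id": "s-7",
    "user.id": "u-42",
})
print(obs["type"], obs["usage"], trace_fields)
```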

Example observation (GENERATION)

{
  "id": "obs_abc",
  "trace_id": "trc_xyz",
  "type": "GENERATION",
  "name": "answer_step",
  "model": "gpt-4o",
  "model_parameters": { "temperature": 0.7 },
  "input": [{"role":"user", "content":"…"}],
  "output": { "role":"assistant", "content":"…" },
  "usage": { "input":412, "output":128, "total":540 },
  "user_id": "u-42",         // propagated
  "session_id": "s-7",
  "level": "DEFAULT"
}

Strengths beyond the schema

Cost catalog (auto-prices by model), prompt management, evaluators, datasets, sessions UI. The schema decisions reflect product priorities — they're easier to query at the UI level than to interop with externally.

At a glance

Status: Active, OSS + Cloud
Data model: Trace → Observations
Core types: SPAN, GENERATION, EVENT
Extended types: Agent, Tool, Chain, Retriever, Evaluator, Embedding, Guardrail
Ingest: SDKs + OTLP
Best for: End-to-end product UX (sessions, costs, evals, prompts)

Where the ecosystem is converging

Two years ago this would have been four entirely separate worlds. Today the gravitational pull is clearly toward OpenTelemetry — but each project still differentiates where the spec doesn't yet meet their needs.

What's already aligned

Token usage
OTel gen_ai.usage.input_tokens / output_tokens are now read by all four projects (Langfuse and OpenLLMetry natively, OpenInference via interop layers).
Model + provider
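gen_ai.request.model plus a provider attribute (gen_ai.provider.name in current OTel SemConv, gen_ai.system in older versions and in OpenLLMetry) is universally understood.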
Tool calls
Tool name and call ID have converged on gen_ai.tool.name / gen_ai.tool.call.id; OpenInference and Langfuse map their own attributes onto these.
OTLP transport
Langfuse, Phoenix (OpenInference), Traceloop, and every observability vendor accept OTLP. The wire is settled even when the schema isn't.

Where divergence remains

Span taxonomy
OTel uses operation names, OpenInference uses required span-kinds, Langfuse uses observation types. There's no plan to unify; this is the deepest philosophical split.
Message representation
OTel v1.41 picked JSON-on-attribute (gen_ai.input.messages); OpenInference uses indexed flattening (llm.input_messages.0…). Both are still in production code.
Retrieval / RAG attributes
OpenInference has the richest schema (retrieval.documents.{i}.*). OTel still says "use db.*" and leaves the retrieval-specific bits unstandardized.
Trace-level context
Langfuse propagates user/session/tags to every observation by design. OTel puts session.id on the span; downstream propagation is the consumer's job.
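If your backend does not propagate context for you, the consumer-side step is a short tree walk. This is a sketch over hypothetical nested dicts. Values already set on an observation take precedence over the trace-level ones, which is our choice rather than documented Langfuse behavior.

```python
def propagate_context(trace: dict) -> dict:
    """Copy trace-level user_id, session_id, and tags onto every observation."""
    context = {k: trace[k] for k in ("user_id", "session_id", "tags") if k in trace}
    stack = list(trace.get("observations", []))
    while stack:
        observation = stack.pop()
        for key, value in context.items():
            # Copy lists so observations do not share one mutable tags object.
            observation.setdefault(key, list(value) if isinstance(value, list) else value)
        stack.extend(observation.get("children", []))
    return trace


trace = {
    "user_id": "u-42",
    "session_id": "s-7",
    "observations": [{"name": "retrieve", "children": [{"name": "answer_step"}]}],
}
propagate_context(trace)
print(trace["observations"][0]["children"][0])
```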

Practical takeaways for instrumenting today

  1. Emit OTel SemConv v1.41.0. Every backend can read it, so you avoid lock-in at no extra cost.
  2. If you use Phoenix/Arize: add openinference.span.kind alongside the OTel attributes — they coexist on the same span.
  3. If you use Langfuse: set session.id and user.id as OTel attributes. Langfuse maps them automatically.
  4. If you use OpenLLMetry: you're already on OTel. Watch for deprecations as Traceloop aliases land upstream.
  5. Avoid hard-coding any single project's attribute names in your app code. Use the SDK abstraction; let it emit the right names per backend.

Bottom line: the GenAI observability space is early enough that convergence is still possible. Every project that aligns with OpenTelemetry makes the pie bigger for everyone — vendors compete on value, not on proprietary schemas. v1.41.0 is the most significant step in that direction so far.