Side-by-side comparison of OpenTelemetry Semantic Conventions, OpenInference, OpenLLMetry, and Langfuse — how each models LLM calls, agents, tools, and metrics, and where the ecosystem is converging.
Each project optimizes for a different audience. OTel optimizes for vendor neutrality, OpenInference for evaluation tooling, OpenLLMetry for fast pragmatic instrumentation, Langfuse for a single product UX. Convergence on OTel is now real but uneven.
OTel SemConv: spans (operations: chat, embeddings, invoke_agent, invoke_workflow, execute_tool), metrics, and events. Status: Development.
OpenInference: the openinference.span.kind attribute is required on every span and drives a strict taxonomy: LLM, EMBEDDING, CHAIN, RETRIEVER, TOOL, AGENT, RERANKER, GUARDRAIL, EVALUATOR, PROMPT. Built around tracing for evaluation pipelines (Phoenix, Arize AX).
OpenLLMetry: drop-in instrumentation that goes from pip install to traces.
Langfuse: every observation is a SPAN, GENERATION, or EVENT (plus extended types: Agent, Tool, Chain, Retriever, Evaluator, Embedding, Guardrail). Accepts OTel data and maps it onto this model for the Langfuse UI.
If you only remember one thing per project:
- OTel SemConv: span names follow {operation} {target}. Example: chat gpt-4o, invoke_agent customer_support, execute_tool get_weather.
- OpenInference: openinference.span.kind = LLM | AGENT | TOOL | RETRIEVER | …. The backend uses this to decide rendering and aggregation.
- OpenLLMetry: OTel attributes plus Traceloop extensions (traceloop.workflow.name, traceloop.entity.path) for nested orchestration.
- Langfuse: every observation carries a type field. The UI groups GENERATION observations into "model calls" and rolls up cost/tokens to the trace.

The biggest divergence between schemas is how spans are named and categorized. OTel uses an operation-name attribute, OpenInference uses a required span-kind, Langfuse uses observation types, and OpenLLMetry follows OTel.
OTel SemConv: gen_ai.operation.name
Span name format: {operation_name} {model_or_target}, e.g., chat gpt-4o, invoke_agent triage_agent.

OpenInference: openinference.span.kind (required)
CHAIN is unique to OpenInference: it represents glue code linking other spans (the "request → retrieve → LLM → respond" wrapper). PROMPT captures prompt template rendering.

OpenLLMetry: gen_ai.operation.name (follows OTel)
Vector DB calls follow OTel db.* conventions plus Traceloop additions (db.vector.query.top_k, etc.).

Langfuse: type
* Extended types added Aug 2025: semantically richer rendering, but the core data model is still the SPAN/GENERATION/EVENT triple.
Same workload, four different shapes.
OTel SemConv:

    # root span: "invoke_agent rag_qa"
    gen_ai.operation.name=invoke_agent
    span.kind=CLIENT
    ├─ "embeddings text-embedding-3"
    │    gen_ai.operation.name=embeddings
    ├─ "db.search vector_index"    # OTel db.*
    ├─ "chat gpt-4o"
    │    gen_ai.operation.name=chat
    └─ "execute_tool get_doc"
         gen_ai.operation.name=execute_tool

OpenInference:

    # root span: "agent.run"
    openinference.span.kind=AGENT
    ├─ kind=EMBEDDING
    │    embedding.embeddings.0.embedding.text=…
    ├─ kind=RETRIEVER
    │    retrieval.documents.0.document.id=…
    ├─ kind=LLM
    │    llm.input_messages.0.message.role=user
    └─ kind=TOOL
         tool.name=get_doc

OpenLLMetry:

    # root span: "rag_qa.workflow"
    traceloop.workflow.name=rag_qa
    ├─ "openai.embeddings"
    │    gen_ai.system=openai
    ├─ "pinecone.query"
    │    db.system=pinecone
    ├─ "openai.chat"
    │    gen_ai.operation.name=chat
    └─ "tool.get_doc"
         traceloop.entity.name=get_doc

Langfuse:

    # trace (top-level)
    trace: "rag_qa"
    user_id=…  session_id=…
    ├─ obs type=EMBEDDING
    │    model=text-embedding-3
    ├─ obs type=RETRIEVER
    │    input="…question…"
    ├─ obs type=GENERATION
    │    model=gpt-4o
    │    usage.input/output/total
    └─ obs type=TOOL
         name=get_doc
The same concept, four different attribute names. Attributes flagged v1.41 are new in OTel SemConv v1.41.0.
| Concept | OTel SemConv | OpenInference | OpenLLMetry | Langfuse |
|---|---|---|---|---|
| Identity & routing | ||||
| Model name | gen_ai.request.model | llm.model_name | gen_ai.request.model | model (on GENERATION) |
| Provider / system | gen_ai.provider.name (gen_ai.system in older versions; openai, anthropic, …) | llm.provider | gen_ai.system | model.provider (mapped) |
| Operation | gen_ai.operation.name | openinference.span.kind | gen_ai.operation.name | observation type |
| Response model | gen_ai.response.model | llm.model_name | gen_ai.response.model | completion_start_time meta |
| Response ID | gen_ai.response.id | llm.output_messages.{i}.id | gen_ai.response.id | id on observation |
| Messages & content | ||||
| System instructions | gen_ai.system_instructions v1.41 | llm.input_messages.{i}.message.role=system | gen_ai.system_instructions | part of input |
| Input messages | gen_ai.input.messages v1.41 | llm.input_messages.{i}.message.role/content | gen_ai.input.messages | input (JSON) |
| Output messages | gen_ai.output.messages v1.41 | llm.output_messages.{i}.message.role/content | gen_ai.output.messages | output (JSON) |
| Reasoning content | message part type=reasoning v1.41 | — | via OTel | in output JSON |
| Finish reason | gen_ai.response.finish_reasons | llm.output_messages.{i}.message.finish_reason | gen_ai.response.finish_reasons | in metadata |
| Token usage | ||||
| Input (prompt) tokens | gen_ai.usage.input_tokens | llm.token_count.prompt | gen_ai.usage.input_tokens | usage.input |
| Output (completion) tokens | gen_ai.usage.output_tokens | llm.token_count.completion | gen_ai.usage.output_tokens | usage.output |
| Total tokens | derived | llm.token_count.total | llm.usage.total_tokens | usage.total |
| Cache-read input tokens | gen_ai.usage.cache_read.input_tokens v1.41 | llm.token_count.prompt_details.cache_read | via OTel | mapped from OTel |
| Cache-write input tokens | gen_ai.usage.cache_creation.input_tokens v1.41 | llm.token_count.prompt_details.cache_write | via OTel | mapped from OTel |
| Reasoning tokens | gen_ai.usage.reasoning.output_tokens v1.41 | llm.token_count.completion_details.reasoning | via OTel | mapped from OTel |
| Request parameters | ||||
| Temperature | gen_ai.request.temperature | llm.invocation_parameters (JSON) | gen_ai.request.temperature | model_parameters.temperature |
| Top-p | gen_ai.request.top_p | llm.invocation_parameters (JSON) | gen_ai.request.top_p | model_parameters.top_p |
| Max tokens | gen_ai.request.max_tokens | llm.invocation_parameters (JSON) | gen_ai.request.max_tokens | model_parameters.max_tokens |
| Stop sequences | gen_ai.request.stop_sequences | llm.invocation_parameters (JSON) | gen_ai.request.stop_sequences | model_parameters.stop |
| Tools | ||||
| Tool name | gen_ai.tool.name | tool.name | gen_ai.tool.name | name on TOOL obs |
| Tool description | gen_ai.tool.description | tool.description | gen_ai.tool.description | in metadata |
| Tool call ID | gen_ai.tool.call.id | tool_call.{i}.tool_call.id | gen_ai.tool.call.id | in metadata |
| Tool args | in message part tool_call | tool.parameters / tool_call.{i}.tool_call.function.arguments | in messages | input |
| Agents & workflows | ||||
| Agent name | gen_ai.agent.name | agent.name (with kind=AGENT) | traceloop.agent.name | name on AGENT obs |
| Agent ID | gen_ai.agent.id | agent.id | traceloop.agent.id | id |
| Agent description | gen_ai.agent.description | — | — | in metadata |
| Workflow / task | gen_ai.operation.name=invoke_workflow v1.41 | kind=CHAIN | traceloop.workflow.name / .task.name | SPAN obs |
| Retrieval & embeddings | ||||
| Embedding text | — (not standardized) | embedding.embeddings.{i}.embedding.text | in messages | input |
| Embedding vector | — (size only) | embedding.embeddings.{i}.embedding.vector | — typically dropped | in output |
| Embedding dimensions | gen_ai.embeddings.dimension.count v1.41 | embedding.model_name implicit | via OTel | mapped |
| Retrieved doc ID | — (use db.*) | retrieval.documents.{i}.document.id | via vector DB instr. | in output |
| Retrieved doc score | — | retrieval.documents.{i}.document.score | — | in output |
| Retrieved doc content | — | retrieval.documents.{i}.document.content | — | in output |
| Trace context (user, session, etc.) | ||||
| User ID | user.id (general) | user.id | traceloop.association.properties.user_id | user_id (trace + obs) |
| Session ID | session.id | session.id | traceloop.association.properties.session_id | session_id (trace + obs) |
| Tags | — (custom) | tag.tags | custom | tags (trace-level, propagated) |
| Metadata | — (custom) | metadata (JSON) | custom | metadata (trace + obs) |
| Errors & evaluation | ||||
| API error | exception event with error.type | exception.* (OTel std) | OTel exception | level=ERROR + status_message |
| Rate limiting | error.type=_OTHER (provider-specific values not standardized) | — | OTel exception | level=WARNING |
| Evaluation result | — (not in spec) | kind=EVALUATOR span | — | scores via API |
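Because most rows in the table are one-to-one renames, a translation layer between schemas can start as a lookup table. The following is a minimal sketch, not an official converter: the `OTEL_TO_OPENINFERENCE` dictionary and the `to_openinference` function are hypothetical names, and the mapping covers only a handful of rows copied from the table above. It also shows the "derived" total-token row: OTel carries no total attribute, so the sketch sums input and output tokens.

```python
# Hypothetical lookup table; the attribute names are copied from the table above.
OTEL_TO_OPENINFERENCE = {
    "gen_ai.request.model": "llm.model_name",
    "gen_ai.usage.input_tokens": "llm.token_count.prompt",
    "gen_ai.usage.output_tokens": "llm.token_count.completion",
    "gen_ai.tool.name": "tool.name",
}


def to_openinference(otel_attrs: dict) -> dict:
    """Rename known OTel attributes; unknown keys are dropped in this sketch."""
    out = {
        OTEL_TO_OPENINFERENCE[key]: value
        for key, value in otel_attrs.items()
        if key in OTEL_TO_OPENINFERENCE
    }
    # OTel leaves total tokens to be derived, so compute it when both parts exist.
    prompt = out.get("llm.token_count.prompt")
    completion = out.get("llm.token_count.completion")
    if prompt is not None and completion is not None:
        out["llm.token_count.total"] = prompt + completion
    return out


example = to_openinference({
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.usage.input_tokens": 412,
    "gen_ai.usage.output_tokens": 128,
})
print(example)
```

A real converter would also need to handle the rows that are not simple renames, such as messages and request parameters, which change encoding as well as name.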
Metrics are where OTel pulls clearly ahead. OpenInference and Langfuse focus on traces and compute roll-ups in the backend; OTel and OpenLLMetry emit native histograms you can scrape into Prometheus / Elastic / any TSDB.
Streaming metrics are the standout in v1.41: time_to_first_chunk tells you when the user sees something, and time_per_output_chunk tells you how snappy the stream feels.
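To make the two streaming measurements concrete, here is a small sketch that derives them from raw timestamps. It assumes the per-chunk value is the gap between successive chunks, which is an interpretation rather than a quote from the spec; the function name `streaming_latencies` is hypothetical, and a real instrumentation would record these values into OTel histograms instead of returning them.

```python
from typing import List, Tuple


def streaming_latencies(
    request_start: float, chunk_times: List[float]
) -> Tuple[float, List[float]]:
    """Return (time to first chunk, gaps between successive chunks) in seconds."""
    if not chunk_times:
        raise ValueError("need at least one chunk timestamp")
    # Delay between sending the request and receiving the first chunk.
    time_to_first_chunk = chunk_times[0] - request_start
    # Gap between each chunk and the one before it (assumed per-chunk measure).
    per_chunk = [
        later - earlier for earlier, later in zip(chunk_times, chunk_times[1:])
    ]
    return time_to_first_chunk, per_chunk


# Timestamps in seconds: request sent at 10.0, chunks at 10.5, 10.75, 11.0.
first, gaps = streaming_latencies(10.0, [10.5, 10.75, 11.0])
print(first, gaps)
```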
OpenInference is trace-only by design. The Phoenix / Arize backend computes aggregations (latency p99, token throughput, cost) from spans. If you need Prometheus-style metrics, add an OTel SDK alongside.
Langfuse derives metrics from observations on ingestion: cost (using its model price catalog), token rates, latency percentiles, score distributions, and per-user/session aggregates surface in the UI. No raw metric stream is emitted — this is a product, not a metrics pipeline.
The sections below cover each project's data model, key attributes, and example wire format.
OTel SemConv defines a vendor-neutral wire format for GenAI telemetry — spans, metrics, and events. Every other project on this page eventually converges on it. The spec is in Development status and moving fast: v1.41.0 (April 28, 2026) added agent/workflow operation separation, streaming metrics, reasoning tokens, and a unified message model.
One gen_ai.operation.name attribute drives the taxonomy. Spans are CLIENT (external API call) or INTERNAL (in-process). Span name = {operation_name} {model_or_target}.
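The naming rule and the CLIENT/INTERNAL split can be sketched in a few lines of plain Python. This is an illustration only: `genai_span_name` and `genai_span_kind` are hypothetical helpers, not part of any OTel SDK, and falling back to the bare operation name when no target is known is an assumption of the sketch.

```python
from typing import Optional


def genai_span_name(operation_name: str, target: Optional[str] = None) -> str:
    """Build a span name of the form "{operation_name} {model_or_target}"."""
    # Assumption: with no model or target, use the operation name alone.
    return f"{operation_name} {target}" if target else operation_name


def genai_span_kind(is_external_call: bool) -> str:
    """CLIENT for external API calls, INTERNAL for in-process work."""
    return "CLIENT" if is_external_call else "INTERNAL"


print(genai_span_name("chat", "gpt-4o"))
print(genai_span_name("invoke_agent", "triage_agent"))
print(genai_span_kind(is_external_call=False))
```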
Changes in v1.41.0:

- invoke_agent properly distinguishes external (CLIENT) vs in-process (INTERNAL) agent calls; this closes the LangChain/CrewAI gap
- invoke_workflow is now a first-class operation (was previously implicit)
- Streaming metrics: gen_ai.client.operation.time_to_first_chunk, gen_ai.client.operation.time_per_output_chunk
- Per-message events (gen_ai.user.message, etc.) deprecated → use the gen_ai.input.messages attribute
- Reasoning tokens: gen_ai.usage.reasoning.output_tokens

Example span:

    // span name
    "chat gpt-4o"
    // attributes
    gen_ai.operation.name: "chat"
    gen_ai.system: "openai"
    gen_ai.request.model: "gpt-4o"
    gen_ai.response.model: "gpt-4o-2024-11-20"
    gen_ai.request.temperature: 0.7
    gen_ai.usage.input_tokens: 412
    gen_ai.usage.output_tokens: 128
    gen_ai.usage.cache_read.input_tokens: 300
    gen_ai.usage.reasoning.output_tokens: 42
    gen_ai.response.finish_reasons: ["stop"]
    gen_ai.input.messages: [/* JSON */]
    gen_ai.output.messages: [/* JSON */]
OpenInference defines a strict span-kind taxonomy via the required openinference.span.kind attribute. The schema is opinionated: every span declares whether it's an LLM, EMBEDDING, RETRIEVER, TOOL, AGENT, CHAIN, RERANKER, GUARDRAIL, or EVALUATOR. This makes downstream evaluation and dashboard rendering deterministic, but it requires instrumentation to care about which kind a span is.
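The "required attribute, closed set of values" rule is easy to enforce at instrumentation time. Below is a minimal sketch of such a check; `SPAN_KINDS` and `require_span_kind` are hypothetical names, and the set of kinds is simply the union of the lists given on this page (including PROMPT), not a copy of the upstream enum.

```python
# Span kinds as listed on this page; check the upstream spec for the full enum.
SPAN_KINDS = frozenset({
    "LLM", "EMBEDDING", "RETRIEVER", "TOOL", "AGENT",
    "CHAIN", "RERANKER", "GUARDRAIL", "EVALUATOR", "PROMPT",
})


def require_span_kind(attrs: dict) -> str:
    """Return the span kind, or raise if it is missing or not a known kind."""
    kind = attrs.get("openinference.span.kind")
    if kind is None:
        raise ValueError("openinference.span.kind is required on every span")
    if kind not in SPAN_KINDS:
        raise ValueError(f"unknown span kind: {kind!r}")
    return kind


print(require_span_kind({"openinference.span.kind": "LLM"}))
```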
Dot-namespaced with indexed flattening: llm.input_messages.0.message.role, llm.input_messages.1.message.content, etc. Every list-valued attribute is exploded this way (as opposed to OTel v1.41's JSON-on-attribute approach).
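The difference between the two encodings is easiest to see side by side. The sketch below flattens a message list into indexed dotted keys and, for contrast, packs the same list into a single JSON-valued attribute. Both helpers (`flatten_messages`, `json_messages`) are hypothetical, and the JSON payload here is just the input list serialized as-is; the actual OTel message structure may carry additional fields.

```python
import json


def flatten_messages(messages, prefix="llm.input_messages"):
    """Explode a list of message dicts into indexed, dot-namespaced keys."""
    attrs = {}
    for index, message in enumerate(messages):
        for field, value in message.items():
            attrs[f"{prefix}.{index}.message.{field}"] = value
    return attrs


def json_messages(messages, key="gen_ai.input.messages"):
    """Store the whole message list as one JSON string on a single attribute."""
    return {key: json.dumps(messages)}


messages = [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Tell me a joke"},
]
for name, value in flatten_messages(messages).items():
    print(name, "=", value)
print(json_messages(messages))
```

Flattened keys are simple to filter on in a span store but multiply the attribute count with every message; the single JSON attribute keeps the count fixed but has to be parsed before it can be queried.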
RAG and retrieval: retrieval.documents.{i}.document.{id,content,score,metadata} is the cleanest schema for retrieval traces among the four. Phoenix / Arize AX render these directly into evaluation pipelines.
    openinference.span.kind: "LLM"
    llm.model_name: "gpt-4o"
    llm.provider: "openai"
    llm.input_messages.0.message.role: "system"
    llm.input_messages.0.message.content: "You are…"
    llm.input_messages.1.message.role: "user"
    llm.input_messages.1.message.content: "Tell me…"
    llm.output_messages.0.message.role: "assistant"
    llm.token_count.prompt: 412
    llm.token_count.completion: 128
    llm.token_count.prompt_details.cache_read: 300
    llm.invocation_parameters: '{"temperature":0.7}'
OpenLLMetry is the pragmatic upstream contributor. Traceloop (acquired by ServiceNow on March 2, 2026) ships drop-in instrumentations for 30+ LLM providers, frameworks (LangChain, LlamaIndex, Haystack, CrewAI), and vector DBs (Pinecone, Qdrant, Weaviate, Chroma, Milvus, …) — many of which are fed back into OTel SemConv. When OTel adopts a convention, OpenLLMetry deprecates its own and aliases.
- traceloop.workflow.name / traceloop.task.name for orchestration nesting (a workflow is a sequence of tasks)
- traceloop.entity.name / traceloop.entity.path for dotted-path call hierarchy
- traceloop.association.properties.* for attaching user/session/custom IDs
- Vector DB attributes: db.vector.query.top_k, db.vector.namespace, etc.

Older versions emitted gen_ai.prompt / gen_ai.completion; the project is migrating to OTel's gen_ai.input.messages / gen_ai.output.messages. Issue #3515 tracks this transition.
    // span name
    "openai.chat"
    gen_ai.system: "openai"
    gen_ai.operation.name: "chat"
    gen_ai.request.model: "gpt-4o"
    gen_ai.usage.input_tokens: 412
    gen_ai.usage.output_tokens: 128
    traceloop.workflow.name: "customer_qa"
    traceloop.entity.name: "answer_step"
    traceloop.entity.path: "customer_qa.retrieve.answer_step"
    traceloop.association.properties.user_id: "u-42"
Langfuse models traces as a tree of Observations, with three core types: SPAN (generic unit of work), GENERATION (LLM call with prompt/output/usage), and EVENT (point-in-time log). Extended types (Agent, Tool, Chain, Retriever, Evaluator, Embedding, Guardrail) layer semantic meaning onto SPAN. Trace-level attributes (user_id, session_id, tags, metadata) are propagated to every observation for single-table queries.
Langfuse accepts OTLP and maps incoming OTel attributes onto its data model:
- gen_ai.operation.name=chat → GENERATION observation
- gen_ai.usage.input_tokens → usage.input
- gen_ai.request.model → model (and matched against Langfuse's price catalog)
- session.id → trace-level session_id
- user.id / OpenInference / OpenLLMetry equivalents are recognized
{
"id": "obs_abc",
"trace_id": "trc_xyz",
"type": "GENERATION",
"name": "answer_step",
"model": "gpt-4o",
"model_parameters": { "temperature": 0.7 },
"input": [{"role":"user", "content":"…"}],
"output": { "role":"assistant", "content":"…" },
"usage": { "input":412, "output":128, "total":540 },
"user_id": "u-42", // propagated
"session_id": "s-7",
"level": "DEFAULT"
}
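The OTel-to-Langfuse mapping rules listed earlier in this section can be sketched as a plain function that produces an observation shaped like the JSON above. This is not Langfuse's ingestion code: `otel_to_observation` is a hypothetical name, treating every non-chat operation as a SPAN is a simplifying assumption, and the output-token and total fields follow the usage.output / usage.total rows of the comparison table.

```python
def otel_to_observation(span_name: str, attrs: dict) -> dict:
    """Map a span's OTel attributes onto a Langfuse-style observation dict."""
    is_chat = attrs.get("gen_ai.operation.name") == "chat"
    # Assumption: anything that is not a chat call becomes a generic SPAN.
    obs = {"name": span_name, "type": "GENERATION" if is_chat else "SPAN"}

    if "gen_ai.request.model" in attrs:
        obs["model"] = attrs["gen_ai.request.model"]

    usage = {}
    if "gen_ai.usage.input_tokens" in attrs:
        usage["input"] = attrs["gen_ai.usage.input_tokens"]
    if "gen_ai.usage.output_tokens" in attrs:
        usage["output"] = attrs["gen_ai.usage.output_tokens"]
    if "input" in usage and "output" in usage:
        usage["total"] = usage["input"] + usage["output"]
    if usage:
        obs["usage"] = usage

    # Trace context carried as OTel attributes.
    if "session.id" in attrs:
        obs["session_id"] = attrs["session.id"]
    if "user.id" in attrs:
        obs["user_id"] = attrs["user.id"]
    return obs


print(otel_to_observation("answer_step", {
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.usage.input_tokens": 412,
    "gen_ai.usage.output_tokens": 128,
    "session.id": "s-7",
    "user.id": "u-42",
}))
```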
Product features built on this model: cost catalog (auto-prices by model), prompt management, evaluators, datasets, sessions UI. The schema decisions reflect product priorities: the data is easier to query at the UI level than to interoperate with externally.
Two years ago this would have been four entirely separate worlds. Today the gravitational pull is clearly toward OpenTelemetry, but each project still differentiates where the spec doesn't yet meet its needs.
- Token usage: gen_ai.usage.input_tokens / output_tokens are now read by all four projects (Langfuse and OpenLLMetry natively, OpenInference via interop layers).
- Model identity: gen_ai.request.model + gen_ai.system is universally understood.
- Tool calls: OTel defines gen_ai.tool.name / gen_ai.tool.call.id in v1.41.0; OpenInference and Langfuse map theirs.
- Message encoding: OTel uses JSON-on-attribute (gen_ai.input.messages); OpenInference uses indexed flattening (llm.input_messages.0…). Both are still in production code.
- Retrieval: OpenInference has a dedicated retrieval schema (retrieval.documents.{i}.*). OTel still says "use db.*" and leaves the retrieval-specific bits unstandardized.
- Session context: OTel records session.id on the span; downstream propagation is the consumer's job.
- For Phoenix / Arize: add openinference.span.kind alongside the OTel attributes; they coexist on the same span.
- For Langfuse: set session.id and user.id as OTel attributes. Langfuse maps them automatically.