OpenTelemetry Collector


OpenTelemetry Collector is a proxy service between your application and Uptrace (or another vendor or collector). It receives telemetry data, transforms it, and then exports it to backends that can store the data permanently.

The Collector can also work as an agent that pulls telemetry data (for example, metrics) from monitored programs and exports it to the configured backends.
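
As a rough sketch of both modes, the configuration below accepts OTLP data pushed by applications, scrapes metrics from a Prometheus endpoint, batches the data, and exports it over OTLP. The job name, the scrape target `localhost:9090`, and the exporter endpoint `backend.example.com:4317` are placeholders chosen for illustration, and the `prometheus` receiver is assumed to be available in your Collector distribution (it is part of the contrib distribution).

    receivers:
      # Push: applications send OTLP data to the Collector.
      otlp:
        protocols:
          grpc:
          http:
      # Pull: the Collector scrapes metrics from a monitored program.
      prometheus:
        config:
          scrape_configs:
            - job_name: myapp # hypothetical job name
              scrape_interval: 15s
              static_configs:
                - targets: ['localhost:9090'] # hypothetical target

    processors:
      # Group telemetry into batches before exporting.
      batch:

    exporters:
      otlp:
        endpoint: backend.example.com:4317 # hypothetical backend

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]
        metrics:
          receivers: [otlp, prometheus]
          processors: [batch]
          exporters: [otlp]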

The most prominent feature of OpenTelemetry Collector is its ability to operate on whole traces instead of individual spans. To achieve that, the Collector buffers the received spans in RAM and groups them by trace ID. That is the key requirement for implementing tail-based sampling.

OpenTelemetry Collector is distributed under the Apache 2.0 license, which allows you to change the source code and install custom extensions. That flexibility comes at the cost of running and maintaining your own OpenTelemetry Collector servers.


Uptrace supports the OpenTelemetry Protocol (OTLP) over gRPC and HTTP. For gRPC, use:

    headers: { 'uptrace-dsn': '<dsn>' }
    compression: gzip

For HTTP, use:

    headers: { 'uptrace-dsn': '<dsn>' }
    compression: gzip
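
For context, these settings belong under an exporter in the Collector configuration. The sketch below assumes the `otlp` exporter for gRPC and the `otlphttp` exporter for HTTP, with gzip compression on both. Both `endpoint` values are placeholders, not real Uptrace addresses, so substitute the ones shown in your project settings.

    exporters:
      # OTLP over gRPC
      otlp:
        endpoint: <grpc-endpoint> # placeholder
        headers: { 'uptrace-dsn': '<dsn>' }
        compression: gzip
      # OTLP over HTTP
      otlphttp:
        endpoint: <http-endpoint> # placeholder
        headers: { 'uptrace-dsn': '<dsn>' }
        compression: gzip

An exporter only takes effect once it is listed in a pipeline under `service.pipelines`.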

Tail-based sampling

With normal (head-based) sampling, the sampling decision is made upfront, usually at random. Head-based sampling can't specifically select failed or unusually long operations, because that information may only be available at the end of a trace.

With tail-based sampling, the sampling decision is delayed until all spans of the trace are collected and full information about the trace is available. Tail-based sampling requires running an OpenTelemetry Collector that buffers spans and groups them by trace ID.
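
In the Collector, this is typically configured with the `tail_sampling` processor from the contrib distribution. The sketch below keeps every trace that contains an error or runs longer than a threshold, plus a small random share of all traces. The policy names, the 10s wait, the buffer size, the 500 ms threshold, and the 10% rate are illustrative values, not recommendations.

    processors:
      tail_sampling:
        # How long to wait for a trace's spans before deciding.
        decision_wait: 10s
        # Maximum number of traces buffered in memory.
        num_traces: 50000
        policies:
          # Keep traces that contain a span with an error status.
          - name: keep-errors
            type: status_code
            status_code: { status_codes: [ERROR] }
          # Keep traces that take longer than 500 ms.
          - name: keep-slow
            type: latency
            latency: { threshold_ms: 500 }
          # Keep roughly 10% of traces regardless of outcome.
          - name: keep-some
            type: probabilistic
            probabilistic: { sampling_percentage: 10 }

A trace is kept if any of the policies matches. The `decision_wait` and `num_traces` settings bound how long and how many traces are held in memory, so spans that arrive after the wait has elapsed are not part of the decision.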

But tail-based sampling has a few shortcomings:

  • It requires running and maintaining separate servers with OpenTelemetry Collectors.
  • It requires traces to finish in a timely manner and fit in RAM. Otherwise, tail-based sampling fails to collect whole traces.
  • It skews the collected data (as does any other sampling).

In the end, it all comes down to the money you spend on Uptrace vs. OpenTelemetry Collectors. Because storage is much cheaper than computing resources (and with tail-based sampling you need to process data twice), it is usually better to spend the money on processing more spans than on tail-based sampling.