Sampling and rate-limiting

Introduction

Sampling reduces the cost and verbosity of tracing by reducing the number of collected (sampled) spans. Sampling can happen at different stages of span processing:

  • when a span is created (head-based sampling);
  • when a span is received by a backend (rate-limiting sampling);
  • when a complete trace is fully assembled (tail-based sampling).

Sampling provides a sampling probability, which enables accurate statistical counting of all spans using only the sampled portion. For example, if the sampling probability is 50% and the number of sampled spans is 10, then the adjusted (total) number of spans is 10 / 50% = 20.
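The arithmetic above can be sketched in Go as follows. The function name AdjustedCount is hypothetical and is used here only for illustration; it is not part of any OpenTelemetry or Uptrace API.

```go
package main

import "fmt"

// AdjustedCount estimates the total number of spans from the number of
// sampled spans and the sampling probability (0 < probability <= 1).
// It returns 0 for a non-positive probability, because no estimate is possible.
func AdjustedCount(sampled int, probability float64) float64 {
	if probability <= 0 {
		return 0
	}
	return float64(sampled) / probability
}

func main() {
	// 10 sampled spans at 50% probability represent about 20 spans in total.
	fmt.Println(AdjustedCount(10, 0.5)) // prints 20
}
```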

Name                                 | Complexity and cost               | Adjusted count
Head-based sampling (client-side)    | Simple and practically free.      | Yes.
Rate-limiting sampling (server-side) | Complex. Part of Uptrace price.   | Yes.
Tail-based sampling (server-side)    | Complex. Requires OTel Collector. | Yes.

Head-based sampling makes the sampling decision as early as possible and propagates it to other participants through the context. This saves a lot of resources because no telemetry data is collected for dropped spans. It is the simplest, most accurate, and most reliable sampling method, and you should prefer it over all other methods.

However, head-based sampling is not flexible enough to handle traffic spikes and may collect more data than desired. This is where rate-limiting sampling comes in handy: it ensures that backends do not exceed certain limits when receiving spans from clients. Uptrace applies rate-limiting sampling automatically when needed.

Tail-based sampling makes the sampling decision when a complete trace is assembled, which enables better decisions based on all the data in the trace. For example, you can sample failed or unusually long traces.

Head-based sampling

Head-based sampling decides whether to record and export a trace by making a sampling decision as soon as a span name is available. It ensures that the whole (potentially distributed) trace is either fully sampled or dropped. It uses the TraceIdRatioBased sampler to sample a portion (for example, 50%) of traces and propagates the sampling decision from one service to another.
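To illustrate how a ratio-based decision can be derived from the trace ID alone, here is a minimal sketch. The exact algorithm is SDK-specific, so treat this as an approximation of the general idea; the TraceID type and the ShouldSample function are hypothetical names introduced for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// TraceID is a 16-byte trace identifier (hypothetical type for this sketch).
type TraceID [16]byte

// ShouldSample reports whether a trace is sampled for the given ratio.
// The decision depends only on the trace ID, so every service that sees
// the same trace ID and uses the same ratio reaches the same decision.
func ShouldSample(id TraceID, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	// Map the ratio to a threshold in the range [0, 2^63).
	threshold := uint64(ratio * (1 << 63))
	// Interpret the low 8 bytes of the trace ID as a 63-bit number.
	x := binary.BigEndian.Uint64(id[8:16]) >> 1
	return x < threshold
}

func main() {
	var low TraceID // low 8 bytes are all zero
	var high TraceID
	for i := 8; i < 16; i++ {
		high[i] = 0xFF // low 8 bytes are all ones
	}
	fmt.Println(ShouldSample(low, 0.5), ShouldSample(high, 0.5)) // prints true false
}
```

Because trace IDs are normally random, about half of all traces fall below the threshold when the ratio is 0.5.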

OpenTelemetry has 2 span properties responsible for sampling:

  • IsRecording - when false, the span discards attributes, events, links, etc.
  • Sampled - when false, OpenTelemetry drops the span.

You should check the IsRecording property to avoid collecting expensive telemetry data.

if span.IsRecording() {
    // collect expensive data
}

A sampler is a function that accepts a root span about to be created. The function returns a sampling decision, which must be one of:

  • Drop - trace is dropped. IsRecording = false, Sampled = false.
  • RecordOnly - trace is recorded but not sampled. IsRecording = true, Sampled = false.
  • RecordAndSample - trace is recorded and sampled. IsRecording = true, Sampled = true.
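The mapping from sampling decisions to the two span properties can be written down as a small sketch. The Decision type and its Flags method are hypothetical and only mirror the list above; real SDKs define their own decision types.

```go
package main

import "fmt"

// Decision mirrors the three sampling decisions described above.
type Decision int

const (
	Drop Decision = iota
	RecordOnly
	RecordAndSample
)

// Flags returns the IsRecording and Sampled values implied by a decision.
func (d Decision) Flags() (isRecording, sampled bool) {
	switch d {
	case RecordOnly:
		return true, false
	case RecordAndSample:
		return true, true
	default: // Drop
		return false, false
	}
}

func main() {
	for _, d := range []Decision{Drop, RecordOnly, RecordAndSample} {
		rec, smp := d.Flags()
		fmt.Printf("decision=%d IsRecording=%t Sampled=%t\n", d, rec, smp)
	}
}
```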

By default, OpenTelemetry samples all traces, but you can configure it to sample a portion of traces. In that case, Uptrace uses the sampling probability to adjust the number of spans.
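For example, with the OpenTelemetry Go SDK a tracer provider can be configured to sample roughly half of all new traces. This is a minimal configuration sketch that assumes the go.opentelemetry.io/otel and go.opentelemetry.io/otel/sdk modules are available; the 0.5 ratio is an arbitrary value chosen for illustration, and exporter setup is omitted.

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	// Sample roughly 50% of new traces. ParentBased makes child spans
	// follow the sampling decision already made for their parent.
	sampler := sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.5))

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithSampler(sampler),
		// A span processor and exporter would normally be configured here too.
	)
	defer func() {
		if err := tp.Shutdown(context.Background()); err != nil {
			log.Printf("tracer provider shutdown: %v", err)
		}
	}()

	// Register the provider globally so instrumentation libraries use it.
	otel.SetTracerProvider(tp)
}
```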

Rate-limiting sampling

Uptrace automatically uses rate-limiting sampling to ensure that you stay within a budget. You can always see your usage and change the budget on the billing page.

To achieve better results and improve performance, you should use rate-limiting sampling together with head-based sampling, which makes sampling decisions much earlier.

Tail-based sampling

See Tail-based sampling on the OpenTelemetry Collector page.