# Sampling and rate limiting

# Introduction

Sampling reduces the cost of tracing when you can't afford to record and keep all traces. Sampling should keep or drop whole traces rather than individual spans, and it should work with traces that last a long time and cross multiple servers. For example, it should handle traces that take hours to complete and come from a distributed system.

Only head-based sampling fully satisfies all of these requirements. It also provides the sampling probability, which Uptrace uses to adjust the number of sampled traces. For example, if your sampling probability is 50% and the number of sampled traces is 10, then the adjusted (unsampled) number of traces is 10 / 50% = 20. However, head-based sampling has two drawbacks:

  • It requires configuring a fixed sampling probability on the client.
  • It does not limit the number of traces during traffic spikes, which can be a problem if your traffic spikes regularly.
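The count adjustment described above is plain division by the sampling probability: each sampled trace stands in for `1 / probability` traces. A minimal sketch, assuming a hypothetical helper named `adjusted_count` (it is not part of Uptrace or OpenTelemetry):

```python
def adjusted_count(sampled: int, probability: float) -> float:
    """Estimate how many traces existed before sampling (hypothetical helper).

    Each sampled trace represents 1 / probability traces, so the
    estimate is sampled / probability.
    """
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return sampled / probability


# 10 sampled traces at a 50% sampling probability.
print(adjusted_count(10, 0.5))  # prints 20.0
```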

As a rule of thumb, use head-based sampling as the main method to reduce the number of traces, and rate limiting to reduce the number of traces during spikes. Use tail-based sampling together with head-based sampling if you want more control over which traces are sampled.

| Name | Side | Complexity and cost | Accuracy |
| --- | --- | --- | --- |
| Head-based sampling | Client-side | Simple and practically free. | 100% |
| Uptrace rate limiting | Server-side | Complex. Included in the Uptrace price. | < 100% |
| Tail-based sampling | Server-side | Complex. Requires OTel Collector. | < 100% |

# Head-based sampling

Head-based sampling decides whether to record and export a trace by making a sampling decision as soon as the span name is known, before the root span is created. The decision is propagated from one service to another, which ensures that the whole (potentially distributed) trace is either sampled or dropped. It uses the TraceIdRatioBased sampler to sample a fraction (for example, 50%) of traces.
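The sketch below models the idea behind a trace-ID ratio sampler combined with parent-based propagation. It is a simplified illustration, not the OpenTelemetry SDK code: the 128-bit trace ID is passed as a Python `int`, and comparing the low 64 bits against a bound derived from the fraction resembles what some SDKs (for example, the Go SDK) do, but should be treated as an assumption here. Both function names are hypothetical. Because the decision depends only on the trace ID, every service that sees the same trace ID with the same ratio reaches the same result, and child spans simply reuse the parent's decision.

```python
from typing import Optional

LOW_64_MASK = (1 << 64) - 1


def ratio_based_sample(trace_id: int, fraction: float) -> bool:
    """Return True if a root span with this 128-bit trace ID is sampled.

    Takes the low 64 bits of the trace ID, shifts them right by one to get
    a 63-bit value, and compares it against fraction * 2**63. For uniformly
    random trace IDs, roughly `fraction` of traces are sampled.
    """
    fraction = min(max(fraction, 0.0), 1.0)
    upper_bound = int(fraction * (1 << 63))
    x = (trace_id & LOW_64_MASK) >> 1
    return x < upper_bound


def parent_based_sample(trace_id: int, fraction: float,
                        parent_sampled: Optional[bool]) -> bool:
    """Root spans consult the ratio; child spans reuse the parent's flag."""
    if parent_sampled is None:  # no parent: this is the root span
        return ratio_based_sample(trace_id, fraction)
    return parent_sampled  # decision propagated from the upstream service
```

A downstream service would call `parent_based_sample` with the sampled flag it received in the propagated context, so it never re-rolls the decision made at the root.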

OpenTelemetry has two span properties that control sampling:

  • IsRecording: when false, the span discards attributes, events, links, etc.
  • Sampled: when false, OpenTelemetry drops the span instead of exporting it.

Check the IsRecording property before collecting expensive trace data, so you don't spend resources on spans that will discard it anyway.
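A minimal sketch of that check. To keep it self-contained, `Span` here is a hypothetical stand-in; the OpenTelemetry Python API expresses the same idea with `Span.is_recording()` and `Span.set_attribute()`. The helpers `expensive_summary` and `annotate` are also hypothetical: the costly serialization only runs when the span will actually keep the attribute.

```python
import json
from typing import Any, Dict


class Span:
    """Hypothetical stand-in for an OpenTelemetry span."""

    def __init__(self, recording: bool) -> None:
        self._recording = recording
        self.attributes: Dict[str, Any] = {}

    def is_recording(self) -> bool:
        return self._recording

    def set_attribute(self, key: str, value: Any) -> None:
        # A non-recording span discards attributes.
        if self._recording:
            self.attributes[key] = value


def expensive_summary(payload: Dict[str, Any]) -> str:
    """Hypothetical costly computation we only want for recorded spans."""
    return json.dumps(payload, sort_keys=True)


def annotate(span: Span, payload: Dict[str, Any]) -> bool:
    """Attach an expensive attribute; return True if it was computed."""
    if not span.is_recording():
        return False  # span discards data anyway: skip the costly work
    span.set_attribute("request.summary", expensive_summary(payload))
    return True
```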

A sampler is a function that is called for a root span that is about to be created. It returns a sampling decision, which must be one of:

  • Drop - trace is dropped. IsRecording = false, Sampled = false.
  • RecordOnly - trace is recorded but not sampled. IsRecording = true, Sampled = false.
  • RecordAndSample - trace is recorded and sampled. IsRecording = true, Sampled = true.
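The three decisions map onto the two span properties exactly as listed above. A small sketch of that mapping; the `Decision` enum, `SpanFlags` tuple, and `span_flags` helper are hypothetical names, with the decision values loosely modeled on those in the OpenTelemetry specification.

```python
from enum import Enum
from typing import NamedTuple


class Decision(Enum):
    DROP = "drop"
    RECORD_ONLY = "record_only"
    RECORD_AND_SAMPLE = "record_and_sample"


class SpanFlags(NamedTuple):
    is_recording: bool  # span keeps attributes, events, links
    sampled: bool       # span is exported rather than dropped


_FLAGS = {
    Decision.DROP: SpanFlags(is_recording=False, sampled=False),
    Decision.RECORD_ONLY: SpanFlags(is_recording=True, sampled=False),
    Decision.RECORD_AND_SAMPLE: SpanFlags(is_recording=True, sampled=True),
}


def span_flags(decision: Decision) -> SpanFlags:
    """Translate a sampling decision into IsRecording / Sampled."""
    return _FLAGS[decision]
```

No decision produces Sampled = true with IsRecording = false: a span that is exported must also be recorded.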

By default, OpenTelemetry samples all traces, but you can configure it to sample only a fraction of them. In that case, Uptrace uses the sampling probability from the sampler to adjust the number of spans accordingly.

# Rate limiting

Use rate limiting together with head-based sampling to make sure you stay within your budget during traffic spikes. You can specify your budget on the billing page after billing is enabled.

For best results, set the rate limit high enough that it only limits traces during traffic spikes. During normal periods, you should either process all traces or rely on head-based sampling.
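To illustrate why a generous limit only affects spikes, here is a generic token-bucket sketch. It is not how Uptrace implements rate limiting (that happens server-side and is not described here); the `TraceRateLimiter` class, its `rate` and `burst` parameters, and the explicit `now` timestamps are all hypothetical. When the limit sits above normal traffic, every trace passes during quiet periods and only the excess during a spike is dropped.

```python
from typing import Optional


class TraceRateLimiter:
    """Hypothetical token bucket: admits up to `rate` traces per second,
    with bursts of up to `burst` traces."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.last: Optional[float] = None  # time of the previous call

    def allow(self, now: float) -> bool:
        """Return True if a trace arriving at `now` (seconds) is kept."""
        if self.last is not None:
            elapsed = max(0.0, now - self.last)
            # Refill tokens for the elapsed time, capped at the burst size.
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: drop this trace
```

For example, with `rate=10`, a steady 5 traces per second are all kept, while a sudden burst of 100 simultaneous traces is cut down to the bucket size.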

# Tail-based sampling

See Tail-based sampling on the OpenTelemetry Collector page.