Sampling and rate limiting
Introduction
Sampling reduces the cost of tracing when you can't afford to record and keep every trace. A sampler should keep or drop whole traces, not individual spans. It should also work with traces that are spread over time and across multiple servers. For example, it should handle a trace that takes hours to complete and comes from a distributed system.
Only head-based sampling fully satisfies all of these requirements. It also provides a sampling probability, which Uptrace uses to adjust the number of sampled traces. For example, if your sampling probability is 50% and the number of sampled traces is 10, then the adjusted number of traces (the estimate before sampling) is 10 / 50% = 20. But:
- It requires configuring a fixed sampling probability on the client.
- It does not rate limit spikes in traffic, which can be a problem if you have regular spikes.
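The count adjustment from the example above can be sketched in a few lines. This is only an illustration of the arithmetic; the function name `adjusted_count` is hypothetical and is not part of any Uptrace API.

```python
def adjusted_count(sampled_count: int, probability: float) -> float:
    """Estimate how many traces existed before sampling.

    Hypothetical helper: divides the number of sampled traces by the
    sampling probability, as in the 10 / 50% = 20 example.
    """
    if not 0 < probability <= 1:
        raise ValueError("probability must be in the range (0, 1]")
    return sampled_count / probability


print(adjusted_count(10, 0.5))  # 20.0
```

The estimate is only as good as the probability reported by the client, which is why a fixed, known probability matters here.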
As a rule of thumb, use head-based sampling as the main method to reduce the number of traces, and use rate limiting to reduce the number of traces during spikes. Use tail-based sampling together with head-based sampling if you want more control over which traces are sampled.
Name | Side | Complexity and cost | Accuracy |
---|---|---|---|
Head-based sampling | Client-side | Simple and practically free. | 100% |
Uptrace rate limiting | Server-side | Complex. Part of Uptrace price. | < 100% |
Tail-based sampling | Server-side | Complex. Requires OpenTelemetry Collector. | < 100% |
Head-based sampling
Head-based sampling decides whether to record and export a trace at the very start of the trace, as soon as the name of the root span is known. Deciding up front ensures that the whole (potentially distributed) trace is either sampled or dropped. It uses the TraceIdRatioBased sampler to sample a fraction (for example, 50%) of traces and propagates the sampling decision from one service to another.
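A ratio-based decision and its propagation might look roughly like the sketch below. It is a simplified illustration, not the OpenTelemetry SDK's implementation: `should_sample` and `is_parent_sampled` are hypothetical names, and the comparison against the low 64 bits of the trace ID is an assumption made for this example. The `traceparent` layout follows the W3C Trace Context format, where the last field carries the sampled flag.

```python
import secrets

LOW_64_BITS = (1 << 64) - 1


def should_sample(trace_id: int, ratio: float) -> bool:
    """Decide from the trace ID alone, so the result is deterministic."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in the range [0, 1]")
    bound = round(ratio * (1 << 64))
    return (trace_id & LOW_64_BITS) < bound


def is_parent_sampled(traceparent: str) -> bool:
    """Read the sampled flag from a W3C traceparent header value."""
    _version, _trace_id, _parent_id, flags = traceparent.split("-")
    return bool(int(flags, 16) & 0x01)


# The same trace ID always produces the same decision.
trace_id = secrets.randbits(128)
assert should_sample(trace_id, 0.5) == should_sample(trace_id, 0.5)

# A downstream service can follow the decision made upstream.
header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(is_parent_sampled(header))  # True
```

Because the decision depends only on the trace ID and the ratio, and downstream services can read the propagated flag, all spans of a trace end up on the same side of the decision.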
OpenTelemetry has 2 span properties responsible for sampling:
- IsRecording - when false, the span discards attributes, events, links, etc.
- Sampled - when false, OpenTelemetry drops the span.
You should check the IsRecording property to avoid collecting expensive trace data.
A sampler is a function that accepts a root span about to be created. The function returns a sampling decision, which must be one of:
- Drop - trace is dropped. IsRecording = false, Sampled = false.
- RecordOnly - trace is recorded but not sampled. IsRecording = true, Sampled = false.
- RecordAndSample - trace is recorded and sampled. IsRecording = true, Sampled = true.
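The three decisions and the flags they imply can be written down as a small lookup. This is a sketch that mirrors the list above; the `Decision` enum here is defined for illustration only and is not imported from any SDK.

```python
from enum import Enum


class Decision(Enum):
    """Illustrative sampling decisions as (is_recording, sampled) pairs."""

    DROP = (False, False)
    RECORD_ONLY = (True, False)
    RECORD_AND_SAMPLE = (True, True)

    def __init__(self, is_recording: bool, sampled: bool) -> None:
        self.is_recording = is_recording
        self.sampled = sampled


for decision in Decision:
    print(f"{decision.name}: IsRecording={decision.is_recording}, "
          f"Sampled={decision.sampled}")
```

Note that there is no decision with IsRecording = false and Sampled = true: a span cannot be exported without being recorded first.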
By default, the Uptrace client samples all traces, but you can configure it to sample only a fraction of traces. In that case, Uptrace uses the sampling probability reported by the sampler to adjust the number of spans.
Rate limiting
Rate limiting should be used together with head-based sampling to ensure that you stay within a budget during spikes. You can set a budget on your billing page after billing is enabled.
Set the rate limit high enough that it only limits traces during spikes in traffic. During normal periods, rely on head-based sampling, which has a number of advantages described above.
Tail-based sampling
See Tail-based sampling on the OpenTelemetry Collector page.