Data Sampling
When managing high-volume telemetry environments, you may not need or want to send 100% of your traces and logs to KloudMate. Data sampling allows you to selectively retain a representative subset of your telemetry, helping you manage storage costs and network bandwidth without losing visibility into your system’s overall health and performance.
The OpenTelemetry Collector provides several processors for sampling data. The two most common approaches are Head Sampling (making a sampling decision at the beginning of a trace) and Tail Sampling (making a decision after all spans for a trace have been collected).
Probabilistic Sampling (Head Sampling)
The easiest way to reduce volume is to apply a uniform probabilistic sampling rate. The probabilistic_sampler processor keeps a specified percentage of your data and drops the rest. The decision is derived from the trace ID, so all spans that belong to the same trace are kept or dropped together.
This processor can be used for both traces and logs.
Example: Keep 10% of Traces and Logs
Add probabilistic_sampler to the processors section of your Collector configuration, set sampling_percentage to the percentage of data you want to keep, and list the processor in each traces and logs pipeline you want to sample.
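A minimal sketch of such a configuration, assuming an `otlp` receiver and an exporter named `otlphttp` (a placeholder for whichever exporter sends data to KloudMate in your setup) are defined elsewhere in the same file. Those component names are assumptions, not values taken from this page. By default, the processor samples log records by their trace ID.

```yaml
processors:
  # Keep about 10% of incoming data and drop the rest.
  probabilistic_sampler:
    sampling_percentage: 10
  # Batch the data that survives sampling before export.
  batch: {}

service:
  pipelines:
    traces:
      receivers: [otlp]        # assumed to be defined elsewhere
      processors: [probabilistic_sampler, batch]
      exporters: [otlphttp]    # placeholder exporter name
    logs:
      receivers: [otlp]
      processors: [probabilistic_sampler, batch]
      exporters: [otlphttp]
```

Listing the sampler before `batch` means dropped data is never batched or exported, so the bandwidth savings apply to the whole pipeline.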
Tail Sampling
Tail sampling gives you more control: the Collector evaluates the entire trace before deciding whether to keep or drop it. This makes it the better choice when you need outcome-based policies such as “keep 100% of errors, but only keep 10% of successful traces.” The trade-off is that the Collector must buffer each trace's spans in memory until it makes a decision.
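The tail_sampling processor works only with traces. Because it evaluates whole traces, every span of a trace must reach the same Collector instance for the decision to be correct.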
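To estimate how much data a policy like “keep all errors, sample 10% of successes” retains, let $e$ be the fraction of traces that contain an error and $p$ the sampling rate applied to the remaining traces (both symbols, and the error rate below, are hypothetical values for illustration). Every error trace is kept, and each non-error trace is kept with probability $p$:

```latex
r = e + (1 - e)\,p, \qquad
e = 0.02,\; p = 0.10 \;\Rightarrow\; r = 0.02 + 0.98 \times 0.10 = 0.118
```

In this example roughly 11.8% of traces are exported, yet none of the error traces are lost.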
Example: Keep All Errors, Sample Successes
To keep every trace that contains an error while sampling the rest, combine multiple policies. The Collector evaluates each policy against the complete trace: if any policy decides to sample it, the trace is exported; otherwise, it is dropped.
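A sketch of this setup follows. It defines a `status_code` policy that keeps any trace containing a span with status `ERROR`, and a `probabilistic` policy that keeps about 10% of all traces. The policy names, the `decision_wait` and `num_traces` values, and the `otlp` receiver and `otlphttp` exporter names are illustrative assumptions rather than tuned recommendations; the receiver and exporter are assumed to be defined elsewhere.

```yaml
processors:
  tail_sampling:
    # Time to wait after a trace's first span before deciding.
    decision_wait: 10s
    # Maximum number of traces held in memory while awaiting a decision.
    num_traces: 50000
    policies:
      # Policy 1: keep every trace that has at least one ERROR span.
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Policy 2: keep about 10% of all traces.
      - name: sample-10-percent
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
  batch: {}

service:
  pipelines:
    traces:
      receivers: [otlp]        # assumed to be defined elsewhere
      processors: [tail_sampling, batch]
      exporters: [otlphttp]    # placeholder exporter name
```

Because error traces always match the first policy, the 10% rate effectively applies only to the remaining, successful traces. A longer `decision_wait` gives late-arriving spans more time to reach the Collector, at the cost of holding more traces in memory.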