How to add Probabilistic Sampling to an OpenTelemetry Collector to limit traffic


Overview

An application under heavy use can generate so much OpenTelemetry traffic that the OpenTelemetry Endpoint cannot process it without consuming a significant amount of resources. With Probabilistic Sampling, the OpenTelemetry Collector still receives all of the Application’s OpenTelemetry Traces and Logs, but forwards only a configured percentage of that traffic to the OpenTelemetry Endpoint for processing.

Some OpenTelemetry Endpoints, like Azure Monitor, have hard-coded limits on the amount of data they can process at once. Using Probabilistic Sampling to send a smaller percentage of the data may help these Endpoints stay within those limits and successfully process the OpenTelemetry data they receive.

For more information about Probabilistic Sampling, see:


Steps to implement Probabilistic Sampling

Take the following steps to implement Probabilistic Sampling:

  1. Open the OpenTelemetry Collector’s config.yaml.
  2. Add the following Processors and Service Pipeline entries:
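processors:
  # Trace sampler: forwards roughly 50% of traces.
  probabilistic_sampler:
    sampling_percentage: 50
    mode: "proportional"
  # Log sampler: forwards roughly 50% of log records, using each
  # record's trace ID as the source for the sampling decision.
  probabilistic_sampler/logs:
    sampling_percentage: 50
    attribute_source: "traceID"

service:
  pipelines:
    # Keep each pipeline's existing receivers and exporters entries;
    # only the processors lists are shown here.
    traces:
      processors: [ probabilistic_sampler ]
    logs:
      processors: [ probabilistic_sampler/logs ]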
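
The entries above show only the parts of config.yaml that change. As a rough sketch of how they might sit in a complete file, the example below adds an OTLP receiver and an OTLP exporter around them. The receiver and exporter choices, the listen addresses, and the exporter endpoint otel-endpoint.example.com:4317 are all assumptions made for illustration and should be replaced with whatever the Collector already uses. It also assumes a Collector distribution that includes the probabilistic_sampler processor.

```yaml
receivers:
  otlp:                      # assumed receiver; keep the existing one if different
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # assumed listen address
      http:
        endpoint: 0.0.0.0:4318   # assumed listen address

processors:
  probabilistic_sampler:
    sampling_percentage: 50
    mode: "proportional"
  probabilistic_sampler/logs:
    sampling_percentage: 50
    attribute_source: "traceID"

exporters:
  otlp:                      # assumed exporter; keep the existing one if different
    endpoint: "otel-endpoint.example.com:4317"   # hypothetical endpoint

service:
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ probabilistic_sampler ]
      exporters: [ otlp ]
    logs:
      receivers: [ otlp ]
      processors: [ probabilistic_sampler/logs ]
      exporters: [ otlp ]
```

Lowering sampling_percentage reduces the forwarded volume further; for example, a value of 25 would forward roughly a quarter of the traffic instead of half.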