Mapping Rules

Mapping rules configure the storage policy for metrics. A storage policy determines how long metrics are retained and at what resolution they are kept. For example, a storage policy of 1m:48h tells M3 to keep the metrics for 48 hours at a 1-minute resolution. Mapping rules are configured in the m3coordinator configuration file under the downsample > rules > mappingRules stanza. We will use the following as an example.

downsample:
  rules:
    mappingRules:
      - name: "mysql metrics"
        filter: "app:mysql*"
        aggregations: ["Last"]
        storagePolicies:
          - resolution: 1m
            retention: 48h
      - name: "nginx metrics"
        filter: "app:nginx*"
        aggregations: ["Last"]
        storagePolicies:
          - resolution: 30s
            retention: 24h
          - resolution: 1m
            retention: 48h

Here, we have two mapping rules configured: one for mysql metrics and one for nginx metrics. The filter determines which metrics each rule applies to. The mysql metrics rule applies to any metric whose app tag value matches mysql* (* being a wildcard), that is, any value beginning with mysql. Similarly, the nginx metrics rule applies to all metrics whose app tag value matches nginx*.
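To make the filter behaviour concrete, here is a minimal Python sketch of how a filter such as app:mysql* might be evaluated against a metric's tags. It is an illustration only, not M3's implementation: the helper names parse_filter and rule_matches are hypothetical, and Python's fnmatch stands in for M3's own matcher, so only the * wildcard shown above should be read as representative.

```python
from fnmatch import fnmatchcase


def parse_filter(filter_expr):
    """Split a filter such as 'app:mysql*' into {tag_name: glob_pattern}."""
    patterns = {}
    for clause in filter_expr.split():
        tag, _, pattern = clause.partition(":")
        patterns[tag] = pattern
    return patterns


def rule_matches(filter_expr, tags):
    """True if every tag named in the filter is present and matches its glob."""
    return all(
        tag in tags and fnmatchcase(tags[tag], pattern)
        for tag, pattern in parse_filter(filter_expr).items()
    )


mysql_rule = "app:mysql*"
print(rule_matches(mysql_rule, {"app": "mysql-primary", "host": "db1"}))  # True
print(rule_matches(mysql_rule, {"app": "nginx-edge"}))                    # False
```

A filter with several space-separated clauses, like the ones used in the rollup rules later on this page, is treated here as requiring every clause to match.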

The aggregations field determines what functions to apply to the datapoints within a resolution tile. For example, if an application emits a metric every 10 seconds and the resolution for that metric's storage policy is 1 minute, M3 will need to combine 6 datapoints into one. If the aggregation is Last, M3 will take the last value in that 1-minute bucket. Each entry in aggregations can be one of the following:

Last
Min
Max
Mean
Median
Count
Sum
SumSq
Stdev
P10
P20
P30
P40
P50
P60
P70
P80
P90
P95
P99
P999
P9999
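The bucket arithmetic described above can be sketched in a few lines of Python. This is a simplified model rather than M3's aggregator: the downsample function is hypothetical, only a handful of the aggregation types are covered, points are assumed to arrive in time order, and each bucket is keyed by its start time purely for readability.

```python
import statistics

# A subset of the aggregation types listed above, as plain Python functions.
AGGREGATIONS = {
    "Last": lambda values: values[-1],
    "Min": min,
    "Max": max,
    "Mean": statistics.mean,
    "Count": len,
    "Sum": sum,
}


def downsample(points, resolution_secs, aggregation):
    """Group (timestamp_secs, value) points into fixed windows and aggregate each."""
    buckets = {}
    for ts, value in points:
        bucket_start = ts - ts % resolution_secs
        buckets.setdefault(bucket_start, []).append(value)
    fn = AGGREGATIONS[aggregation]
    return {start: fn(values) for start, values in sorted(buckets.items())}


# One datapoint every 10s for one minute: six points fall into a single 1m window.
points = [(0, 4.0), (10, 7.0), (20, 5.0), (30, 9.0), (40, 6.0), (50, 8.0)]
print(downsample(points, 60, "Last"))  # {0: 8.0}
print(downsample(points, 60, "Max"))   # {0: 9.0}
print(downsample(points, 60, "Sum"))   # {0: 39.0}
```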

Lastly, the storagePolicies field determines which namespaces to store the metrics in. For example, the mysql metrics will be sent to the 1m:48h namespace, while the nginx metrics will be sent to both the 1m:48h and 30s:24h namespaces.

Note: the namespaces listed under the storagePolicies stanza must exist in M3DB.
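As a quick worked example of what a storage policy implies, the sketch below computes the nominal number of datapoints a single series holds under each policy used above. The helpers to_seconds and points_retained are hypothetical and handle only simple single-unit durations such as 30s, 1m, or 48h.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(duration):
    """Convert a simple duration such as '30s', '1m' or '48h' to seconds."""
    return int(duration[:-1]) * UNIT_SECONDS[duration[-1]]


def points_retained(policy):
    """Nominal datapoints per series for a 'resolution:retention' policy."""
    resolution, retention = policy.split(":")
    return to_seconds(retention) // to_seconds(resolution)


for policy in ("1m:48h", "30s:24h"):
    print(policy, points_retained(policy))
# 1m:48h 2880
# 30s:24h 2880
```

Both policies hold the same number of points per series: halving the resolution interval while halving the retention leaves the count unchanged.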

Rollup Rules

Rollup rules roll up metrics, aggregating them in different ways across arbitrary dimensions before they are stored.

Aggregating counters example

Here’s an example of creating a new monotonic counter called http_request_rollup_no_pod_bucket from a set of histogram metrics originally called http_request_bucket:

downsample:
  rules:
    rollupRules:
      - name: "http_request latency by route and git_sha without pod"
        filter: "__name__:http_request_bucket k8s_pod:* le:* git_sha:* route:*"
        transforms:
        - transform:
            type: "Increase"
        - rollup:
            metricName: "http_request_rollup_no_pod_bucket"
            groupBy: ["le", "git_sha", "route", "status_code", "region"]
            aggregations: ["Sum"]
        - transform:
            type: "Add"
        storagePolicies:
        - resolution: 30s
          retention: 720h
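The transform chain in the rule above (Increase, then a Sum rollup, then Add) can be pictured with the following Python sketch. The semantics here are an assumption based on the transform names: Increase is modelled as per-interval deltas of a cumulative counter, and Add as a running total that makes the summed deltas monotonic again. The function names are hypothetical, and counter resets are handled by simply clamping negative deltas to zero.

```python
from itertools import accumulate


def increase(samples):
    """Per-interval deltas of a cumulative counter (first sample yields 0)."""
    return [0.0] + [max(cur - prev, 0.0) for prev, cur in zip(samples, samples[1:])]


def sum_across(series_list):
    """Element-wise sum across series that share the same timestamps."""
    return [sum(values) for values in zip(*series_list)]


def add(deltas):
    """Running total, turning deltas back into a cumulative counter."""
    return list(accumulate(deltas))


# Cumulative bucket counts reported by two pods at the same four timestamps.
pod_a = [10.0, 15.0, 18.0, 18.0]
pod_b = [100.0, 101.0, 105.0, 110.0]

deltas = [increase(pod_a), increase(pod_b)]
rolled_up = add(sum_across(deltas))
print(rolled_up)  # [0.0, 6.0, 13.0, 18.0]
```

Working on deltas rather than raw counter values means that a pod restarting, and its counter dropping back to zero, does not drag the rolled-up counter downwards.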

Note: only metrics that contain all of the groupBy tags will be rolled up. For example, in the above config, only http_request_bucket metrics that have all of the groupBy labels present (le, git_sha, route, status_code and region) will be rolled up into the new metric http_request_rollup_no_pod_bucket.
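The groupBy requirement can be illustrated with a short Python sketch. It is not M3 code: rollup_key is a hypothetical helper that returns the grouping key for a series, or None when any groupBy tag is absent and the series would therefore be left out of the rollup.

```python
GROUP_BY = ["le", "git_sha", "route", "status_code", "region"]


def rollup_key(tags, group_by=GROUP_BY):
    """Return the grouping key for a series, or None if a groupBy tag is missing."""
    if any(name not in tags for name in group_by):
        return None
    return tuple((name, tags[name]) for name in group_by)


complete = {"le": "0.5", "git_sha": "abc123", "route": "/home",
            "status_code": "200", "region": "us-east", "k8s_pod": "web-1"}
other_pod = dict(complete, k8s_pod="web-2")
missing_region = {k: v for k, v in complete.items() if k != "region"}

print(rollup_key(complete) == rollup_key(other_pod))  # True: both pods share one key
print(rollup_key(missing_region))                     # None: not rolled up
```

Because k8s_pod is not part of the key, series that differ only by pod collapse into a single rolled-up series.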

While the above example can be used to create a new rolled up metric, often the goal of rollup rules is to eliminate the underlying raw metrics. To do this, a mappingRule with drop set to true needs to be added, as in the following example (which reuses the metric above). Additionally, if all of the underlying metrics are being dropped, there is no need to change the metric name: in the rollupRule, the metricName field can be equal to the existing metric name, as the example below shows.

downsample:
  rules:
    mappingRules:
      - name: "http_request latency by route and git_sha drop raw"
        filter: "__name__:http_request_bucket k8s_pod:* le:* git_sha:* route:*"
        drop: true
    rollupRules:
      - name: "http_request latency by route and git_sha without pod"
        filter: "__name__:http_request_bucket k8s_pod:* le:* git_sha:* route:*"
        transforms:
        - transform:
            type: "Increase"
        - rollup:
            metricName: "http_request_bucket" # metric name doesn't change
            groupBy: ["le", "git_sha", "route", "status_code", "region"]
            aggregations: ["Sum"]
        - transform:
            type: "Add"
        storagePolicies:
        - resolution: 30s
          retention: 720h

Storage policies and rollup rules

Note: In order to store rolled up metrics in an unaggregated namespace, the namespace’s aggregationOptions must have a matching aggregation. For example, if the 720h namespace under storagePolicies in the above rule is unaggregated, the aggregationOptions for that namespace should resemble the following:

"aggregationOptions": {
  "aggregations": [
    {
      "aggregated": false
    },
    {
      "aggregated": true,
      "attributes": {
        "resolutionDuration": "30s",
        "downsampleOptions": { "all": false }
      }
    }
  ]
}
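A check along these lines can be sketched in Python. The source does not spell out exactly how M3 decides that an aggregation "matches", so this hypothetical helper makes the simplest assumption: an entry matches when it is marked aggregated and its resolutionDuration equals the resolution in the rule's storage policy.

```python
def has_matching_aggregation(aggregation_options, resolution):
    """True if some aggregated entry declares the given resolutionDuration."""
    return any(
        bool(agg.get("aggregated"))
        and agg.get("attributes", {}).get("resolutionDuration") == resolution
        for agg in aggregation_options.get("aggregations", [])
    )


# The aggregationOptions shown above, as a Python dict.
options = {
    "aggregations": [
        {"aggregated": False},
        {
            "aggregated": True,
            "attributes": {
                "resolutionDuration": "30s",
                "downsampleOptions": {"all": False},
            },
        },
    ]
}

print(has_matching_aggregation(options, "30s"))  # True
print(has_matching_aggregation(options, "1m"))   # False
```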

Aggregating gauges example

The following is an example of a sensible set of aggregations for a metric that represents a job queue length. The aggregations provide the sum, average, maximum, and minimum of the job queue length across all instances, each under a different aggregate metric name.

downsample:
  rules:
    rollupRules:
      - name: "job queue length sum across pods pod"
        filter: "__name__:job_queue_length k8s_pod:*"
        transforms:
        - aggregate:
            type: "Last"
        - rollup:
            metricName: "job_queue_length:sum"
            excludeBy: ["k8s_pod"]
            aggregations: ["Sum"]
        storagePolicies:
        - resolution: 30s
          retention: 720h
      - name: "job queue length average across pods pod"
        filter: "__name__:job_queue_length k8s_pod:*"
        transforms:
        - aggregate:
            type: "Last"
        - rollup:
            metricName: "job_queue_length:avg"
            excludeBy: ["k8s_pod"]
            aggregations: ["Mean"]
        storagePolicies:
        - resolution: 30s
          retention: 720h
      - name: "job queue length max across pods pod"
        filter: "__name__:job_queue_length k8s_pod:*"
        transforms:
        - aggregate:
            type: "Last"
        - rollup:
            metricName: "job_queue_length:max"
            excludeBy: ["k8s_pod"]
            aggregations: ["Max"]
        storagePolicies:
        - resolution: 30s
          retention: 720h
      - name: "job queue length min across pods pod"
        filter: "__name__:job_queue_length k8s_pod:*"
        transforms:
        - aggregate:
            type: "Last"
        - rollup:
            metricName: "job_queue_length:min"
            excludeBy: ["k8s_pod"]
            aggregations: ["Min"]
        storagePolicies:
        - resolution: 30s
          retention: 720h
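The effect of these four rules on a single resolution window can be sketched as follows. This is an illustrative model, not M3's implementation: rollup_gauge is a hypothetical helper, and its input is assumed to be the last queue length each pod reported in the window, with k8s_pod already being the only tag that differs between the series.

```python
from statistics import mean

# Metric name suffix -> function applied across pods, mirroring the four rules.
AGGREGATIONS = {"sum": sum, "avg": mean, "max": max, "min": min}


def rollup_gauge(latest_by_pod):
    """Aggregate the latest gauge value from each pod into the rolled-up metrics."""
    values = list(latest_by_pod.values())
    return {
        f"job_queue_length:{suffix}": fn(values)
        for suffix, fn in AGGREGATIONS.items()
    }


latest = {"worker-1": 12.0, "worker-2": 4.0, "worker-3": 8.0}
print(rollup_gauge(latest))
# {'job_queue_length:sum': 24.0, 'job_queue_length:avg': 8.0,
#  'job_queue_length:max': 12.0, 'job_queue_length:min': 4.0}
```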