You are using Elastic APM (https://www.elastic.co/fr/observability/application-performance-monitoring) to watch what is happening inside your application.
Basically, APM writes two kinds of documents into Elasticsearch: transactions and spans.
For a web application, a transaction represents the end-user HTTP request, and a span represents any interaction your application has with a DB, cache, microservice, external HTTP API, etc.
Elastic also offers an APM agent to capture user interaction with clients such as web browsers: Real User Monitoring, aka RUM. Here a “transaction” represents the first page load, and a span represents any user interaction (CSS download, document parsing, DNS resolution, etc.).
Elastic APM is really cool, easy to set up, and a must-have to monitor your entire infrastructure, especially with microservices. But it comes with a cost: Elasticsearch sizing!
At X.com, even with a low sample rate, we see daily APM Elasticsearch indices of 100 GB! We have worked hard to reduce that, but the more data there is, the more easily developers can debug. We have chosen to keep only 5 days.
5 days is enough for developers, but our SREs and managers want more, in order to follow “big” changes and also watch performance by country.
How can we keep APM data for a long time?
For long-term analysis, we don't need raw data, only duration percentiles (that is our metric), grouped by some dimensions such as:
- service name
Elasticsearch transform to the rescue
Here is a very nice feature from Elasticsearch: transforms (https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-overview.html), which let you run aggregations continuously and store the results in an index, like a pivot.
There is also rollup, still in beta, but I found it more complicated.
So the transform will run hourly, do the aggregation (percentiles + group by), then store the results in Elasticsearch!
Kibana offers a nice UI to build the transform.
After your first transform, you will prefer the API anyway 👊
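As an illustration only, here is a minimal sketch of what the body of such a transform request could look like, written in Python so it can be printed and pasted into the Kibana console. The transform id, the index names, the aggregation names (`metric.percentiles`, `metric.count`) and the field names (`service.name`, `transaction.duration.us`, `@timestamp`) are assumptions based on the default APM schema, not the exact configuration used at X.com; adapt them to your cluster.

```python
import json

# Hypothetical names: adjust to your own cluster and APM version.
TRANSFORM_ID = "apm-long-term"
SOURCE_INDEX = "apm-*-transaction-*"
DEST_INDEX = "apm-long-term"


def build_transform_body():
    """Body for PUT _transform/<TRANSFORM_ID>: hourly duration percentiles per service."""
    return {
        "source": {"index": SOURCE_INDEX},
        "dest": {"index": DEST_INDEX},
        # Continuous mode: look for new data every hour.
        "frequency": "1h",
        "sync": {"time": {"field": "@timestamp", "delay": "60s"}},
        "pivot": {
            # One output document per (hour, service name) pair.
            "group_by": {
                "@timestamp": {
                    "date_histogram": {"field": "@timestamp", "calendar_interval": "1h"}
                },
                "service.name": {"terms": {"field": "service.name"}},
            },
            "aggregations": {
                # Assumed metric: transaction duration in microseconds.
                "metric.percentiles": {
                    "percentiles": {
                        "field": "transaction.duration.us",
                        "percents": [50, 95, 99],
                    }
                },
                # Number of transactions behind each group.
                "metric.count": {"value_count": {"field": "transaction.duration.us"}},
            },
        },
    }


if __name__ == "__main__":
    print(f"PUT _transform/{TRANSFORM_ID}")
    print(json.dumps(build_transform_body(), indent=2))
```

Add more entries under `group_by` for each extra dimension you want to keep in the long-term index.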
Pretty explicit! After that, you have to start the transform. Wait a few hours, then check that the Elasticsearch index is there with some data, and create the Kibana index pattern to start playing!
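For reference, starting a transform and checking its progress are two simple API calls. The sketch below only prepares the requests with the Python standard library; the cluster URL and the transform id are hypothetical, and authentication is left out.

```python
import urllib.request

# Hypothetical values: adjust to your own cluster.
ES_URL = "http://localhost:9200"
TRANSFORM_ID = "apm-long-term"


def start_request():
    """POST _transform/<id>/_start: starts a transform that was already created."""
    url = f"{ES_URL}/_transform/{TRANSFORM_ID}/_start"
    return urllib.request.Request(url, method="POST")


def stats_request():
    """GET _transform/<id>/_stats: returns the transform state and processing statistics."""
    url = f"{ES_URL}/_transform/{TRANSFORM_ID}/_stats"
    return urllib.request.Request(url, method="GET")


def send(request):
    """Send a prepared request and return the response body (needs a reachable cluster)."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only print what would be sent; call send(...) against a real cluster.
    for request in (start_request(), stats_request()):
        print(request.get_method(), request.full_url)
```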
Dealing with poor pivot groups, aka metric.count = 1
Elasticsearch transform is awesome! Simple to set up and powerful, we love it!
One feature (I found) is missing: a post filter! I don't want data with “metric.count = 1” in my long-term index, because it has no business value and wastes bytes.
Ingestion pipeline to the rescue
Another great Elasticsearch feature is the “ingest pipeline” (https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html).
The transform API lets you define a destination index and a pipeline! So let’s create a pipeline that drops documents with metric.count ≤ 1:
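A minimal sketch of such a pipeline, again as Python that builds the request bodies. The pipeline id and index name are hypothetical, and the condition assumes the transform stores the count under a nested `metric.count` field (object `metric`, key `count`); check a sample document in your destination index before relying on it.

```python
import json

# Hypothetical names: adjust to your own cluster.
PIPELINE_ID = "apm-long-term-filter"
DEST_INDEX = "apm-long-term"


def build_pipeline_body():
    """Body for PUT _ingest/pipeline/<PIPELINE_ID>: drop groups with a single sample."""
    return {
        "description": "Drop pivot documents where metric.count <= 1",
        "processors": [
            {
                "drop": {
                    # Painless condition; assumes metric.count is stored as
                    # a nested object field (metric -> count).
                    "if": "ctx.metric?.count != null && ctx.metric.count <= 1"
                }
            }
        ],
    }


def build_transform_dest():
    """'dest' section of the transform, pointing at the pipeline above."""
    return {"index": DEST_INDEX, "pipeline": PIPELINE_ID}


def would_drop(doc):
    """Python mirror of the Painless condition, handy for checking sample documents."""
    count = (doc.get("metric") or {}).get("count")
    return count is not None and count <= 1


if __name__ == "__main__":
    print(f"PUT _ingest/pipeline/{PIPELINE_ID}")
    print(json.dumps(build_pipeline_body(), indent=2))
    print(json.dumps({"dest": build_transform_dest()}, indent=2))
```

The pipeline has to exist before the transform that references it is created, so create it first.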
Pretty simple…