The DevOps principle of feedback calls for business, application, and infrastructure telemetry. Telemetry matters to engineers when they debug production issues or establish baseline operating conditions. It also matters to product owners and business stakeholders, because it can reveal whether user engagement validates product development decisions. High-value business KPI telemetry can even guide the board of directors. Moreover, strong telemetry is a prerequisite for continuous delivery because it makes moving at high speed safe.
Organizations have a seemingly endless list of DevOps telemetry systems to choose from. Every cloud provider offers something, and there are even more third-party vendors, such as Datadog, New Relic, and AppDynamics. There are also open source options, such as Prometheus and StatsD.
But how do you choose among all these options when they overlap? First, take a step back. Then, consider the business objectives the telemetry system must deliver.
Outcomes First, Technology Second
The DevOps Handbook discusses telemetry at length. The key is to focus on business outcomes first; the technology choices will follow. The telemetry system should deliver one or more of the following outcomes:
Lower mean time to resolve (MTTR) through more accurate data.
Decreased change failure rate due to earlier detection in the deployment pipeline.
Increased confidence in operations staff.
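The first two outcomes are only useful if they can be measured. The following is a minimal Python sketch of how they are commonly computed, assuming hypothetical incident and deployment records (the `Incident` and `Deployment` classes are illustrative, not taken from any particular tool). Here MTTR is the average time from detection to resolution, and change failure rate is the fraction of deployments that caused a failure. Some teams measure MTTR from the start of user impact instead of detection, so treat the exact definition as an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record: when an incident was detected and when it was resolved.
    detected_at: datetime
    resolved_at: datetime


@dataclass
class Deployment:
    # Hypothetical record: whether a deployment led to a production failure.
    deployed_at: datetime
    caused_failure: bool


def mean_time_to_resolve(incidents: list[Incident]) -> timedelta:
    """Average of (resolved_at - detected_at) across all incidents."""
    if not incidents:
        raise ValueError("no incidents to average")
    total = sum((i.resolved_at - i.detected_at for i in incidents), timedelta())
    return total / len(incidents)


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Fraction of deployments that caused a failure."""
    if not deployments:
        raise ValueError("no deployments to evaluate")
    failures = sum(1 for d in deployments if d.caused_failure)
    return failures / len(deployments)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    incidents = [
        Incident(t0, t0 + timedelta(minutes=30)),
        Incident(t0, t0 + timedelta(minutes=90)),
    ]
    deployments = [
        Deployment(t0, False),
        Deployment(t0, True),
        Deployment(t0, False),
        Deployment(t0, False),
    ]
    print(mean_time_to_resolve(incidents))   # 1:00:00
    print(change_failure_rate(deployments))  # 0.25
```

Tracking these two numbers over time is one way to check whether a telemetry investment is actually delivering the outcomes above, rather than judging it by feature lists.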
Let’s also clarify what “telemetry” means. Telemetry is any form of diagnostic or operational data about a running system. This includes time-series metrics, text-based logs, and events, such as an error or a circuit breaker trip. The DevOps Handbook also provides a telemetry collection checklist:
Business logic data: number of sales transactions, revenue, user signups, churn rate, A/B test results, etc.
Application layer data: transaction times, latencies, response codes, and unexpected errors.
Infrastructure layer data: CPU load, memory consumption, disk space, and network bandwidth.
Client/user software-level data: application errors, crashes, and user-measured response times.
Deployment pipeline: the pipeline status, lead times, deployment frequencies, number of promotions to the various environments, and their