Increasingly, when we talk about software systems we actually mean distributed software systems. They live on the network, in the cloud. They are not standalone.
Distributed software systems quickly grow to a point where they become hard to manage, and hard to see inside.
How do we gain more insight and ensure reliable software?
The key word is ‘Observability’.
Observability is built upon three pillars: Metrics, Traces and Logs.
But what does THAT mean?
Essentially, they are three layers of data, some of it labelled and some of it unstructured, each interpreted in a different way.
Metrics – numeric measures, tagged with metadata, which give an easy way of determining the overall health and performance of the system.
Traces – records of individual requests as they pass through the application, giving a more detailed view of performance and behaviour: how users are using it and how it’s responding.
Logs – the raw logging information that comes from all parts of the system and application. Analysed for remediation and failure recovery.
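To make the three pillars a little more concrete, here is a minimal sketch using only the Python standard library. It is an illustration rather than how any particular observability platform works: the `checkout` service, the `count` and `span` helpers, and the `requests_total` metric name are all hypothetical, and a real system would typically use a dedicated instrumentation library instead.

```python
import logging
import time
import uuid
from collections import Counter
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("checkout")  # hypothetical service name

# Metrics: numeric measures keyed by a name plus metadata tags.
metrics = Counter()

def count(name, **tags):
    """Increment a counter identified by its name and sorted tags."""
    metrics[(name, tuple(sorted(tags.items())))] += 1

# Traces: timed spans that share one trace id per request.
spans = []

@contextmanager
def span(trace_id, name):
    """Record how long the wrapped block took, under the given trace id."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({"trace_id": trace_id, "name": name,
                      "duration_s": time.perf_counter() - start})

def handle_request(user):
    """Hypothetical request handler emitting all three signals."""
    trace_id = uuid.uuid4().hex
    with span(trace_id, "handle_request"):
        with span(trace_id, "load_basket"):
            time.sleep(0.01)  # stand-in for real work
        count("requests_total", route="/checkout", status="200")
        # Logs: the raw event record, carrying the trace id for correlation.
        log.info("checkout ok user=%s trace_id=%s", user, trace_id)
    return trace_id
```

Calling `handle_request("alice")` leaves one tagged counter in `metrics`, two spans sharing a trace id in `spans`, and one log line. Putting the trace id in the log line is what lets you jump from a log entry to the matching trace.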
Observability platforms provide a way of ingesting and visualising data. Support teams and SREs (Site Reliability Engineers) can receive alerts as well as respond to incidents.
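As a rough illustration of what an alert boils down to, the sketch below checks a batch of HTTP status codes against an error-rate threshold. The `AlertRule` class, the `evaluate` function and the 5% threshold are assumptions made for the example; real platforms have their own rule formats and notification channels.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class AlertRule:
    name: str
    threshold: float  # highest tolerated error rate, e.g. 0.05 for 5%

def error_rate(statuses: Sequence[int]) -> float:
    """Fraction of responses that were server errors (5xx)."""
    if not statuses:
        return 0.0
    errors = sum(1 for status in statuses if status >= 500)
    return errors / len(statuses)

def evaluate(rule: AlertRule, statuses: Sequence[int]) -> Optional[str]:
    """Return an alert message if the rule is breached, otherwise None."""
    rate = error_rate(statuses)
    if rate > rule.threshold:
        return (f"ALERT {rule.name}: error rate {rate:.0%} "
                f"exceeds {rule.threshold:.0%}")
    return None

rule = AlertRule(name="checkout-errors", threshold=0.05)
print(evaluate(rule, [200, 200, 500, 200]))
```

With one failure in four requests the rule fires; with all requests succeeding, `evaluate` returns `None` and nobody gets paged.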
All production-quality (i.e. end-user-facing) software needs good metrics and logging in order to be fully supportable. The only way (well, IMO the best way) you can achieve reliable software is by gathering feedback, and observability is how you gather it.
When it comes to building software, don’t leave monitoring and logging as an afterthought. They are an essential part of the feedback process that makes your application more fit for purpose and better at fulfilling your customers’ needs.