This is the first in a series of posts covering QCon London 2018. The codecentric NL team spent three days at QCon to catch up on the latest trends and practices in software development, DevOps, architecture and distributed systems.
Introduction to QCon
Technology is moving faster all the time, and at codecentric we look for different ways to keep up with what’s hot and what’s not. One of the recurring events on our calendar is QCon London, which brings together a host of international attendees and industry-renowned speakers from all over the globe. The conference focusses on practitioners and packs a big punch, with high-quality content condensed into three days.
The talks are organized in tracks, each hosted by an industry veteran with specific knowledge of the track’s subject. This helps shape the conversations and questions around the talks.
The QCon team curates content around topics that are emerging or have only just gained traction within the industry, which makes it a perfect place to learn about new technology trends. And with the focus on practice rather than theory, it’s a great way to learn from those who have led the way in trying new technologies and have started to develop practices around them.
The topics covered in this year’s conference, grouped by their respective place on the Rogers adoption curve, can be found below:
The rise of Observability
One of the key topics during this year’s event was Observability. It is clear that the growing popularity of distributed architectures and microservices means that understanding what is happening within the IT landscape is becoming increasingly difficult. The good old days of having Nagios alert on CPU spikes and disks filling up are over: such checks are simply not enough to deal with complex and unpredictable failures across today’s distributed platforms, because the only certainty is that failure is unavoidable.
But what’s wrong with our monitoring?
As Charity Majors pointed out in her talk Observability and Emerging Infrastructures, monitoring IT systems has mostly been focussed on getting to grips with the ‘known unknowns’: that is to say, we monitor the things we know are prone to cause issues, or that are prone to fail. For example, because we know high load on a machine can trigger faults, we put a monitor on it and let the system alert us when a certain threshold is crossed. This has served us well for a long time and still definitely serves its purpose, but it is no longer enough.
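To make that pattern concrete, here is a minimal sketch of such a threshold check in Python. The threshold value and the alerting side are assumptions for illustration; a real setup would use a monitoring agent such as Nagios rather than a script:

```python
import os

LOAD_THRESHOLD = 4.0  # assumed limit, e.g. for a four-core machine

def check_load() -> None:
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only)
    one_min, five_min, fifteen_min = os.getloadavg()
    if five_min > LOAD_THRESHOLD:
        # A real setup would page someone via the monitoring system here
        print(f"ALERT: 5-minute load average {five_min:.2f} exceeds {LOAD_THRESHOLD}")

if __name__ == "__main__":
    check_load()
```

This is exactly the ‘known unknown’ style of check: it only ever answers the one question we thought to ask in advance.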
As we learn more about our systems and their behaviours, our engineers make sure that appropriate actions are taken to avoid customer-impacting issues. Auto-remediation of problems can be implemented, such as auto-scaling resources to deal with contention. If our system is built to deal with failure, certain alerts may no longer be relevant, since the system can cope with the problem without impacting our customers. And if the customers are not impacted, why should we page our engineers at 03:00 AM?
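A hypothetical sketch of such an auto-remediation loop might look like the following; every function name and threshold here is made up for illustration, standing in for a metrics query and a platform scaling API:

```python
import time

MAX_REPLICAS = 10
CPU_SCALE_OUT_THRESHOLD = 0.8  # assumption: scale out above 80% average CPU

def average_cpu_utilisation() -> float:
    # Stub: a real implementation would query the metrics backend
    return 0.5

def set_replica_count(n: int) -> None:
    # Stub: a real implementation would call the platform’s scaling API
    print(f"scaling to {n} replicas")

def remediation_loop(replicas: int) -> None:
    while True:
        if average_cpu_utilisation() > CPU_SCALE_OUT_THRESHOLD and replicas < MAX_REPLICAS:
            replicas += 1
            set_replica_count(replicas)  # remediate automatically, don’t page anyone
        time.sleep(60)  # re-evaluate every minute
```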
Monitoring is how we have long kept an eye on the things we know will fail in our systems. But times (and our architectures) have changed dramatically.
Operational complexity and the need for observable systems
With microservice architectures and the rise of cloud-native patterns, our workloads, and the platforms supporting them, are becoming increasingly short-lived and change-heavy. We are seeing companies deploy new code into production at staggering rates. And while those deployments are often small, incremental changes, we know that change is often the cause of incidents.
With so many moving parts within our systems, it is becoming increasingly hard to find the cause of problems. Observability of a system, as Majors states, is what should allow you to answer arbitrary questions about how well your software works at any time. This is quite different from the static view our traditional monitoring systems present us with. In the end, what matters most is whether we are serving our customers. With these new architectures comes a shift in monitoring: a focus on the health of the business transactions flowing through the systems and the experience of the customers interacting with them. The health of the individual components becomes a lesser concern.
Building observable systems requires logs, traces and metrics to be generated, collected and made available for analysis. It is the only way we will be able to find the problems when we need to.
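As an illustration of what that instrumentation can look like, the sketch below emits a single structured event per request, carrying a trace id and enough context to slice by customer, endpoint or status later, without predefining any alerts. All field and service names are assumptions:

```python
import json
import time
import uuid

def handle_request(customer_id: str, endpoint: str) -> None:
    trace_id = str(uuid.uuid4())  # in practice propagated between services
    start = time.monotonic()
    # ... actual request handling would go here ...
    print(json.dumps({
        "timestamp": time.time(),
        "trace_id": trace_id,
        "service": "checkout",  # assumed service name
        "endpoint": endpoint,
        "customer_id": customer_id,
        "duration_ms": round((time.monotonic() - start) * 1000, 2),
        "status": "ok",
    }))  # in practice shipped to a central log/event pipeline

handle_request("customer-42", "/orders")
```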
QCon 2018 showed increased attention to the topic of monitoring distributed systems. With evolving architectures, we can clearly see that the infrastructural components in our application landscape need to become increasingly intelligent. There is an obvious need for tooling that turns the massive amounts of data collected through traces, metrics and logs into tangible information.