As promised, I’ve decided to start a new series today, focused on Measurement Driven Design (MeDD). I think this topic is overdue for a good discussion.
You’re probably familiar with the old story of the blind men and the elephant. It is meant to highlight the dangers of incomplete awareness. I believe that this metaphor can help explain why applications perform poorly today.
Good development teams do their best to produce high-performing solutions that are designed and architected for their intended production environments. To ensure a good outcome, they use staging environments to reproduce production-like conditions and in-depth profilers that allow them to use their personal knowledge to isolate and tune their most important business logic. However, staging environments can only approximate production conditions and the tuning tools being used depend on the subject matter expertise of the developer in order to be effective.
In other words, a tool on its own trying to describe a problem in production is like a blind man struggling to describe an elephant. It is stumbling around in the dark, doing its best to distinguish important business interactions and root causes from the overall noise. Tool vendors typically try to address this problem by focusing on problematic code (i.e., slow or broken) as well as popular code, in the hope that anything problematic and/or popular is important (especially if it is problematic AND popular). However, this means watching everything, even if it is something the customer does not actually care about. Customers also still want insight into their key business functions when those functions are neither problematic nor popular. This pushes vendors into configuration-based solutions in which customers are forced to define those key business functions. Unfortunately, this makes the solution more brittle, since it undermines discovery and resilience to change. It also presumes that customers will be able to analyze code that was written by someone else who is long gone and identify with precision the aspects that represent key business functions. And so we quickly return to gathering more data than we need for fear of missing something important.
There are so many aspects to application performance and delivery outside of poor code design – e.g., inadequate resource provisioning, resource stealing by unrelated external processes, inefficient networking, poor connection speeds, slow client systems, Internet weather, and so on. No individual tool captures all of this breadth, so multiple tools are often deployed. Each tool brings in its own data. The greatest challenge is trying to tie these pieces of data together into a coherent model of what actually happened. Although some solutions have managed to bridge some parts of the picture in a deterministic fashion, the state of the art still relies heavily on statistical analysis.
The true problem lies in our inability to authoritatively interpret business logic and measure only what matters. Unless we can fix this core problem, the industry will be creating BigData problems where none should exist.
My hope is that we have reached a tipping point – the same tipping point that the industry reached when it realized that quality could not be enforced from the outside by tools. TDD and Agile Testing techniques have gained in popularity in response to the realization that real quality requires a change in how we develop code. Security and performance groups are calling for the same shift to help inject security and performance considerations into the initial conception and design phases of development.
We need to go further. I believe that it is time for developers to start considering the needs of monitoring when they design their software and then build the appropriate data capture techniques right into their code. It is only during development that the business logic is clear and we have the opportunity to maximize the efficiency of capturing only the right measurements – i.e., those that matter most to the Line of Business owner and the Operations team. Tool vendors would be liberated to focus on managing and analyzing the data more effectively instead of worrying about how it can be captured. Data capture would become a commodity.
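To make the idea a little more concrete, here is a minimal sketch of what building data capture into the code might look like. Everything in it is an assumption for illustration: the `business_measure` decorator, the in-memory `MEASUREMENTS` sink, and the `place_order` function are hypothetical names, not part of any particular tool or framework. The point is only that the developer, who knows which interactions matter, marks them explicitly and leaves everything else unmeasured.

```python
import time
from functools import wraps

# Hypothetical in-memory sink. A real system would forward these records
# to whatever monitoring backend the team already uses.
MEASUREMENTS = []


def business_measure(name):
    """Mark a function as a key business interaction and record its timing and outcome."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            outcome = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                outcome = "error"
                raise
            finally:
                # Runs on both success and failure, so every call is captured once.
                MEASUREMENTS.append({
                    "name": name,
                    "outcome": outcome,
                    "duration_s": time.perf_counter() - start,
                })
        return wrapper
    return decorator


@business_measure("checkout.place_order")
def place_order(cart_total):
    # A key business function (illustrative only): measured by design.
    if cart_total <= 0:
        raise ValueError("cart is empty")
    return {"status": "confirmed", "total": cart_total}


def format_price(amount):
    # Not a key business function, so it is deliberately left unmeasured.
    return f"${amount:.2f}"
```

Calling `place_order(25)` appends one record named `checkout.place_order` to `MEASUREMENTS`, while calling `format_price(25)` appends nothing. The measurement name is chosen by the developer at design time, which is what would let a downstream tool treat capture as a commodity and concentrate on analysis.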
In my next several articles in this series, I will explore what needs to be done to achieve this goal in the industry.