The modern scientific method and good software development practices have a lot in common. The same principles that have made science successful in recent centuries seem to be fundamental to successful software development.
Let me share some examples.
Has a chief architect somewhere ever specified which framework you had to use in your project, a project he would never even see? And was there no room to challenge the decision?
Discarding authority is a key element in the development of modern science. People used to just follow what Aristotle, Galen, or other great philosophical figures in history said or did. Medical studies often went no further than studying the inherited teachings of the ancients. Questioning wasn’t allowed. If you questioned these teachings, it meant that you didn’t fully understand them and you’d have to study more. A culture of following the masters was dominant.
Nullius in verba – nothing on authority – became the motto of the Royal Society, founded in 1660, which played an important role in the success of modern science. We don’t use Newton’s laws of motion today out of respect for the man. We use them because when we repeat the same experiments we get the same results, and because the theory, again and again, makes correct predictions in mechanics (with known limits, of course).
In our software developer community at Aliz, it’s a key cultural foundation that in technical discussions we listen to each other’s reasons, not positions. And this goes beyond technical discussions. Even new junior members can point out problems in our most experienced architect’s plan. Our architects usually have good plans, of course, and we usually go with those plans, but not because we accept them on authority. It’s because they’re usually right. Usually.
At first, it can feel uncomfortable that you have to consider anyone’s criticism and accept it if it’s right. It could mean undoing hundreds of lines of code and several days of work. But in the end, it’s liberating. It liberates you from the stress of having to be right all the time. It liberates you from having to live with wrong decisions you could have avoided. And it makes you write better software.
I find the spirit of nullius in verba also reflected in code reviews. No one can just commit code by saying, “Hey, believe me, we need this quick in production, thx”. It just doesn’t work that way.
Your colleague comes to you saying that they have inspected the performance bottleneck and a certain utility has to be replaced with a newer, more performant one. The best thing you can do at this point is to ask how they came to that conclusion. Exactly what did they measure and investigate, and how did they do it? What are the exact numbers? Even if they’ve already replaced the utility and the software does perform better, it’s wise to check whether the improved performance is indeed due to the switch and not to some other incidental change.
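In practice, honoring nullius in verba here just means measuring both implementations under identical conditions. Here’s a minimal sketch of how such a check might look, with hypothetical old_util and new_util stand-ins and a made-up workload:

```python
import heapq
import timeit

# Hypothetical stand-ins for the two implementations being compared;
# in a real investigation these would be the actual utility functions.
def old_util(data):
    return sorted(data, reverse=True)[:10]

def new_util(data):
    return heapq.nlargest(10, data)

data = list(range(100_000))

# Run both on identical input, several times, and take the best run
# to smooth out noise from the environment.
old_time = min(timeit.repeat(lambda: old_util(data), number=20, repeat=5))
new_time = min(timeit.repeat(lambda: new_util(data), number=20, repeat=5))

print(f"old: {old_time:.4f}s  new: {new_time:.4f}s")

# Also confirm both produce the same result, so any speedup isn't
# coming from silently doing less work.
assert old_util(data) == new_util(data)
```

The assert at the end matters as much as the timing: a faster utility that returns something different isn’t an optimization, it’s a different program.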
In science, we don’t accept anyone’s word. We replicate experiments to see if they give the same results. Even the best-sounding ideas are discarded without proper, replicable experiments to support them. Even the most fundamental things are questioned again and again. It surprised me when I recently learned that the equivalence of inertial and gravitational mass is still being tested in new experiments with astounding precision.
But replicability is not only about experiments; it applies to the entire process.
You’ve probably worked with legacy code: a codebase you suspect was already in bad shape when first written and has only been patched for years since. You don’t know what you can touch or what can be changed. Is this a bug or a feature? Does this path ever execute? Is this even a meaningful condition here? Is this still a requirement? Ah, this only works because three bugs happen to add up to a feature.
Requirements and execution environments change. Technologies evolve, not to mention the web. Even if all the decisions are right at the time, you’ll probably have to re-evaluate those decisions someday. If you know exactly how and why certain decisions were made, you can confidently re-evaluate them in a changed environment. Framework X was great in Java EE but doesn’t perform well in the new cloud environment where servers are spun up and shut down quickly. Framework Y was great for the customer’s administration screens but now they need a public-facing website.
Replicability applies to how you construct hypotheses and theories. You have to give references to the experiments, theories and practically anything you build your conclusions on. Others won’t just check if they like your conclusion or not. They’ll check if your conclusion can really be drawn from the bases you built on. If any of those bases prove to be wrong later, you can re-evaluate all the other theories that were built on them to see if they’re still correct.
It’s great if these decisions are discoverable from the code. Authoring clean commits with descriptive messages already helps a lot. For more complex changes, a reference to the ticketing system helps you find the background of the change. Reviewers also have to validate that all the changes in a pull request can be derived from the issue it relates to, so that the changes can be traced back to the decisions later.
Falsifiability (well, that sounds like a trendy app name today) is the next principle. When someone argues that web framework X is the best and they wouldn’t use anything else, or that language Y is the best and all the others suck, you know they’re most likely stating a belief, shaped by their preferences and habits, rather than a claim that could ever be tested.
Theories have to be falsifiable. You have to be able to give conditions under which it’s clear to anyone that the theory is not correct. Phlogiston is a good example of this in science, but we’ll come back to that later.
Have you ever made a fix that only made things worse because it turned out you didn’t fully understand the problem?
Falsifiability becomes important when trying to find bugs or tackle issues in application operation. Some bugs are pretty easy to track down: you can follow a straightforward logical path through the code and narrow it down to the problematic piece. Other bugs are more subtle; often they’re not clearly caused by one particular part of the code, but are rather the result of an unfortunate interplay of little glitches in many parts of it.
When solving issues, it’s important to have a mindset that always tries to create falsifiable theories. If you think you know what the problem is, don’t look for more signs that support your theory. Look for signs that, if present, would definitively disprove it.
Take this scenario. Your car doesn’t start. You suspect the old battery. Better ask a friend to jump-start your car from their battery before buying a new one. Or what about this one? The application is performing worse on average. You’re pretty sure it’s caused by the feature you just released. Before jumping in, check a few other things. Did the performance start to drop before the release? Can the usage of the new feature mathematically account for the observed drop in the average? You get the idea.
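To make the second example concrete, here’s a back-of-the-envelope sketch (with invented numbers) of how to check whether the new feature’s traffic could even account for the change in the average:

```python
# Could the new feature alone explain the slower average?
# All numbers below are made up for illustration.

baseline_avg_ms = 200.0   # average latency before the release
observed_avg_ms = 230.0   # average latency after the release
feature_share = 0.05      # fraction of requests hitting the new feature

# If only the new feature's requests got slower, they would have to average
# this much to pull the overall mean up to the observed value:
#   observed = (1 - share) * baseline + share * feature_avg
required_feature_avg = (
    observed_avg_ms - (1 - feature_share) * baseline_avg_ms
) / feature_share

print(f"The new feature's requests would need to average {required_feature_avg:.0f} ms")
# -> 800 ms. If they actually average, say, 250 ms, the theory that the new
#    feature caused the drop is falsified and you have to look elsewhere.
```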
When trying to optimize a certain type of request, you find some ugly code: there’s no need to copy that array, this could be done in linear time, we compute that value multiple times, oh gosh. Then, when you do some profiling, it turns out that by changing a query you can speed up the request by 70%, after which it’s pretty far from the top of your list of slow requests to optimize. The code that actually, physically hurts to look at accounts for only around 0.5% of the CPU time. Sound familiar? Of course, there are other reasons to fix ugly code, but that’s not the point now.
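A profiler settles this kind of argument quickly. Below is a minimal sketch of the workflow with Python’s built-in cProfile, using made-up stand-ins for the query and the ugly transformation:

```python
import cProfile
import pstats
import time

# Hypothetical stand-ins for the parts of a request handler; the sleep
# and the copy merely simulate where the time might go.
def run_query():
    time.sleep(0.07)                   # the database call
    return list(range(1000))

def ugly_transform(rows):
    return [r * 2 for r in rows[:]]    # the code that "hurts to look at"

def handle_request():
    return ugly_transform(run_query())

# Profile the whole request and rank by cumulative time; the numbers,
# not the ugliness, decide what is worth optimizing first.
cProfile.run("handle_request()", "request.prof")
pstats.Stats("request.prof").sort_stats("cumulative").print_stats(10)
```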
Measurement is vital. In science, the phlogiston theory was a milestone in this respect. It posited that a fire-like element called phlogiston is contained within combustible bodies and released during combustion. It was falsified by experiments that measured the volume and mass of the gases involved in certain reactions. To have a good theory of gravity, it’s not enough to observe that falling objects accelerate; it’s important to measure exactly how they accelerate. The culture of accurate and replicable measurement is only a couple of centuries old, but it’s fundamental to modern science.

The importance of quantitative measurement is a bit controversial in software development. In the development process itself, most teams struggle with sprint points, complexity estimates, and properly tracking time spent, not to mention code coverage and complexity metrics. And yes, scientific culture has its own struggles with p-values and impact factors.

But in application operation, measuring many aspects of the application is crucial. We measure response times, error rates, usage rates, log volume, database performance, cache hit rate, and so on. It’s not only important for operational monitoring. Having exact data on how certain features of the application are used can also serve as empirical input when shaping a product’s future. You can’t be fooled by how much you like a certain idea; the measured data might be sobering.
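As a toy illustration (the metric names and the feature below are hypothetical), even a few lines of instrumentation turn “I think this feature is popular” into numbers you can actually argue about:

```python
import time
from collections import defaultdict

# A toy in-process metrics store; a real system would ship these numbers
# to a monitoring backend instead of keeping them in a dict.
metrics = defaultdict(list)

def measured(name):
    """Record the duration and outcome of every call to the wrapped function."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                metrics[f"{name}.errors"].append(1)
                raise
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                metrics[f"{name}.response_ms"].append(elapsed_ms)
        return wrapper
    return decorator

@measured("export_report")   # hypothetical feature name
def export_report():
    time.sleep(0.01)

export_report()
print(dict(metrics))   # usage counts, latencies, and errors are now data, not opinions
```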
I’ve found these principles to be key elements of our software development process. We regularly ask ourselves whether we’re accepting claims on authority, whether our decisions are replicable and traceable, whether our theories about problems are falsifiable, and whether we’re measuring rather than guessing. These questions might be helpful for you too.