
About the author

Karthik Vijay is a rare all-rounder with 20+ years of experience, specializing in Architecture, Engineering, Leadership, and Productivity.


Why should you always ask yourself...what (could) go wrong?🤔

TL;DR

When things go wrong, it’s common for us to say “Hindsight is 20/20” or sometimes even go so far as to label it a one-off Black Swan event. Yet, unlike the weather, the vast majority of problems and incidents we encounter on a daily basis are entirely predictable and rarely appear out of nowhere. They usually reflect deep structural issues: fragile processes, short-sighted decisions, and cultures that discourage foresight. Reacting after the fact wastes resources, costs money, and kills momentum. The strongest organisations anticipate risks, prevent failures, and reward those who stop fires before they ever start. #RewardYourRealFirefighters

Examples of things going very wrong

Here are a few extreme examples of ‘recent’ events where things went very wrong! Asking the simple question “What could go wrong?” could have helped prevent some of the disastrous outcomes for these institutions and individuals, or at least reduced their impact.

  1. What could go wrong if your bank invests short-term customer deposits into long-term bonds and interest rates start rising? Collapse of Silicon Valley Bank (Video)
  2. What could go wrong if you try to fake it till you make it and you can’t quite... make it? The Theranos Story (Video)
  3. What could go wrong if (amongst other quality control issues) you simultaneously update millions of Windows machines without a staged deployment rollout strategy? The 2024 CrowdStrike outages (Video)
  4. What could go wrong if your culture punishes dissent, rewards compliance, and disregards critical sensor metrics? The Titan Submersible implosion (Video)
  5. What could go wrong if an organisation ignores evolving customer needs and clings to old ways of working? The Collapse of Blockbuster (Video)

Top 3 reasons why things go wrong

  1. Organisation culture
  2. Lack of foresight
  3. Fragile processes

Organisation culture problems can often be multi-faceted and lie at the core of many other issues, including lack of foresight and fragile processes. If left unaddressed, these issues can potentially lead to:

  1. High turnover rate, especially amongst the top tier talent pool
  2. Slower innovation and speed to market
  3. Paradoxically, given the point above, rushing work and getting burnt as a result
  4. People being afraid to speak up
  5. People doing the bare minimum

If these issues become deeply ingrained, a one-off reset won’t be enough to course correct. Bringing in new eyes and fresh perspectives can help, although it will take some time to start seeing progress and even longer to see meaningful results. Remember: if we always do what we always did, we’ll always get what we always got.

What gets in the way of foresight?

Lack of foresight and fragile processes can sometimes come from inexperience in individuals and teams, but more often they reveal deeper structural issues. For example, when organisations constantly reshuffle teams and priorities, there’s little incentive for teams to build robust solutions or apply a “what could go wrong?” mindset. Teams need stability, longevity and, more importantly, a product mindset (see: Product mindset vs IT Project mindset).

For example, having Team 1 deliver a bare-minimum happy-path feature and then dismantling the team to hand it over to Team 2 in maintenance mode can easily undermine long-term quality. Now, let’s apply some “what could go wrong?” thinking to this way of working.

  1. Team 1 rushes delivery. With a finite timeline, they push for speed, which is often necessary for speed to market, but it leaves little room for robust design or foresight.
  2. Incentives misaligned. Team 1 doesn’t want to look bad handing over a large backlog, so they avoid adding known issues or technical debt, effectively hiding problems.
  3. Knowledge loss. Once Team 1 disbands, valuable context and rationale for design decisions are lost, making it harder for Team 2 to maintain or extend the system.
  4. Team 2 stuck firefighting. Instead of improving or innovating, Team 2 spends their time patching gaps and handling customer pain points they didn’t create.
  5. Morale impact. Team 2 feels like second-class citizens, stuck with cleanup work, which could increase disengagement and contribute to staff turnover.

Preventing fires (“What could go wrong?” mindset) vs putting out fires (“What went wrong?” analysis)

Metaphorically speaking, putting out fires is a lot more straightforward than preventing them. When a fire breaks out, it’s obvious: your monitoring systems flag it, customers notice, it makes the news, and the task is clear - put it out as quickly as possible. Some analysis is still necessary to uncover the root cause, but with today’s rich logging and monitoring tools, identifying it is usually not difficult.

Preventing fires, on the other hand, requires a proactive mindset. It demands foresight and deliberate effort. You need to anticipate what could go wrong, identify potential triggers, and implement mitigation strategies for each type of risk before it ever escalates.

Preventing fires is often far cheaper for an organisation than putting them out. Every outage, bug, or incident carries the risk of lost customers, diminished trust, and revenue impact. Continuously reacting to crises also drives up costs through additional manpower, overtime, and after-hours support. More importantly, a culture of constant firefighting distracts teams from building new features, innovating, and improving the product. By focusing on prevention, you not only save resources but also create a stable foundation that enables sustainable growth and higher quality outcomes.

Fireproof your Design🔥🚫

Let’s bring this all back to software design and development for a second. Imagine you’re building an API that updates both a third-party service and your database. What could possibly go wrong?

  1. Your API may receive duplicate requests from your clients
  2. The third-party API may throttle your requests
  3. The third-party API could go down for an extended period of time
  4. The third-party API could go down temporarily for a second
  5. The third-party API could take too long to respond
  6. The third-party API may introduce a breaking change without informing you
  7. The third-party API update may succeed but your database update may fail, or vice versa

Obviously, this is not an exhaustive list of everything that could go wrong for this use case. The objective here is to encourage “what could go wrong?” thinking to anticipate potential pitfalls and think proactively about risk. Not all of these problems will require a gold-plated solution on Day 1, but being aware of them helps build more resilient and maintainable software, even if you don’t address every risk immediately.
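To make a few of these risks concrete, here is a minimal Python sketch, not a definitive implementation. It touches on duplicate requests (item 1) and on throttling or brief outages (items 2 and 4) by combining an idempotency key with retries and exponential backoff. Every name in it (`IdempotentUpdater`, `TransientError`, `call_third_party`, `save_to_db`) is hypothetical and made up for illustration. It also assumes the third-party service accepts and honours an idempotency key, which you would need to confirm for any real provider.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error for failures worth retrying (throttling, brief outage, timeout)."""


class IdempotentUpdater:
    """Sketch of an update handler that tolerates duplicates and transient failures."""

    def __init__(self, call_third_party, save_to_db,
                 max_attempts=4, base_delay=0.2, sleep=time.sleep):
        self._call_third_party = call_third_party  # assumed to raise TransientError when retryable
        self._save_to_db = save_to_db
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._sleep = sleep  # injectable so tests do not have to wait
        # In-memory only for illustration; a real service would persist this.
        self._results = {}

    def handle(self, idempotency_key, payload):
        # Item 1: a duplicate request gets the stored result, with no repeated side effects.
        if idempotency_key in self._results:
            return self._results[idempotency_key]

        response = self._call_with_retry(idempotency_key, payload)
        # Item 7: if this write fails, the key is not recorded, so a client retry
        # repeats the third-party call with the same key (assumed safe, see above).
        self._save_to_db(idempotency_key, response)
        self._results[idempotency_key] = response
        return response

    def _call_with_retry(self, idempotency_key, payload):
        # Items 2 and 4: retry transient failures with exponential backoff plus jitter.
        for attempt in range(1, self._max_attempts + 1):
            try:
                return self._call_third_party(idempotency_key, payload)
            except TransientError:
                if attempt == self._max_attempts:
                    raise  # give up and surface the failure to the caller
                delay = self._base_delay * 2 ** (attempt - 1)
                self._sleep(delay + random.uniform(0, delay / 2))
```

In this sketch, slow responses (item 5) would be handled inside `call_third_party`, for example by applying a request timeout there and raising `TransientError` when it expires. Extended outages and breaking changes (items 3 and 6) need other measures, which are deliberately left out to keep the example small.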

Reward your real Firefighters🏆

The real heroes in an organisation aren’t just the ones who put out fires - they’re the ones who have the foresight to stop them before they even start. For every one of the above examples where things went wrong, there are likely thousands of instances where people prevented bad things from happening, like this man who saved the world. Rewarding the people who anticipate risks, design resilient systems, and prevent problems reinforces a culture of foresight and accountability.

In summary, our greatest achievements are often the disasters we never see and the fires that never start — let’s acknowledge and celebrate those heroes who make that possible. #RewardYourRealFirefighters