Principles for Antifragile Design
Moving from “Surviving” to “Thriving”
Most organisations are designed for stability. They optimise for a known world, stripping out redundancy to maximise efficiency. The problem, as we discussed in the last edition, is that the environment they operate in today is more chaotic than ever. What used to be a VUCA world is now brittle, anxious, non-linear, and incomprehensible. Whichever acronym you prefer (VUCA, TUNA, BANI), this article focuses on what leaders need to think about to ensure that their organisations can not just survive, but thrive, in such environments.
In Part 1, we explored the concept of antifragility (systems that gain capability from stress) using the Titanic and the human immune system as analogies. But a critical question remains: how do we actually build such systems?
Before we move to the principles, I want to address a thoughtful comment I received from a reader on LinkedIn who works in maritime regulation:
“The article [part 1] highlights an interesting paradox: the maritime system became safer through reactive reform after catastrophe, not designed-in antifragility… How do regulatory frameworks enable or constrain antifragility?”
This comment strikes at the heart of the practitioner’s dilemma, and it deserves a careful answer.
The reader is right; the maritime system, which I highlighted in Part 1, improved via reactive learning after the Titanic sank (learning from failure). But the deeper insight is that in safety-critical domains, we cannot rely on “real-world” trial-and-error because the cost of failure is existential. You cannot sink a ship to learn how to build a better one. You cannot crash an aircraft to refine flight dynamics.
This is where the distinction between learning from catastrophe and learning from design becomes critical.
In high-reliability organisations (including aircraft carrier decks, nuclear plants, and trauma centres), teams practice what researchers call “trials without errors¹.” They simulate stress through drills and near-miss reporting. They learn without paying the price in lives.
Antifragility in a business context does not mean betting the company on wild experiments. It means designing a portfolio with a robust core and antifragile edges, being able to exploit and explore, and actively seeking out stressors that make your business stronger.
The maritime system improved after the catastrophe because it forced investment in new safety processes while maintaining proven operational practices. A catastrophe-driven change is expensive, but a designed antifragile system achieves the same result proactively.
Analysis of high-performing companies suggests that they allocate approximately 70% to core improvements, 20% to adjacent opportunities, and 10% to transformational initiatives². This portfolio approach is supported by evidence on organisational ambidexterity: firms that balance exploitation (efficiency in core operations) with exploration (experimentation at the edges) significantly outperform those that do only one of the two³. These experimental edges become antifragile when they’re designed to fail fast, cheaply, and informatively; each failure teaches the organisation without threatening the core. More on this shortly.
The key is many small, reversible experiments that enable data-driven, decisive action. Empirical evidence from 35,000 startups shows this approach accelerates learning and performance; A/B testing enables startups to introduce new products 9–18% faster, rapidly scaling winners while abandoning underperforming initiatives⁴.
This design separates robust core operations from experimental edges. In portfolio terms (the 70% core, 20% adjacent, 10% transformational allocation noted above), this model reflects real options theory: the core operations generate predictable returns, while the experimental portfolio functions as embedded real options that preserve strategic flexibility, limiting downside losses if experiments fail while capturing upside potential if they succeed. This enables organisations to balance immediate operational performance with long-term adaptive capability⁵.
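The asymmetry that real options create can be made concrete with a toy simulation. All the figures below (costs, win probabilities, payoffs) are illustrative assumptions, not calibrated data; the point is the shape of the payoff, not the numbers.

```python
import random

def run_portfolio(n_experiments, cost_each, win_prob, win_payoff, seed=42):
    """Total outcome of a portfolio of small, reversible experiments.

    Each experiment's downside is capped at its cost (the option premium);
    winners pay off asymmetrically. All figures are illustrative assumptions.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_experiments):
        total -= cost_each                 # known, capped downside
        if rng.random() < win_prob:
            total += win_payoff            # asymmetric upside
    return total

# Ten cheap pilots vs. one bet-the-company project, same total budget.
diversified = run_portfolio(n_experiments=10, cost_each=100,
                            win_prob=0.3, win_payoff=1_000)
single_bet = run_portfolio(n_experiments=1, cost_each=1_000,
                           win_prob=0.3, win_payoff=10_000)
```

Both configurations have the same expected value and the same total budget at risk, but the portfolio converts that budget into ten independent learning opportunities with far lower variance of outcome, while the single bet swings between total loss and jackpot on one draw.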
Before I continue, it is important to acknowledge the fundamental limitation that the research landscape on antifragile organisational design is conceptually rich but empirically thin.
There is substantial, validated research on the component principles I discuss below: psychological safety, organisational ambidexterity, high-reliability organising, slack resources, and learning from small failures. Each has decades of empirical support linking it to organisational performance. However, research that explicitly demonstrates how these principles, when combined as an integrated system, produce measurable antifragile outcomes (capability gain from stress, not just resilience) remains limited. Most empirical research stops at resilience. But as we established in Part 1, resilience and antifragility are not the same thing.
In the principles that follow, I share five fundamental truths that are grounded in a synthesis of organisational research, Taleb’s concept of antifragility⁶, and Edzo Botjes’ work on its application to organisational design⁷ (from Part 1), distilled into actionable principles. They highlight how organisations learn, adapt, and perform in the face of uncertainty. Consider these as design heuristics: evidence-informed guidelines that require adaptation to your context, not rigid prescriptions guaranteed to produce antifragility. Whether these principles, when enacted together, create true antifragility (gain from disorder) versus superior resilience (robust recovery) remains an empirical question that practitioners and researchers must explore together.
Five Principles for Antifragile Design
Principle 1 - Instruct Learning, Not Just Performance
Picture a pharmaceutical company discovering a fatal flaw in a drug three months before launch. A manufacturing defect. An engineer spotted it during a routine test and reported it immediately. The natural response would be to celebrate the catch, investigate the root cause, fix it, and document the lesson.
Now, picture the same company with a different culture. The engineer sees the same flaw. But the manufacturing division is under pressure to hit timeline targets. The engineer’s manager has a bonus tied to “no delays.” The engineer knows that reporting the issue will trigger a three-month investigation. So they do not report it. The drug launches. Two years later, regulators discover the flaw. The company faces fines, reputational damage, and lawsuits. The flaw could have been caught; it was not, because the incentive structure punished bad news. This is the opposite of antifragile. The organisation is teaching its people, the ones closest to the truth, to hide information. Put another way, it is converting weak signals into silence.
Research on team learning found that when members believe their team is psychologically safe (i.e., they trust that admitting errors or asking for help will not damage their reputation), they engage in what researchers call “learning behaviour”: seeking feedback, discussing errors, and proposing experiments⁸. And critically, teams with high psychological safety perform better, not worse, than those with high “compliance culture.” The effect flows through a simple mechanism: safety leads to speaking up, which, in turn, enables early problem detection, which, in turn, leads to better outcomes. The mechanism is straightforward. If you hide errors, you get unpleasant surprises. If you surface errors early, you get problems you can fix.
You can operationalise this by decoupling error reporting from performance evaluation. If your performance review system penalises the person who discovers a flaw, or the person who made the mistake, you have built a system that prizes ignorance. Instead, institute blameless post-mortems that focus on systemic causes rather than individual culpability. Amazon institutionalises this through their Correction of Error (COE) process⁹, a blameless post-incident analysis framework that focuses on systemic causes rather than individual fault. The process explicitly asks, ‘Why did our systems allow it?’ rather than ‘Who did it?’ In practice, this philosophy means focusing on documenting systems and processes, identifying how teams and organisational structures enabled the problem, rather than on individual accountability. This protects the engineer while scrutinising the process, creating a culture where surfacing errors transparently enables the entire organisation to build stronger systems by learning from individual mistakes.
Finally, make the invisible visible by measuring “near-miss reporting” as a leading indicator of safety culture. In aviation, airlines measure this religiously. A sharp drop in reported near-misses is treated as a warning sign, not a success (it suggests people are hiding problems again). A unit reporting zero near-misses is not safer; it is blind.
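One lightweight way to treat near-miss volume as a leading indicator is to alert on sharp drops rather than celebrate them. A minimal sketch (the drop threshold and sample data are assumptions, not validated benchmarks):

```python
def near_miss_alert(weekly_reports, drop_threshold=0.5):
    """Flag a sharp drop in near-miss reports below the trailing average.

    A collapse in reporting is treated as a warning that people have gone
    quiet, not as evidence of improved safety. Threshold is an assumption.
    """
    if len(weekly_reports) < 2:
        return False
    baseline = sum(weekly_reports[:-1]) / (len(weekly_reports) - 1)
    return baseline > 0 and weekly_reports[-1] < baseline * drop_threshold

print(near_miss_alert([12, 10, 11, 3]))   # → True: reports collapsed, investigate
print(near_miss_alert([12, 10, 11, 9]))   # → False: normal variation
```

The design choice mirrors the aviation practice above: the alert fires on silence, because silence is the failure mode.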
Principle 2 - Embrace Chaos and The Strategy of Small Losses
Antifragile systems gain from disorder and require stressors⁶. Your immune system does not become stronger without encountering threats. Your team does not become more adaptive without facing challenges. But, as we noted in Part 1 with the immune system, dose matters. Existential stress kills the system; bounded stress strengthens it.
The “strategy of small losses,” articulated by Sitkin¹⁰, emphasises deliberately conducting modest-scale, thoughtfully planned experiments with uncertain outcomes to foster organisational learning. Small failures teach organisations how their systems break by exposing weak links when the stakes are low. This creates what Sitkin calls “learning readiness,” a recognition of risk and motivation for change, which keeps organisations vigilant rather than complacent¹⁰.
Netflix exemplifies this through “Chaos Monkey,” a tool that randomly disables servers in production. Why deliberately break your system? Because if you never test your resilience, you may discover it fails catastrophically when a real outage hits. By injecting controlled chaos, Netflix’s engineers build systems that degrade gracefully rather than collapsing entirely. This is engineering for resilience through controlled stress; the system learns to withstand failure, not necessarily gain new capabilities from it, but the practice demonstrates the principle of bounded experimentation.
The same logic applies to strategy. A startup that launches ten small pilots, expecting six to fail, is safer than one that bets everything on a single “perfect” product. The failed pilots cost less and teach faster. The winners fund themselves. The portfolio survives through diversity, not perfection. This is not recklessness. It is bounded experimentation. The losses are capped, but the learning is unbounded.
You can start this bounded experimentation through practices like Chaos Engineering. If you operate critical systems, regularly run “what-if-this-fails” scenarios. Can you survive a key supplier exit? A 40% talent loss in engineering? A regulatory crackdown on your core market? Document the answers and fix the vulnerabilities.
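In software terms, a “what-if-this-fails” drill can be as simple as killing one dependency at a time and checking whether a fallback exists. A minimal sketch of the idea in Python (the service names and topology are invented for illustration; Netflix’s actual tooling operates at infrastructure level):

```python
import random

# Invented service map: each service lists the fallbacks that let it
# degrade gracefully. An empty list marks a single point of failure.
SERVICES = {
    "recommendations": ["popular-titles-cache"],  # fall back to a generic list
    "search": ["cached-index"],
    "playback": [],                               # no fallback yet
    "billing": ["retry-queue"],                   # fall back to async processing
}

def chaos_drill(services, rng=None):
    """Kill one random service; report whether the loss is absorbed gracefully."""
    rng = rng or random.Random()
    victim = rng.choice(sorted(services))
    return victim, bool(services[victim])

def single_points_of_failure(services):
    """The real output of repeated drills: a fix list, found before a real outage."""
    return sorted(name for name, fallbacks in services.items() if not fallbacks)

print(single_points_of_failure(SERVICES))  # → ['playback']
```

The value is in the documented answer: each drill either confirms graceful degradation or adds an item to the vulnerability backlog, exactly the “document the answers and fix the vulnerabilities” loop described above.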
You could also implement Red Teaming, where an independent team simulates adversarial attacks on your organisation to test whether your cybersecurity, crisis response, or strategic assumptions can withstand hostile pressure.
For new initiatives, use reversible experiments rather than betting on one massive transformation. Run many small pilots with explicit kill criteria and rapid iteration. Most value comes from discovering what NOT to do.
Amazon’s “two-way door” decision framework operationalises this principle: reversible decisions with limited consequences that can be easily undone, enabling teams to “act with only about 70% of the data” rather than waiting for certainty. This approach recognises that rapid, reversible experiments that limit downside while preserving upside are how modern organisations drive innovation.
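Explicit kill criteria work best when they are written down before the pilot starts, not argued about in the moment. A sketch with hypothetical pilots and thresholds (all names and numbers are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class Pilot:
    """A small, reversible experiment with explicit kill criteria."""
    name: str
    weeks_running: int
    weekly_active_users: int
    cost_to_date: float

def should_kill(pilot, max_weeks=12, min_users=100, budget_cap=50_000):
    """Two-way door: reverse the pilot once any pre-agreed criterion is hit.

    Thresholds are illustrative assumptions, not benchmarks.
    """
    stalled = pilot.weeks_running >= max_weeks and pilot.weekly_active_users < min_users
    over_budget = pilot.cost_to_date > budget_cap
    return stalled or over_budget

pilots = [
    Pilot("ai-assistant", weeks_running=14, weekly_active_users=40, cost_to_date=20_000),
    Pilot("new-pricing", weeks_running=6, weekly_active_users=900, cost_to_date=12_000),
]
survivors = [p.name for p in pilots if not should_kill(p)]
print(survivors)  # → ['new-pricing']
```

Codifying the criteria up front removes the sunk-cost debate: the decision to reverse was made when the door was opened, not when walking back through it.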
Principle 3 - Build Variety Where It Matters
Imagine two hospitals. Hospital A has standardised everything: one supplier for all surgical instruments, one protocol for all procedures, one training curriculum for all nurses. It is efficient. Costs are down 15%. Hospital B is messier. It has three suppliers for critical instruments (which costs more). It allows surgeons discretion in which protocol to follow, depending on patient complexity. Nurses get training in multiple specialities. It costs 8% more overall. But it has a range of options (requisite variety).
During a pandemic, Hospital A’s sole supplier gets hit. A factory closes. Hospital A cannot operate: all its instruments come from a single source that is now overwhelmed or shut down. Hospital B switches seamlessly to a second supplier. Its nurses, cross-trained in multiple areas, fill gaps when intensive care overflows into other units. Its surgeons, used to adapting protocols, manage novel patient presentations.
Hospital A is more cost-efficient in stable times; Hospital B is antifragile.
This principle is grounded in W. Ross Ashby’s foundational law of requisite variety, which holds that a control system must possess complexity equal to or greater than that of the system it controls. Ashby proved that “only variety can destroy variety¹¹.” Applied to the pandemic example above: Hospital B’s internal organisational variety (multiple suppliers, cross-trained staff, flexible protocols) enabled it to absorb disturbances that overwhelmed Hospital A’s homogeneous structure.
The strategic question is not whether to increase variety, but where. You want options where it matters (suppliers, critical skills, decision pathways). Conversely, you want standardisation where it matters (compliance protocols, safety interfaces, core governance).
Applying this involves designing a modular architecture. Build your systems where components can be swapped or updated without redesigning the whole. A monolithic system is efficient until it breaks; a modular system trades initial complexity for resilience but is vastly more adaptable.
It also requires cognitive diversity. Hire for disagreement but bias for action; diversity creates value only when teams can convert debate into timely decisions. A leadership team of people who all think similarly will all be blindsided by the same black swan. Diversity of background, function, experience, and worldview increases the variety of scenarios your organisation can see and respond to.
Finally, reframe “redundancy” as strategic optionality, not waste, but the price of preserving future choices. Having a second supplier costs more today. But it is the price of the option to survive if Supplier 1 fails. In volatile times, options are precious. The same logic applies to skills, partnerships, and revenue streams.
Principle 4 - Decentralise Sensing and Decision Rights
In a stable world, information flows predictably upward to decision-makers. The top makes the call, and the organisation executes. This works if the environment changes slowly and evenly. In a more chaotic world, the organisation’s frontline sees threats first, as we already established. Your frontline customer success team knows your product is degrading before metrics show it. Your plant operator sees the equipment behaving oddly before an alarm fires. Your engineer spots the anomaly before it becomes a scandal.
If these people have to ask permission to act, you will respond too late.
Research into organisations operating under extreme uncertainty (such as aircraft carriers, nuclear power plants, and emergency departments) reveals a striking pattern. During a crisis, authority does not follow the org chart. It migrates to the person with the most expertise on the specific problem, regardless of rank. High-reliability organisations structure themselves so that those who know what to do in a specific situation can take the lead, rather than hewing to a set hierarchy. The concept is called “deference to expertise¹².” It is the opposite of “command and control.” It is “intent and discretion” (leaders communicate the outcome that needs to happen and the constraints that apply; teams decide how to achieve it). The result is often agility in times of crisis.
To achieve this agility, you must push decision rights to the edge. Define a set of guardrails (spend limits, duration limits, risk tolerances) and give frontline teams authority to act within them without escalation. This requires trust, but it also removes delays. When your support team can spend up to €1,000 to retain a customer without approval, they respond in minutes, not weeks.
This works nicely when coupled with what military doctrine calls Commander’s Intent. When your CEO communicates strategy, they should answer three questions: What needs to happen? Why is it important? What constraints apply? Then trust teams to figure out how. It creates ownership and adaptation as teams learn what works in their local context.
The structural change here is to design escalation paths rather than approval paths. The default should be “go ahead”; escalation happens only when you hit a guardrail. This flips the incentive from “ask permission (and hide risks)” to “act and flag problems.” High-reliability manufacturing plants often track “time from signal detection to decision” (how long it takes frontline signals to reach decision-makers). In high-reliability operations, this is measured in hours or minutes, not days.
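The “act within guardrails, escalate only on breach” default can be encoded directly in a workflow. A minimal sketch, reusing the €1,000 retention example above and adding an assumed risk tier (both guardrails are illustrative, not a recommended policy):

```python
def frontline_decision(cost_eur, risk_tier, spend_limit=1_000,
                       allowed_tiers=frozenset({"low", "medium"})):
    """Default to action; escalate only when a guardrail is breached.

    The EUR 1,000 limit mirrors the example above; risk tiers are an assumption.
    """
    if cost_eur <= spend_limit and risk_tier in allowed_tiers:
        return "act"        # decide locally, log the decision for later review
    return "escalate"       # guardrail hit: flag upward with full context

print(frontline_decision(800, "low"))     # → act
print(frontline_decision(2_500, "low"))   # → escalate
print(frontline_decision(400, "high"))    # → escalate
```

Note the asymmetry: approval is never requested, only breaches are surfaced, which is precisely what collapses the signal-to-decision time discussed above.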
Principle 5 - Hold Governed Slack
For a long time, “lean” has been the gospel. Strip out waste. Optimise every process. Utilise every asset to 95%+. If your team is not busy, you are leaving money on the table.
This makes sense when the environment is stable. But in a more chaotic environment, “lean” means “brittle.” There is no cushion to absorb a shock. There is no capacity to experiment. There is no bandwidth to respond to an opportunity.
A meta-analysis of 66 studies examining the relationship between slack and firm performance found a positive relationship across all three slack types the researchers identified (available, recoverable, and potential slack)¹³. Slack is what allows you to explore things. Without slack, every hour is allocated to today’s work. You cannot explore; you can only exploit.
An important caveat, though, is that the relationship between slack and performance is contingent. It works best in dynamic environments. In stable, less dynamic industries, excessive slack can reduce efficiency without creating corresponding adaptive gains. The key is governed slack (strategic reserves you deliberately protect, not passive inefficiency).
This is where organisational ambidexterity becomes practical. As stated in the introduction, research on firms that balance exploration (innovation, new markets, new capabilities) with exploitation (efficiency, known markets, proven operations) shows that they outperform those that do only one of the two³. The ones that do both are “ambidextrous.” But you cannot be ambidextrous if you are utilised at 100%. Hence, you need slack to explore.
When I was working on a digital transformation initiative at a multinational telco in 2018, we structured it explicitly: one team focused on exploring new revenue streams and digital services (measured by learning velocity and new capability-building), and separate operational teams focused on the core operational backbone (measured by efficiency and reliability). Exploration and exploitation require different metrics, different time horizons, and different team compositions.
To make this work, you must explicitly reserve capacity. While Google has since evolved its policy, its famous “20% time” was an example of governed slack. Build reserved capacity into your planning. Protect it in your budget. Do not let operational urgency consume it.
It also requires a commitment to true ambidexterity, not “innovation theatre.” You cannot meaningfully explore while running core operations at 100% utilisation. You need different teams (or at least separate project time), different metrics, and different time horizons. Exploration is measured by learning velocity and capability gains; exploitation is measured by efficiency and reliability. Both matter. Neither should dominate.
On Regulation: From Risk-Based to Uncertainty-Aware
To return to my reader’s question: “How do regulatory frameworks enable or constrain antifragility?”
I believe it hinges on a distinction between two concepts: Risk and Knightian Uncertainty¹⁴.
Risk is when you know (or can estimate) the odds: a casino’s roulette wheel, an insurer’s actuarial model, an engineer’s failure rate calculation. These domains work well with prescriptive regulation: “lifeboats must accommodate 100% of passengers; radio watches must be 24/7; pharmaceutical Good Manufacturing Practice (GMP) standards require X, Y, Z.” The hazards are known, the solutions proven.
Uncertainty (in the Knightian sense) is when outcomes are imaginable, but probabilities cannot be assigned. Which geopolitical event will disrupt your supply chain? Which AI interaction will behave unexpectedly? The future is not just unknown; it is unknowable probabilistically.
Prescriptive regulation fails in the face of uncertainty because it ossifies systems around yesterday’s best practices. Goals-based regulation (setting outcomes like “your AI system must be explainable,” but leaving means open) allows regulated entities to innovate in compliance¹⁵. This creates variety within the regulatory ecosystem where different firms experiment with solutions, surfacing innovations that inform evolving best practices.
The EU AI Act uses this approach for high-risk AI systems (goals-based for emerging challenges, prescriptive bans for unacceptable risks). Pharmaceutical GMP uses prescriptive rules for well-understood hazards where failure modes are known and catastrophic.
The Antifragile Design Checklist
We are a week into 2026, and as you review your strategy for the year, ask these five questions:
On Learning: Does “bad news” travel faster than “good news” in our organisation? Are we measuring near-miss reporting and psychological safety, or just output metrics?
On Stress: Are we protecting our system from all stress, or are we deliberately injecting safe-to-fail stressors to learn? Do we run chaos engineering, red team exercises, or similar processes?
On Variety: Do we have a portfolio of options (multiple suppliers, modular systems, diverse skills), or are we betting on a single “efficient” path?
On Decentralisation and Decision Rights: Who has the authority to stop the line if they see a problem? Can a frontline person act within guardrails, or do they need approval from three levels up?
On Slack: Do we have the capacity (cash, talent, time) to pounce on a sudden opportunity, or are we running at 100% utilisation? Can we explore, or only exploit?
Applying antifragility comes at a cost: you trade some efficiency in the short term to buy survival and capability in the long term. But as the Titanic story taught us, and as every resilient organisation has learned, the cost of building in options is far lower than the cost of discovering too late that you have none.
These five principles are grounded in decades of validated research on how organisations learn, adapt, and perform under uncertainty. Whether they combine to create true antifragility (gain from disorder) or superior resilience (robust recovery and adaptation) will depend on how you apply them in your context and on the evidence you gather as you experiment.
I hope you enjoyed this two-part series on applying antifragility. Thank you for reading, and thank you to my reader in maritime regulation for pushing the thinking deeper. This is how we learn.
References
LaPorte, T. R., & Consolini, P. M. (1991). Working in Practice But Not in Theory: Theoretical Challenges of “High-Reliability Organizations.” Journal of Public Administration Research and Theory, Vol. 1, No. 1, pp. 19–47.
Nagji, B., & Tuff, G. (2012). Managing Your Innovation Portfolio. Harvard Business Review, 90(5), 66–74. https://hbr.org/2012/05/managing-your-innovation-portfolio
He, Z. L., & Wong, P. K. (2004). Exploration vs. Exploitation: An Empirical Test of the Ambidexterity Hypothesis. Organization Science, 15(4), 481–494.
Koning, R., Hasan, S., & Chatterji, A. (2019). Experimentation and Startup Performance: Evidence from A/B Testing. NBER Working Paper No. 26278. (Published in Management Science, 68(9), 6434–6453, 2022.)
Trigeorgis, L., & Reuer, J. J. (2017). Real options theory in strategic management. Strategic Management Journal, 38(1), 42–63.
Taleb, N. N. (2012). Antifragile: Things That Gain from Disorder. Random House.
Botjes, E. A. (2020). Defining Antifragility and the application on Organisation Design. Antwerp Management School. https://zenodo.org/records/3719389
Edmondson, A. (1999). Psychological Safety and Learning Behavior in Work Teams. Administrative Science Quarterly, Vol. 44, No. 2, pp. 350–383.
AWS Blog. (2022). Why You Should Develop a Correction of Error (COE). Retrieved from https://aws.amazon.com/blogs/mt/why-you-should-develop-a-correction-of-error-coe/
Sitkin, S. B. (1992). Learning through failure: The strategy of small losses. Research in Organizational Behavior, Vol. 14, pp. 231–266.
Ashby, W. R. (1956). An Introduction to Cybernetics. Chapman & Hall.
Weick, K. E., & Sutcliffe, K. M. (2001). Managing the Unexpected: Assuring High Performance in an Age of Complexity. Jossey-Bass.
Daniel, F., Lohrke, F. T., Fornaciari, C. J., & Turner, R. A. (2004). Slack resources and firm performance: a meta-analysis. Journal of Business Research, Vol. 57, No. 6, pp. 565–574.
Braun, C., et al. (2024). Knightian uncertainty in the regulatory context. Behavioural Public Policy. https://doi.org/10.1017/bpp.2024.59
Decker, C. (2018). Goals-based and rules-based approaches to regulation. BEIS Research Paper Number 8, Department for Business, Energy and Industrial Strategy.