The Free Energy Principle Demystified
This is an excerpt from the book: The Knowing Universe.
Philosophical and scientific theory attempting to explain existence appears to be coalescing around the variational free-energy principle (FEP). Almost two decades ago, Karl Friston introduced the FEP as a unified neuroscientific theory. These initial papers are now some of the most cited in the field and have inspired thousands of other papers developing his principle and applying it to a wide range of phenomena. Great excitement is building from a growing body of evidence hinting that the same general strategy brains use to keep their hosts in existence may, as the principle suggests, be common to all forms of existence.
While this approach has gained tremendous popularity among researchers, it has also baffled many experts. As Wikipedia tells us (Wikipedia):
The free energy principle has been criticized for being very difficult to understand, even for experts.
Although this simple-sounding principle is transforming cognitive neuroscience and is considered by many (myself included) to be the most promising approach to a theory of everything, the bafflement it induces in smart people is legendary.
In contrast to its supposed difficulty, I marvel that it makes such clear sense. How could I so easily comprehend this principle, which seems to escape much brighter people? Perhaps the answer is that my unusual intellectual journey has arrived independently at many of the conclusions underlying this principle, and without these distinctive insights, it may well have remained incomprehensible to me. The upside is perhaps that sharing this background may offer some assistance to those struggling to understand the FEP.
We can easily state the principle: everything attempts to minimize the surprise it experiences. It sounds pretty innocuous for a theory of everything; in fact, it has an almost Zen-like simplicity. But like a Zen koan, its meaning is elusive. Many papers fail to fully explain the principle before diving into complex mathematics and computer simulations, and some readers are left wondering about the claimed links between surprise and existence.
The ability of all things to experience surprise rests on one critical assumption that is rarely mentioned explicitly. The assumption, a restatement of the good regulator theorem (Conant and Ashby), is this: every ‘thing’ contains a model having knowledge for its self-creation and maintenance. This model provides an expected roadmap for existence, and surprise occurs when these expectations are unmet. For many of us, the idea of everything having built-in models that can be surprised is a little hard to accept. But consider that all life has genetic models, complex animals have neural models, and humans have cultural models. Each of these models can be surprised by the evidence. Friston sometimes uses the example of a fish whose genetic and neural models expect it to be in the water and are surprised if it is not. Surprised genetic and neural models are often precursors of death, and surprised cultural models are often precursors of cultural extinction. That is why all things attempt to avoid surprising their models; things that fail to do so trend towards non-existence.
What about physical existence? As we discuss in chapter 8, it turns out that the quantum wave function may form a similar model for quantum existence, but the argument is somewhat more complicated (Friston, 2019), so here we start with more familiar examples.
This connection between models and existence is profound and deserves some explanation. Why should existence require a model? The short answer is that the challenges to existence are formidable, and existence does not occur without following a detailed, knowledgeable model. But we see existence all around us. In what sense is it challenging to achieve? A law of nature, the second law of thermodynamics, summarizes the challenges to existence: disorder increases in all things. If a thing's disorder increases enough, it ceases to be that thing; it becomes non-existent. As we see a little later, the second law and the free energy principle say much the same thing, but while the second law focuses on existence's challenges, the free energy principle focuses on how things circumvent those challenges by reducing the surprise their models experience.
But how do things act to minimize surprising evidence? There are two answers: things can accurately follow their models and produce evidence confirming their models' predictions, or they can improve their models to make better predictions. In short, entities can either cause reality to conform to their model or cause their model to better conform to reality. The first strategy is easy to comprehend, as our genetic, neural, and cultural models predict existence-enhancing outcomes and, as a bonus, provide algorithms for achieving those outcomes. Thus this route to minimal surprise only involves following the models as accurately as possible; a thing's best strategy for existence is to reduce errors in executing its finely honed models. The second answer lies in the evolutionary processes that create and hone more knowledgeable models. This process, called inference, treats a thing's relative ability to achieve existence as evidence and uses that evidence to update the model's accuracy. Think of natural selection, where evidence generated by the struggle for existence updates the genetic model: the more knowledgeable the model, the fewer surprises it experiences in the world.
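The second strategy, improving the model so that it better conforms to reality, can be sketched as a simple Bayesian update. This is only an illustrative toy, not Friston's formulation: the hypothesis names, the prior, the likelihood values, and the helper `bayes_update` are all assumptions made up for the example, and surprise is measured here as the negative log of the probability the model gave to the evidence before seeing it.

```python
import math

def bayes_update(prior, likelihood):
    """Return (posterior, evidence) after one observation.

    prior and likelihood are dicts keyed by hypothesis name (illustrative).
    """
    # Probability the model assigned to the observation before seeing it.
    evidence = sum(prior[h] * likelihood[h] for h in prior)
    posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}
    return posterior, evidence

# Assumed toy model: two competing hypotheses about the environment.
model = {"wet": 0.5, "dry": 0.5}
# Assumed likelihood of the observation "water detected" under each hypothesis.
likelihood = {"wet": 0.9, "dry": 0.2}

surprises = []
for _ in range(3):  # the same kind of evidence arrives three times
    model, evidence = bayes_update(model, likelihood)
    surprises.append(-math.log(evidence))

print(surprises)  # each value is smaller than the one before
print(model)      # probability has shifted towards "wet"
```

As the model shifts probability towards the hypothesis that predicts the evidence, the same observation becomes less surprising each time it recurs.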
This principle's beauty is in its mathematical depth; Friston and colleagues have developed mathematics to approximate surprise experienced in complex, real-world phenomena. Here we only scratch the mathematical surface to reveal a bit of its potential.
We should probably start with the mathematical definition of surprise; it is -ln(p), where p is a probability that some hypothesis is true. How does evidence create this surprise? When sufficient evidence reveals the truth of a particular hypothesis, then -ln(p) is the surprise experienced; if the initial probability assigned to the hypothesis is small but the evidence indicates that the hypothesis is true, there is much surprise.
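As a minimal sketch of this definition, the function below computes surprise in nats (natural-log units). The function name `surprise` and the sample probabilities are chosen only for illustration.

```python
import math

def surprise(p):
    """Surprise, in nats, when a hypothesis assigned probability p turns out true."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    # math.log(1 / p) equals -ln(p) and yields a clean 0.0 when p is 1.
    return math.log(1.0 / p)

for p in (1.0, 0.5, 0.01):
    print(p, surprise(p))
```

A hypothesis already held with certainty produces no surprise when confirmed, while an improbable one produces a great deal.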
What does -ln(p) have to do with an entity's model? Models used by real-world things to achieve their existence are probabilistic models. Genetic, neural, and cultural models each involve a family of competing hypotheses, each of which is assigned a probability of being the one true hypothesis. For example, at each of an organism's genetic locations, or loci, different individuals in the population may carry different genetic sequences, or alleles. The probability assigned to each specific sequence is its relative frequency within the population, and this probability is the fitness of the sequence. If over many generations a population evolves from having multiple alleles at a locus to having only one, the probability for that sequence is 1, and we might say that the evidence has proven it to be the fittest among the initial family of alleles; it is the one proven to produce the least surprise among the options.
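The allele example can be made concrete with a short sketch. The samples below are hypothetical, as are the allele labels and the helper name `allele_probabilities`; the code only illustrates reading relative frequencies as probabilities and computing the surprise -ln(p) for each allele.

```python
import math
from collections import Counter

def allele_probabilities(alleles):
    """Map each allele to its relative frequency in the sampled population."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical samples of one locus at two points in a population's history.
early = ["A"] * 6 + ["a"] * 3 + ["a2"] * 1
late = ["A"] * 10  # the locus has gone to fixation

for label, sample in (("early", early), ("late", late)):
    for allele, p in allele_probabilities(sample).items():
        # math.log(1 / p) equals -ln(p), the surprise of that allele.
        print(label, allele, round(p, 2), round(math.log(1 / p), 3))
```

Once a single allele remains, its probability is 1 and its surprise is zero.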
We can consider the hypothesis assigned probability p as one in a mutually exclusive and exhaustive family of hypotheses offering solutions to a real-world existential challenge. Being mutually exclusive and exhaustive has a couple of consequences. The first is that one and only one of the hypotheses must be true within the terms of the model. The second is that the sum of the probabilities over the family of hypotheses must equal 1. If the probabilities add to less than 1, then the hypotheses are not exhaustive; some other possibility exists. If the probabilities add to more than 1, they are not mutually exclusive; the hypotheses have some logical overlap.
Real-world instances simplify these mathematical complexities. For example, the family of alleles at a genetic locus within a population of organisms is naturally mutually exclusive and exhaustive. It is mutually exclusive because each allele is unique, and it is exhaustive because the family consists of all the alleles within the population. Thus the sum of the relative frequencies of alleles in the population must equal 1 as that is implicit in the meaning of relative frequency.
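The two consequences of a mutually exclusive and exhaustive family can be checked mechanically. The sketch below is illustrative only: the function name `diagnose_family`, the tolerance, and the sample probabilities are assumptions, and the diagnostic messages simply restate the reasoning in the text.

```python
import math

def diagnose_family(probabilities, tol=1e-9):
    """Classify a family of hypothesis probabilities by what they sum to."""
    total = math.fsum(probabilities.values())
    if total < 1.0 - tol:
        return "not exhaustive"          # some other possibility exists
    if total > 1.0 + tol:
        return "not mutually exclusive"  # hypotheses overlap
    return "valid distribution"

# Hypothetical allele families at one locus.
print(diagnose_family({"A": 0.6, "a": 0.3}))             # an allele is missing
print(diagnose_family({"A": 0.6, "a": 0.3, "a2": 0.1}))  # all alleles counted
print(diagnose_family({"A": 0.6, "a": 0.3, "a2": 0.3}))  # overlapping counts
```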
Because the probabilities assigned to the family of hypotheses sum to 1, they form a probability distribution, and a good deal of mathematical machinery is available for analyzing probability distributions. For example, every probability distribution has an entropy, the expected amount of surprise: Sum(-p ln(p)), taken over all hypotheses in the family. Free energy is an upper bound on surprise, so minimizing free energy over time also holds down expected surprise, which is the model's entropy. But the second law of thermodynamics states that the entropy of an isolated system never decreases and tends to increase.
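Entropy as expected surprise can be computed directly from a distribution. The following is a minimal sketch; the function name `entropy` and the example distributions are illustrative assumptions.

```python
import math

def entropy(probabilities):
    """Expected surprise, Sum(-p ln(p)), in nats; zero-probability terms contribute nothing."""
    # p * ln(1 / p) equals -p ln(p).
    return math.fsum(p * math.log(1.0 / p) for p in probabilities if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]  # maximal uncertainty over four hypotheses
skewed = [0.7, 0.2, 0.1]            # one hypothesis is favored
certain = [1.0]                     # a single hypothesis remains

for name, dist in (("uniform", uniform), ("skewed", skewed), ("certain", certain)):
    print(name, entropy(dist))
```

The more the distribution concentrates on one hypothesis, the lower its entropy, reaching zero when only one hypothesis remains.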
It is in this seeming contradiction that it all comes together. Systems having unconstrained entropy are subject to unconstrained surprise and dissipate into non-existence. An alternative statement of the free energy principle is that existence depends on minimal surprise. Systems only achieve existence if they know how to avoid isolation and exploit outside energy sources to decrease their entropy. And they must accomplish this while following the second law in producing entropy increases in the combined system plus environment. For example, a photosynthetic cell's existence depends on its genetic knowledge for using the sun's energy to counter the second law's tendency towards disintegration; the combined cell-plus-sun system's entropy increases as dictated by the second law and more than pays for the cell's entropy reduction.
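The entropy bookkeeping in the photosynthetic cell example can be written as a short inequality. The symbols are introduced here only for illustration: \Delta S_{\text{cell}} is the entropy change of the cell and \Delta S_{\text{env}} is that of its environment, including the sun.

```latex
\Delta S_{\text{total}} = \Delta S_{\text{cell}} + \Delta S_{\text{env}} \ge 0,
\qquad
\Delta S_{\text{cell}} < 0
\;\Longrightarrow\;
\Delta S_{\text{env}} \ge -\Delta S_{\text{cell}} > 0
```

The cell can lower its own entropy only if the environment's entropy rises by at least as much.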
Existing systems follow their models' knowledge to navigate the environment and fend off nature's relentless forces towards dissipation. In short, existence is fiendishly tricky; it takes a great deal of knowledge to achieve, and that knowledge must be followed without errors or surprises. The free-energy principle is important because it is a roadmap, perhaps nature's only roadmap, for achieving existence, for building better models and for executing them faithfully, and that is why it provides a principled account of all things.
References
Conant, RC and Ashby, WR. Every good regulator of a system must be a model of that system. Int. J. Systems Sci., 1970, pp. 89–97.
Friston, K. A free energy principle for a particular physics. arXiv:1906.10184 [q-bio.NC], 2019.
Raviv, S. The Genius Neuroscientist Who Might Hold the Key to True AI. Wired, November 13, 2018. https://www.wired.com/story/karl-friston-free-energy-principle-artificial-intelligence/
Wikipedia. Free energy principle. Wikipedia, 3 11, 2019. https://en.wikipedia.org/wiki/Free_energy_principle