Getting Out of the Dark Room – Staying Curious:

In today’s post I am looking at the importance of staying curious in the light of Karl Friston’s “Free Energy Principle” (FEP) and Ross Ashby’s ideas on indirect regulation. I have discussed the Free Energy Principle in an earlier post. The FEP basically states that in order to resist the natural tendency to disorder, adaptive agents must minimize surprise.

Karl Friston, the brilliant mind behind FEP, noted:

the whole point of the free-energy principle is to unify all adaptive autopoietic and self-organizing behavior under one simple imperative; avoid surprises and you will last longer.

Avoiding surprises means that one has to model and anticipate a changing and itinerant world. This implies that the models used to quantify surprise must themselves embody itinerant wandering through sensory states (because they have been selected by exposure to an inconstant world): Under the free-energy principle, the agent will become an optimal (if approximate) model of its environment. This is because, mathematically, surprise is also the negative log-evidence for the model entailed by the agent. This means minimizing surprise maximizes the evidence for the agent (model). Put simply, the agent becomes a model of the environment in which it is immersed. This is exactly consistent with the Good Regulator theorem of Conant and Ashby (1970). This theorem, which is central to cybernetics, states that “every Good Regulator of a system must be a model of that system.” .. Like adaptive fitness, the free-energy formulation is not a mechanism or magic recipe for life; it is just a characterization of biological systems that exist. In fact, adaptive fitness and (negative) free energy are considered by some to be the same thing.
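The link Friston draws between surprise and model evidence can be written out in a few lines. This is only a sketch in standard variational notation. The symbols are my own labelling and do not come from the quoted passage: s for sensations, m for the model entailed by the agent, ψ for the hidden causes in the environment, and q(ψ) for the agent’s approximate belief about those causes.

```latex
\begin{aligned}
\text{surprise} &= -\ln p(s \mid m) \\
F(s, q) &= \mathbb{E}_{q(\psi)}\big[\ln q(\psi) - \ln p(s, \psi \mid m)\big] \\
        &= -\ln p(s \mid m) + D_{\mathrm{KL}}\big[\,q(\psi)\,\|\,p(\psi \mid s, m)\,\big] \\
        &\ge -\ln p(s \mid m)
\end{aligned}
```

Because the KL term can never be negative, free energy is an upper bound on surprise. Lowering free energy therefore lowers the bound on surprise, which is the same thing as raising the bound on the log-evidence ln p(s | m) for the agent’s model.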

This idea of the agent having a model of its environment is quite important in Cybernetics. In fact, the idea of FEP can be traced back to Ashby’s ideas on Cybernetics. For an organism to survive, it needs to keep certain internal variables such as blood pressure, internal temperature etc. in a certain range. Ashby called these essential variables, denoted by “E”. Ashby noted that the goal of regulation is to keep these essential variables in range, in the light of disturbances coming from the environment. In other words, the goal of regulation is to minimize the effect of disturbances coming in. Perfect regulation results in no disturbances reaching the essential variables. The organism will be completely ignorant of what is going on outside in this case. When the regulation succeeds, we say that the regulator has requisite variety. It is able to counter the variety coming in from the environment. Ashby called this “the law of Requisite Variety”, and explained it succinctly as “only variety can absorb variety.” Ashby explained direct and indirect regulation as follows:

Direct and indirect regulation occur as follows. Suppose an essential variable X has to be kept between limits x’ and x”. Whatever acts directly on X to keep it within the limits is regulating directly. It may happen, however, that there is a mechanism M available that affects X, and that will act as a regulator to keep X within the limits x’ and x” provided that a certain parameter P (parameter to M) is kept within the limits p’ and p”. If, now, any selective agent acts on P so as to keep it between p’ and p”, the end result, after M has acted, will be that X is kept between x’ and x”.

Now, in general, the quantities of regulation required to keep P in p’ and p” and to keep X in x’ to x” are independent. The law of requisite variety does not link them. Thus, it may happen that a small amount of regulation supplied to P may result in a much larger amount of regulation being shown by X.

When the regulation is direct, the amount of regulation that can be shown by X is absolutely limited to what can be supplied to it (by the law of requisite variety); when it is indirect, however, more regulation may be shown by X than is supplied to P. Indirect regulation thus permits the possibility of amplifying the amount of regulation; hence its importance.
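The law of requisite variety that Ashby leans on in this passage can be checked by brute force on a toy game. This is a minimal sketch and not Ashby’s own formulation. The game rule outcome = (d - r) mod n and the function name min_outcome_variety are my assumptions, chosen so that a single response never maps two different disturbances to the same outcome.

```python
from itertools import product
from math import ceil


def min_outcome_variety(n_disturbances: int, n_responses: int) -> int:
    """Brute-force the fewest distinct outcomes a regulator can achieve.

    Toy game (an assumption for illustration): outcome = (d - r) mod n.
    The regulator sees each disturbance d and answers with a response r
    taken from its limited repertoire range(n_responses).
    """
    n = n_disturbances
    best = n
    # A strategy assigns one response to every possible disturbance.
    for strategy in product(range(n_responses), repeat=n):
        outcomes = {(d - strategy[d]) % n for d in range(n)}
        best = min(best, len(outcomes))
    return best


if __name__ == "__main__":
    for k in (1, 2, 3, 6):
        # Compare the brute-force result with the bound ceil(n / k).
        print(k, min_outcome_variety(6, k), ceil(6 / k))
```

With six disturbances, a regulator that has 1, 2, 3 or 6 responses can do no better than 6, 3, 2 or 1 distinct outcomes respectively. Only more variety in the regulator reduces the variety that reaches the essential variables, which is the direct limit that indirect regulation is said to get around.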

Ashby illustrated direct and indirect regulation with the following example:

Living organisms came across this possibility eons ago, for the gene-pattern is a channel of communication from parent to offspring: ‘Grow a pair of eyes,’ it says, ‘ they’ll probably come in useful; and better put hemoglobin into your veins — carbon monoxide is rare and oxygen common.’ As a channel of communication, it has a definite, finite capacity, Q say. If this capacity is used directly, then, by the law of requisite variety, the amount of regulation that the organism can use as defense against the environment cannot exceed Q. To this limit, the non-learning organisms must conform. If, however, the regulation is done indirectly, then the quantity Q, used appropriately, may enable the organism to achieve, against its environment, an amount of regulation much greater than Q. Thus, the learning organisms are no longer restricted by the limit.

An organism with lower cognitive capacity may be able to survive by relying on its gene-pattern alone, while an organism with higher cognitive capacity has to supplement the basic gene-pattern with learning behavior. In order to do this, it has to learn from its environment. Ashby continued:

In the same way the gene-pattern, when it determines the growth of a learning animal, expends part of its resources in forming a brain that is adapted not only by details in the gene-pattern but also by details in the environment… dictionary. While the hunting wasp, as it attacks its prey, is guided in detail by its genetic inheritance, the kitten is taught how to catch mice by the mice themselves. Thus, in the learning organism the information that comes to it by the gene-pattern is much supplemented by information supplied by the environment; so, the total adaptation possible, after learning, can exceed the quantity transmitted directly through the gene-pattern.

It is important to note that the environment does not input information into the organism. Instead, the organism perceives the environment through its action on the environment. The environment also acts on the organism, just like the organism acts on the environment. Perception is possible only through this circular causal cycle. As Ashby noted, the gene pattern for learning allows for the organism to model its environment, and this allows for the indirect regulation. Ashby explains this point further:

This is the learning mechanism. Its peculiarity is that the gene-pattern delegates part of its control over the organism to the environment. Thus, it does not specify in detail how a kitten shall catch a mouse, but provides a learning mechanism and a tendency to play, so that it is the mouse which teaches the kitten the finer points of how to catch mice. This is regulation, or adaptation, by the indirect method. The gene-pattern does not, as it were, dictate, but puts the kitten into the way of being able to form its own adaptation, guided in detail by the environment.

The Dark Room:

At this point, we can look at the idea of the dark room. This is a thought experiment in FEP. We can try to explain this also using Ashby’s ideas. If the goal of the regulator is to minimize the impact of disturbances on the essential variables, one strategy is to go to an environment with minimum disturbances. In FEP, the thought experiment is posed in a similar way – if the goal of the agent is to minimize surprise, why wouldn’t the agent find a dark room and stay in it indefinitely?

A recurrent puzzle raised by critics of these models (FEP) is that biological systems do not seem to avoid surprises. We do not simply seek a dark, unchanging chamber, and stay there. This is the “Dark-Room Problem.” 

Karl Friston offers an answer to this question:

Technically, the resolution of the Dark-Room Problem rests on the fact that average surprise or entropy H(s|m) is a function of sensations and the agent (model) predicting them. Conversely, the entropy H(s) minimized in dark rooms is only a function of sensory information. The distinction is crucial and reflects the fact that surprise only exists in relation to model-based expectations. The free-energy principle says that we harvest sensory signals that we can predict (cf., emulation theory; Grush, 2004); ensuring we keep to well-trodden paths in the space of all the physical and physiological variables that underwrite our existence. In this sense, every organism (from viruses to vegans) can be regarded as a model of its econiche, which has been optimized to predict and sample from that econiche. Interestingly, free energy is used explicitly for model optimization in statistics (e.g., Yedidia et al., 2005) using exactly the same principles.

This means that a dark room will afford low levels of surprise if, and only if, the agent has been optimized by evolution (or neurodevelopment) to predict and inhabit it. Agents that predict rich stimulating environments will find the “dark room” surprising and will leave at the earliest opportunity. This would be a bit like arriving at the football match and finding the ground empty. Although the ambient sensory signals will have low entropy in the absence of any expectations (model), you will be surprised until you find a rational explanation or a new model (like turning up a day early). Notice that average surprise depends on, and only on, sensations and the model used to explain them. This means an agent can compare the surprise under different models and select the best model; thereby eluding any “circular explanation” for the sensations at hand.
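Friston’s distinction between the entropy of the sensations themselves and the surprise relative to a model can be made concrete with a small calculation. The following is a hedged sketch. The two agents, their three sensory states and the probabilities are all invented for illustration, and the function names are mine.

```python
import math
from collections import Counter


def surprise(sensation, model):
    """Surprise of one sensation under a model: -ln p(s | m)."""
    return -math.log(model[sensation])


def average_surprise(sensations, model):
    """Mean surprise of a stream of sensations under a model."""
    return sum(surprise(s, model) for s in sensations) / len(sensations)


def empirical_entropy(sensations):
    """Entropy of the sensations alone, with no model involved."""
    counts = Counter(sensations)
    n = len(sensations)
    return sum((c / n) * math.log(n / c) for c in counts.values())


# Hypothetical agents: each model is what that agent expects to sense.
cave_dweller = {"dark": 0.90, "dim": 0.08, "bright": 0.02}
daylight_forager = {"dark": 0.02, "dim": 0.18, "bright": 0.80}

dark_room = ["dark"] * 20

if __name__ == "__main__":
    print(empirical_entropy(dark_room))
    print(average_surprise(dark_room, cave_dweller))
    print(average_surprise(dark_room, daylight_forager))
```

The dark room has zero entropy as a bare signal, yet the average surprise is about 0.11 nats for the agent that expects darkness and about 3.9 nats for the agent that expects daylight. The same room is comfortable for one model and highly surprising for the other, which is why the second agent would be expected to leave.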

We are born with a gene pattern that allows for learning. The basic pattern is to learn, and our survival mainly comes from this. We are able to get out of the dark room because of this. We are born curious and this allows us to keep on learning. We have an inner ability to keep looking for answers and not be satisfied with the status quo.

I am sure there is an important lesson for us all here with the idea of the dark room and the indirect regulation. I could simply say – stay curious and keep on learning. Or I can have you come to that conclusion on your own. As the famous Spanish philosopher José Ortega y Gasset noted – He who wants to teach a truth should place us in the position to discover it ourselves.

I will finish with a great lesson from Ashby to explain the idea of the indirect regulation:

If a child wanted to discover the meanings of English words, and his father had only ten minutes available for instruction, the father would have two possible modes of action. One is to use the ten minutes in telling the child the meanings of as many words as can be described in that time. Clearly there is a limit to the number of words that can be so explained. This is the direct method. The indirect method is for the father to spend the ten minutes showing the child how to use a dictionary. At the end of the ten minutes the child is, in one sense, no better off; for not a single word has been added to his vocabulary. Nevertheless, the second method has a fundamental advantage; for in the future the number of words that the child can understand is no longer bounded by the limit imposed by the ten minutes. The reason is that if the information about meanings has to come through the father directly, it is limited to ten-minutes’ worth; in the indirect method the information comes partly through the father and partly through another channel (the dictionary) that the father’s ten-minute act has made available.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was The Cybernetics of Ohno’s Production System:

The Cybernetics of Ohno’s Production System:

In today’s post, I am looking at the cybernetics of Ohno’s Production System. For this I will start with the ideas of ultrastability from one of the pioneers of Cybernetics, Ross Ashby. It should be noted that I am definitely inspired by Ashby’s ideas and thus may take some liberty with them.

Ashby defined a system as a collection of variables chosen by an observer. “Ultrastability” can be defined as the ability of a system to change its internal organization or structure in response to environmental conditions that threaten to disturb a desired behavior or value of an essential variable (Klaus Krippendorff). Ashby identified that when a system in a state of stability (equilibrium) is disturbed by the environment, it is able to get back to the state of equilibrium. This is the feature of an ultrastable system. Let’s look at the example of an organism and its environment. The organism is able to survive or stay viable by making sure that certain variables, such as internal temperature, blood pressure etc., stay in a specific range. Ashby referred to these variables as essential variables. When the essential variables go outside a specific range, the viability of the organism is compromised. Ashby noted:

That an animal should remain ‘alive’, certain variables must remain within certain ‘physiological’ limits. What these variables are, and what the limits, are fixed when the species is fixed. In practice one does not experiment on animals in general, one experiments on one of a particular species. In each species the many physiological variables differ widely in their relevance to survival. Thus, if a man’s hair is shortened from 4 inches to 1 inch, the change is trivial; if his systolic blood pressure drops from 120 mm. of mercury to 30, the change will quickly be fatal.

Ashby noted that the organism affects the environment, and the environment affects the organism: such a system is said to have a feedback. Here the environment does not simply mean the space around the organism. Ashby had a specific definition for environment. Given an organism, its environment is defined as those variables whose changes affect the organism, and those variables which are then changed by the organism’s behavior. It is thus defined in a purely functional, not a material sense. The reactionary part is the sensory-motor framework of the organism. The feedback between the reactionary part (R) of an organism (Orgm) and the environment (Envt.) is depicted below:

Ashby explains this using an example of a kitten resting near a fire. The kitten settles at a safe distance from the fire. If a lump of hot coal falls near the kitten, the environment is threatening to have a direct effect on the essential variables. If the kitten’s brain does nothing, the kitten will get burned. The kitten, being an ultrastable system, is able to use the correct mechanism – move away from the hot coal and keep its essential variables in check. Ashby proposed that an ultrastable system has two feedbacks. One feedback operates frequently, while the other operates infrequently, when the essential variables are threatened. The two feedback loops are needed for a system to get back into equilibrium. This is also how the system can learn and adapt. Paul Pangaro and Michael C. Geoghegan note:

What are the minimum conditions of possibility that must exist such that a system can learn and adapt for the better, that is, to increase its chance of survival? Ashby concludes via rigorous argument that the system must have minimally two feedback loops, or double feedback… The first feedback loop, shown on the left side and indicated via up/down arrows, ‘plays its part within each reaction/behavior.’ As Ashby describes, this loop is about the sensory and motor channels between the system and the environment, such as a kitten that adjusts its distance from a fire to maintain warmth but not burn up. The second feedback loop encompasses both the left and right sides of the diagram, and is indicated via long black arrows. Feedback from the environment is shown coming into an icon for a meter in the form of a round dial, signifying that this feedback is measurable insofar as it impinges on the ‘essential variables.’

Ashby depicted his ultrastable system as below:

The first feedback loop can be thought of as a mechanism that cannot change itself. It is static, while the second feedback loop is able to operate some parameters so that the structure can change resulting in a new behavior. The second feedback loop acts only when the essential variables are challenged or when the system is not in equilibrium. It must be noted that there are no decisions being made with the first feedback loop. It is simply an action mechanism. It keeps doing what was working before, while the second feedback loop alters the action mechanism to result in a new behavior. If the new behavior is successful in maintaining the essential variables, the new action is continued until it is not effective any longer. When the system is able to counter the threatening situation posed by the environment, it is said to have requisite variety. The law of requisite variety was proposed by Ashby as – only variety can absorb variety. The system must be able to have the requisite variety (in terms of available actions) to counter the variety thrown upon it by the environment. The environment always possesses far more variety than the system. The system must find ways to attenuate the variety coming in, and amplify its own variety to maintain the essential variables.

Let’s look at this with an easy example of a baby. When the baby experiences any sort of discomfort, it starts crying. The crying is the behavior that helps put it back into equilibrium (removal of discomfort) since it gets the attention of its mother or other family members. As the baby grows, its desired variables also get more specific (food, water, love, etc.). The action of crying does not always get it what it is looking for. Here the second feedback loop comes in, and the baby tries a new behavior and sees if it results in a better outcome. This behavior could be to point at something or even learning and using words. The new action is kept and used as long as it remains successful. The baby/child learns and adapts as needed to meet its own wants and desires.

Pangaro and Geoghegan note that the idea of an ultrastable system is applicable in social realms also. To evoke the social arena, we call the parameters ‘behavior fields.’ When learning by trial-and-error, a behavior field is selected at random by the system, actions are taken by the system that result in observable behaviors, and the consequences of these actions in the environment are in turn registered by the second feedback loop. If the system is approaching the danger zone, and the essential variables begin to go outside their acceptable limits, the step function says, ‘try something else’—repeatedly, if necessary—until the essential variables are stabilized and equilibrium is reached. This new equilibrium is the learned state, the adapted state, and the system locks-in.

It is important to note that the first feedback loop is the overt behavior that is locked in. The system cannot change this unless the second feedback loop is engaged. Stuart Umpleby cites Ashby’s example of an autopilot to explain this further:

In his theory of adaptation two feedback loops are required for a machine to be considered adaptive (Ashby 1960).  The first feedback loop operates frequently and makes small corrections.  The second feedback loop operates infrequently and changes the structure of the system, when the “essential variables” go outside the bounds required for survival.  As an example, Ashby proposed an autopilot.  The usual autopilot simply maintains the stability of an aircraft.  But what if a mechanic miswires the autopilot?  This could cause the plane to crash.  An “ultrastable” autopilot, on the other hand, would detect that essential variables had gone outside their limits and would begin to rewire itself until stability returned, or the plane crashed, depending on which occurred first. The first feedback loop enables an organism or organization to learn a pattern of behavior that is appropriate for a particular environment.  The second feedback loop enables the organism to perceive that the environment has changed and that learning a new pattern of behavior is required.
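The miswired autopilot lends itself to a small simulation of the two feedback loops. This is a rough sketch under my own assumptions, not Ashby’s homeostat: the flight dynamics are reduced to a single deviation x, the “wiring” is a single gain, and the second loop steps through a fixed list of candidate gains instead of drawing them at random so that the run is reproducible. The function name fly and all numbers are hypothetical.

```python
def fly(gain, candidate_gains, limit=2.0, steps=40, x=1.0):
    """Simulate an autopilot that tries to hold deviation x near zero.

    First loop (every tick): x <- x + gain * x, a fixed corrective rule
    that cannot change itself.
    Second loop (only when needed): if x is outside the limit and the
    last step did not shrink it, swap in the next candidate gain.
    """
    rewires = 0
    next_candidate = 0
    for _ in range(steps):
        previous = abs(x)
        x = x + gain * x                       # first feedback loop
        out_of_bounds = abs(x) > limit
        not_improving = abs(x) >= previous
        if out_of_bounds and not_improving:    # second feedback loop
            gain = candidate_gains[next_candidate % len(candidate_gains)]
            next_candidate += 1
            rewires += 1
    return x, gain, rewires


if __name__ == "__main__":
    candidates = [1.5, -1.5, -0.5]
    # Correctly wired: negative gain, so the second loop never fires.
    print(fly(-0.5, candidates))
    # Miswired: positive gain drives x out of bounds until a rewire works.
    print(fly(0.5, candidates))
```

In the miswired run the deviation grows past the limit, the first replacement gain makes things worse, and the second one brings the deviation back toward zero, after which the second loop stays silent. The first loop never decides anything. It only applies whatever gain it currently has, which is the sense in which the overt behavior is locked in until the essential variable is threatened.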

Ohno’s Production System:

Once I saw that the idea of an ultrastable system may be applied to the social realm, I wanted to see how it can be applied to Ohno’s Production System. Taiichi Ohno is regarded as the father of the famous Toyota Production System. Before it was “Toyota Production System”, it was Ohno’s Production System. Taiichi Ohno was inspired by the challenge issued by Kiichiro Toyoda, the founder of Toyota Motor Corporation. The challenge was to catch up with America in 3 years in order to survive.  Ohno built his ideas with inspirations from Sakichi Toyoda, Kiichiro Toyoda, Henry Ford and the supermarket system. Ohno did a lot of trial and error. And the ideas he implemented, he made sure were followed. Ohno was called “Mr. Mustache”. The operators thought of Ohno as an eccentric. They used to joke that military men used to wear mustaches during World War II, and that it was rare to see a Japanese man with facial hair afterward. “What’s Mustache up to now?” became a common refrain at the plant as Ohno carried out his studies. (Source: Against All Odds, Togo and Wartman)

His ideas were not easily understood by others. He had to tell others that he would take responsibility for the outcomes, in order to convince them to follow his ideas. Ohno could not completely make others understand his vision since his ideas were novel and not always the norm. Ohno was persistent, and he made improvements slowly and steadily. He would later talk about the idea of Toyota being slow and steady like the tortoise. Ohno loved what he did, and he had tremendous passion pushing him forward with his vision. As noted, his ideas were based on trial and error, and were thus perceived as counter-intuitive by others.

Ohno can be viewed as part of the second feedback loop and the assembly line as part of the first feedback loop, while the survivability of the company via the metrics of cost, quality, productivity etc. can be viewed as the “essential variables”. Ohno implemented the ideas of kanban, jidoka etc. on the line, and they were followed. The assembly line could not change the mechanisms established as part of Ohno’s production system. Ohno’s production system can be viewed as a closed system in that the framework is static. Ohno watched how the interactions with the environment went, and how the essential variables were being impacted. Based on this, the existing behaviors were either changed slightly, or changed out all the way until the desired equilibrium was achieved.

Here the production system framework is static because it cannot change itself. The assembly line where it is implemented is closed to changes at a given time. It is “action oriented” without decision powers to make changes to itself. There is no point in copying the framework unless you have the same problems that Ohno faced.

Umpleby also describes the idea of the double feedback loop in terms of quality improvement similar to what we have discussed:

The basic idea of quality improvement is that an organization can be thought of as a collection of processes. The people who work IN each process should also work ON the process, in order to improve it. That is, their day-to-day work involves working IN the process (the first, frequent feedback loop). And about once a week they meet as a quality improvement team to consider suggestions and to design experiments on how to improve the process itself. This is the second, less frequent feedback loop that leads to structural changes in the process. Hence, process improvement methods, which have been so influential in business, are an illustration of Ashby’s theory of adaptation.

This follows the idea of kairyo and kaizen in the Toyota Production System.

Final Words:

It is important to note that Ohno’s Production System is not Toyota Production System is not Toyota’s Production System is not Lean. Ohno’s Production System evolved into Toyota Production System. Toyota’s production system is emergent while Toyota Production System is not. Toyota Production System’s framework can be viewed as a closed system, in the sense that the framework is static. At the same time, the different plants implementing the framework are dynamic due to the simple fact that they exist in an ever-changing environment. For an organization to adapt to an ever-changing environment, it needs to be ultrastable. An organization can have several ultrastable systems connected with each other resulting in a homeostasis. I will finish with an excellent quote from Mike Jackson.

The organization should have the best possible model of the environment relevant to its purposes… the organization’s structure and information flows should reflect the nature of that environment so that the organization is responsive to it.

In case you missed it, my last post was The Cybernetics of a Society:

The Cybernetics of a Society:

In today’s post, I will be following the thoughts from my previous post, Consistency over Completeness. We were looking at each one of us being informationally closed, and computing a stable reality. The stability comes from the recursive computations of what is being observed. I hope to expand the idea of stability from an individual to a society in today’s post.

Humberto Maturana, the cybernetician biologist (or biologist cybernetician) said – anything said is said by an observer. Heinz von Foerster, one of my heroes in cybernetics, expanded this and said – everything said is said to an observer. Von Foerster’s thinking was that language is not monologic but always dialogic. He noted:

The observer as a strange singularity in the universe does not attract me… I am fascinated by images of duality, by binary metaphors like dance and dialogue where only a duality creates a unity. Therefore, the statement.. – “Anything said is said by an observer” – is floating freely, in a sense. It exists in a vacuum as long as it is not embedded in a social structure because speaking is meaningless, and dialogue is impossible, if no one is listening. So, I have added a corollary to that theorem, which I named with all due modesty Heinz von Foerster’s Corollary Nr. 1: “Everything said is said to an observer.” Language is not monologic but always dialogic. Whenever I say or describe something, I am after all not doing it for myself but to make someone else know and understand what I am thinking or intending to do.

Heinz von Foerster’s great insight was perhaps inspired by the works of his distant relative and the brilliant philosopher, Ludwig Wittgenstein. Wittgenstein proposed that language is a very public matter, and that a private language is not possible. The meaning of a word, such as “apple”, does not inherently come from the word “apple”. The meaning of the word comes from how it is used. The meaning comes from repeated usage of the word in a public setting. Thus, even though the experience of an apple may be private to the individual, we can only describe it by using a public language. Von Foerster continues:

When other observers are involved… we get a triad consisting of the observers, the languages, and the relations constituting a social unit. The addition produces the nucleus and the core structure of society, which consists of two people using language. Due to the recursive nature of their interactions, stabilities arise, they generate observers and their worlds, who recursively create other stable worlds through interacting in language. Therefore, we can call a funny experience apple because other people also call it apple. Nobody knows, however, whether the green color of the apple you perceive, is the same experience as the one I am referring to with the word green. In other words, observers, languages, and societies are constituted through recursive linguistic interaction, although it is impossible to say which of these components came first and which were last – remember the comparable case of hen, egg and cock – we need all three in order to have all three.

Klaus Krippendorff defined closure as follows – A system is closed if it provides its own explanation and no references to an input are required. With closures, recursions are a good and perhaps the only way to interact. As organizationally closed entities, we are able to stay viable only as part of a social realm. When we are part of a social realm, we have to construct reality with reference to an external reference. Understanding is still generated internally, but with an external point of reference. This adds to the reality of the social realm as a collective. If the society has to have an identity that is sustained over time, its viability must come from its members. Like a set of nested dolls, society’s structure comes from participating individuals who themselves are embedded recursively in the societal realm. The structure of the societal or social realm is not designed, but emergent from the interactions, desires, goals etc. of the individuals. The society is able to live on while the individuals come and go.

I am part of someone else’s environment, and I add to the variety of their environment with my decisions and actions (sometimes inactions). This is an important reminder for us to hold onto in light of recent world events including a devastating pandemic. I will finish with some wise words from Heinz von Foerster:

A human being is a human being together with another human being; this is what a human being is. I exist through another “I”, I see myself through the eyes of the Other, and I shall not tolerate that this relationship is destroyed by the idea of the objective knowledge of an independent reality, which tears us apart and makes the Other as object which is distinct from me. This world of ideas has nothing to do with proof, it is a world one must experience, see, or simply be. When one suddenly experiences this sort of communality, one begins to dance together, one senses the next common step and one’s movements fuse with those of the other into one and the same person, into a being that can see with four eyes. Reality becomes communality and community. When the partners are in harmony, twoness flows like oneness, and the distinction between leading and being led has become meaningless.

In case you missed it, my last post was Consistency over Completeness:

Source – The Certainty of Uncertainty: Dialogues Introducing Constructivism By Bernhard Poerksen

Consistency over Completeness:

Today’s post is almost a follow-up to my earlier post – The Truth about True Models. In that post, I talked about Dr. Donald Hoffman’s idea of Fitness-Beats-Truth or FBT Theorem. Loosely put, the idea behind the FBT Theorem is that we have evolved to not have “true” perceptions of reality. We survived because we had “fitness” based models and because we did not have “true models”. In today’s post, I am continuing on this idea using the ideas from Heinz von Foerster, one of my Cybernetics heroes.

Heinz von Foerster came up with “the postulate of epistemic homeostasis”. This postulate states:

The nervous system as a whole is organized in such a way (organizes itself in such a way) that it computes a stable reality.

It is important to note here that, we are speaking about computing “a” reality and not “the” reality. Our nervous system is informationally closed (to follow up from the previous post). This means that we do not have direct access to the reality outside. All we have is what we can perceive through our perception framework. The famous philosopher, Immanuel Kant, referred to this as the noumena (the reality that we don’t have direct access to) and the phenomena (the perceived representation of the external reality). All we can do is to compute a reality based on our interpretive framework. This is just a version of the reality, and each one of us computes such a reality that is unique to each one of us.

The other concept to make note of is the “stable” part of the stable reality. In Godelian* speak, our nervous system cares more about consistency than completeness. When we encounter a phenomenon, our nervous system looks at stable correlations from the past and present, and computes a sensation that confirms the perceived representation of the phenomenon. Von Foerster gives the example of a table. We can see the table, and we can touch it, and maybe bang on it. With each of these confirmations and correlations between the different sensory inputs, the table becomes more and more a “table” to us.

*Kurt Godel, one of the famous logicians of the last century, came up with the idea that any formal system able to do elementary arithmetic cannot be both complete and consistent; it is either incomplete or inconsistent.

From the cybernetics standpoint, we are talking about an observer and the observed. The interaction between the observer and the observed is an act of computing a reality. The first step to computing a reality is making distinctions. If there are no distinctions, everything about the observed will be uniform, and no information can be processed by the observer. The distinctions refer to the variety of the observed. The more distinctions there are, the more variety the observed has. From a second order cybernetics standpoint, the variety of the observed depends upon the variety of the observer. This goes back to the unique stable reality computation point from earlier. Each one of us is unique in how we perceive things. This is our variety as the observer. The observed, that which is external to us, always has more potential variety than us. We cut down or attenuate this high variety by choosing certain attributes that interest us. Once the distinctions are made, we find relations between these distinctions to make sense of it all. This corresponds to the confirmations and correlations that we noted above in the example of a table.

We are able to survive in our environment because we are able to continuously compute a stable reality. The stability comes from the recursive computations of what is being observed. For example, let’s go back to the example of the table. Our eyes receive the sensory input of the image of the table. This is the first set of computations. This sensory image then goes up the “neurochain”, where it is computed again. This happens again and again as the input gets “decoded” at each level, until it gets satisfactorily decoded by our nervous system. The final result is a computation of a computation of a computation of a computation and so on. The stability is achieved from this recursion.

The idea of consistency over completeness is quite fascinating. It arises mainly from the inability of our nervous system to hold a true representation of reality. There is a common belief that we live with uncertainty, but our nervous system strives to provide us a stable version of reality, one that is devoid of uncertainties. We are able to think about this only from a second order standpoint. We are able to ponder our cognitive blind spots because we are able to do second order cybernetics. We are able to think about thinking. We are able to put ourselves into the observed. Second order cybernetics is the study of observing systems, where the observers themselves are part of the observed system.

I will leave the reader with a final thought – the act of observing oneself is also a computation of “a” stable reality.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was Wittgenstein and Autopoiesis:

Wittgenstein and Autopoiesis:

In Tractatus Logico-Philosophicus, Wittgenstein wrote the following:

“The world of the happy man is a different one from that of the unhappy man.”

He also noted, in Philosophical Investigations, that if a lion could talk, we would not understand him.

As a person very interested in cybernetics, I am looking at what Wittgenstein said in the light of autopoiesis. Autopoiesis is the brainchild of mainly two Chilean biologists and cyberneticians, Humberto Maturana and Francisco Varela. Autopoiesis was put forth as the joining of two Greek words, “auto” meaning self, and “poiesis” meaning creating. I have talked about autopoiesis here. For this post, I am most interested in the idea of “organizational closure” from autopoiesis. An entity is organizationally closed when it is informationally tight. In other words, autopoietic entities maintain their identities by remaining informationally closed to their surroundings. We human beings are autopoietic entities. We cannot take in information as a commodity. We generate meaning within ourselves based on experiencing external perturbations. Information does not enter from outside into our brain.

Let’s take the example of me looking at a blue light bulb. I interpret the presence of the blue light as being blue when my eyes are hit with the light. The light does not inform my brain, but rather my brain interprets the light as blue based on all my previous similar interactions I have had. There is no qualitative information coming to my brain saying that it is a blue light, but rather my brain interprets it as a blue light. It is “informative” rather than being a commodity piece of information. As cybernetician Bernard Scott noted:

…an organism does not receive “information” as something transmitted to it, rather, as a circularly organized system it interprets perturbations as being informative.

All of my previous interactions/perturbations with the light, and others explaining those interactions as being “blue light”, generated a structural coupling so that my brain perceives a new similar perturbation as being “blue light”. This also brings up another interesting idea from Wittgenstein. We cannot have a private language. One person alone cannot invent a private language. All we have is public language, one that is reinterpreted and reinforced with repeat interactions. The sensation that we call “blue light” is an experience that is 100% unique to me as the interpreter. This supports the concept of autopoiesis as well. We cannot “open” ourselves to others so that they can see what is going on inside our head/mind.

Our interpretive framework, which we use to make sense of perturbations hitting us, is a result of all our past experiences and reinforcements. The kind of interpretive framework we have is particular to us Homo sapiens. We share a similar interpretive framework, but the actual results from our interpretive framework are unique to each one of us. It is because of this that even if a lion could talk to us, we would not be able to understand it, at least not at the start. We lack the interpretive framework to understand it. The uniqueness of our interpretive framework is also the reason we feel differently regarding the same experiences. This is the reason, as a happy person, we cannot understand the world of a sad person, and vice versa.

Our brain makes sense based on the sensory perturbation and the interpretive framework it already has. A good example to think about this is the images that fall on our retina. The images are upside down, but we are able to “see” right side up. This is possible due to our structural coupling. What happens if there is a new sensory perturbation? We can only make sense of what we know. If we face a brand-new perturbation, we can make sense of it only in terms of what we know. The more we know, the more we are further able to know. As we face the same perturbation repeatedly, we are able to “better” experience it, and describe it to ourselves in a richer manner. With enough repeat interactions, we are finally able to experience it in our own unique manner. From this standpoint, there is no mind-body separation. The “mind” and “body” are both part of the same interpretive framework.

I will leave with another thought experiment to spark these ideas in the reader’s mind. There has always been talk about aliens. From what Wittgenstein taught us, when we meet the aliens, will we be able to understand each other?

I recommend the following posts to the reader to expand upon this post:

If a Lion Could Talk:

The System in the Box:

A Study of “Organizational Closure” and Autopoiesis:

Please maintain social distance and wear masks. Stay safe and Always keep on learning… In case you missed it, my last post was When is a Model Not a Model?

When is a Model Not a Model?

Ross Ashby, one of the pioneers of Cybernetics, started an essay with the following question:

I would like to start not at: How can we make a model?, but at the even more primitive question: Why make a model at all?

He came up with the following answer:

I would like then to start from the basic fact that every model of a real system is in one sense second-rate. Nothing can exceed, or even equal, the truth and accuracy of the real system itself. Every model is inferior, a distortion, a lie. Why then do we bother with models? Ultimately, I propose, we make models for their convenience.

To go further on this idea, we make models to come up with a way to describe “how things work”. This is done so that we can also answer the question – what happens when… If there is no predictive or explanatory power, there is no use for the model. From a cybernetics standpoint, we are not interested in the “What is this thing?”, but the “What does this thing do?” We never try to completely understand a “system”. We understand it in chunks, the chunks that we are interested in. We construct a model in our heads that we call a “system” to make sense of how we think things work out in the world. We only care about certain specific interactions and their outcomes.

One of the main ideas that Ashby proposed was the idea of variety. Loosely put, variety is the number of available states a system has. For example, a switch has a variety of two – ON or OFF. A stop light has a variety of three (generally) – Red, Yellow or Green. As we increase the complexity, the variety also increases. The variety is dependent on the ability of the observer to discern them. A keen-eyed observer can discern a higher number of states for a phenomenon than another observer. Take the example of the great fictional characters, Sherlock Holmes and John Watson. Holmes is able to discern more variety than Watson, when they come upon a stranger. Holmes is able to tell the most amazing details about the stranger that Watson cannot. When we construct a model, the model lacks the original variety of the phenomenon we are modeling. This is important to keep in mind. The external variety is always much larger than the internal variety of the observer. The observer simply lacks the ability to tackle the extremely high amount of variety. To address this, the observer removes or attenuates the unwanted variety of the phenomenon and constructs a simpler model. For example, when we talk about a healthcare system, the model in our mind is pretty simple. One hospital, some doctors and patients etc. It does not include the millions of patients, the computer system, the cafeteria, the janitorial service etc. We only look at the variables that we are interested in.

Ashby explained this very well:

Another common aim that will have to be given up is that of attempting to “understand” the complex system; for if “understanding” a system means having available a model that is isomorphic with it, perhaps in one’s head, then when the complexity of the system exceeds the finite capacity of the scientist, the scientist can no longer understand the system—not in the sense in which he understands, say, the plumbing of his house, or some of the simple models that used to be described in elementary economics.
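The idea of variety, and of an observer attenuating it, can be sketched in a few lines of Python. This is only an illustration: the `variety` function and the `go_or_not` observer are hypothetical names made up for this sketch, and treating an observer as a function that maps states to the distinctions they care about is an assumption, not Ashby’s formalism.

```python
def variety(states, distinguish=lambda state: state):
    """Count the states an observer can tell apart.

    `distinguish` (a hypothetical helper) maps each state to what the
    observer actually perceives; states that map to the same perception
    count as one.
    """
    return len({distinguish(state) for state in states})


switch = ["ON", "OFF"]
stop_light = ["Red", "Yellow", "Green"]

# An observer who only cares whether they may drive on or not.
def go_or_not(color):
    return "go" if color == "Green" else "not go"

print(variety(switch))                 # 2
print(variety(stop_light))             # 3
print(variety(stop_light, go_or_not))  # 2
```

The last line shows attenuation: the stop light still has three states, but this particular observer’s model of it has only two.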

A crude depiction of model-making is shown below. The observer has chosen certain variables that are of interest, and created a similar “looking” version as the model.

Ashby elaborated on this idea as:

We transfer from system to model to lose information. When the quantity of information is small, we usually try to conserve it; but when faced with the excessively large quantities so readily offered by complex systems, we have to learn how to be skillful in shedding it. Here, of course, model-makers are only following in the footsteps of the statisticians, who developed their techniques precisely to make comprehensible the vast quantities of information that might be provided by, say, a national census. “The object of statistical methods,” said R. A. Fisher, “is the reduction of data.”

There is an important saying from Alfred Korzybski – the map is not the territory. His point was that we should not take the map to be the real thing. An important corollary to this, as a model-maker is:

If the model is the same as the phenomenon it models, it fails to serve its purpose. 

The usefulness of the model is in it being an abstraction. This is mainly due to the observer not being able to handle the excess variety thrown at them. This also answers one part of the question posed in the title of this post – A model ceases to be a model when it is the same as the phenomenon it models. The second part of the answer is that the model has to have some similarities to the phenomenon, and this is entirely dependent on the observer and what they want.

This brings me to the next important point – We can only manage models. We don’t manage the actual phenomenon; we only manage the models of the phenomenon in our heads. The reason being again that we lack the ability to manage the variety thrown at us.

The eminent management cybernetician, Stafford Beer, has the following words of wisdom for us:

Instead of trying to specify it in full detail, you specify it only somewhat. You then ride on the dynamics of the system in the direction you want to go.

To paraphrase Ashby, we need not collect more information than is necessary for the job. We do not need to attempt to trace the whole chain of causes and effects in all its richness, but attempt only to relate controllable causes with ultimate effects.

The final aspect of model-making is to take into consideration the temporary nature of the model. Again, paraphrasing Ashby – We should not assume the system to be absolutely unchanging. We should accept frankly that our models are valid merely until such time as they become obsolete.

Final Words:

We need a model of the phenomenon to manage the phenomenon. And how we model the phenomenon depends upon our ability as the observer to manage variety. We only need to choose certain specific variables that we want. Perhaps, I can explain this further with the deep philosophical question – If a tree falls in a forest and no one is around to hear it, does it make a sound? The answer to a cybernetician should be obvious at this point. Whether there is sound or not depends on the model you have, and on whether you place any value on the falling tree making a sound.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was The Maximum Entropy Principle:

The Maximum Entropy Principle:

In today’s post, I am looking at the Maximum Entropy principle, a brainchild of the eminent physicist E. T. Jaynes. This idea is based on Claude Shannon’s Information Theory. The Maximum Entropy principle (an extension of the Principle of Insufficient Reason) is the ideal epistemic stance. Loosely put, we should model only what is known, and we should assign maximum uncertainty for what is unknown. To explain this further, let’s look at an example of a coin toss.

If we don’t know anything about the coin, our prior assumption should be that heads and tails are equally likely. This is a stance of maximum entropy. If we assumed that the coin was loaded, we would be “loading” our model and claiming a certainty we have not earned. Entropy is a measure proposed by Claude Shannon as part of his information theory. Low entropy messages have low information content or low surprise content. High entropy messages on the other hand have high information content or high surprise content. The information content of an event is inversely related to its probability (it is the negative logarithm of the probability), so low probability events have high information content. For example, an unlikely defeat of a reigning sports team generates more surprise than a likely win. Entropy is the average level of information when we consider all of the probabilities. In the case of the coin toss, the entropy is the average level of information when we consider the probability of heads or tails. For discrete events, the entropy is maximum for equally likely events, or in other words for a uniform distribution. Thus, when we say that the probability of heads or tails is 0.5, we are assuming a maximum entropy model. In the case of a uniform distribution, the maximum entropy model is also the same as Laplace’s principle of insufficient reason. If the coin always landed on heads, we would have a zero entropy case because there is no new information available. If it is a loaded coin that makes one side more likely to occur, then the entropy is lower than if it is a fair coin. This is shown below, where the X-axis is the probability of Heads, and the Y-axis is the information entropy. We can see that Pr(0) or no Heads, and Pr(1) or 100% Heads have zero entropy value. The highest value for entropy happens when the probability for heads is 0.5 or 50%. For those who are interested, John von Neumann had a great idea to make a loaded coin fair. You can check out that here.
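For readers who like to see the numbers, here is a minimal Python sketch of the coin example. The function names `surprisal` and `coin_entropy` are my own, and the entropy is measured in bits (base 2 logarithm), which is a choice made for this sketch.

```python
import math


def surprisal(p: float) -> float:
    """Information content, in bits, of an event with probability p."""
    return -math.log2(p)


def coin_entropy(p_heads: float) -> float:
    """Shannon entropy, in bits, of a coin that lands heads with probability p_heads."""
    total = 0.0
    for p in (p_heads, 1.0 - p_heads):
        if p > 0.0:  # an impossible outcome contributes nothing to the average
            total += p * surprisal(p)
    return total


for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"Pr(Heads) = {p:.1f} -> entropy = {coin_entropy(p):.3f} bits")
```

The entropy is zero at both extremes and peaks at one bit for the fair coin, which matches the shape of the curve described above.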

From this standpoint, if we take a game, where one team is more favored to win, we could say that the most informative part of a game is sometimes the coin toss.

Let’s consider the case of a die. There are six possible events (1 through 6) when we roll a die. The maximum entropy model will be to assume a uniform distribution, i.e., to assign 1/6 as the probability for each value from 1 through 6. Now suppose we somehow knew that 6 is more likely to happen; for example, the manufacturer of the loaded die says that the number 6 is likely to occur 3/6 of the time. Per the maximum entropy model, we should divide the remaining 3/6 equally among the remaining 5 numbers, giving each a probability of 1/10. With each additional piece of information, we should change our model so that the entropy is at its maximum, subject to what we now know. What I have discussed here is the basic information regarding maximum entropy. Each new piece of “valid” information that we need to incorporate into our model is called a constraint. The maximum entropy approach utilizes Lagrange multipliers to find the solutions. For discrete events, with no additional information, the maximum entropy model is the uniform distribution. In a similar vein, if you are looking at a continuous distribution, and you knew what the mean and variance of the distribution are, the maximum entropy model is the normal distribution.
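The loaded die can be checked with a short Python sketch. The helper names are made up for illustration, and the “lopsided” distribution is a hypothetical alternative that I invented: it also honors the manufacturer’s claim about 6, but favors the number 1 for no stated reason.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def max_entropy_die(p_six: float):
    """Maximum entropy model when only Pr(6) is known:
    spread the remaining probability evenly over faces 1 to 5."""
    rest = (1.0 - p_six) / 5
    return [rest] * 5 + [p_six]


fair = [1 / 6] * 6
loaded = max_entropy_die(0.5)
# Hypothetical alternative: same Pr(6), but an unjustified preference for face 1.
lopsided = [0.3, 0.05, 0.05, 0.05, 0.05, 0.5]

for name, dist in (("fair", fair), ("loaded", loaded), ("lopsided", lopsided)):
    print(f"{name:9s} entropy = {entropy(dist):.3f} bits")
```

The fair die has the highest entropy. Once the constraint on 6 is known, the evenly spread model has higher entropy than the lopsided one, so it is the one that claims the least beyond what we were told.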

The Role of The Observer:

Jaynes asked a great question about the information content of a message. He noted:

In a communication process, the message m(i) is assigned probability p(i), and the entropy H, is a measure of information. But WHOSE information?… The probabilities assigned to individual messages are not measurable frequencies; they are only a means of describing a state of knowledge.

The general idea of probability in the frequentist’s version of statistics is that it is fixed. However, in the Bayesian version, the probability is not a fixed entity. It represents a state of knowledge. Jaynes continues:

Entropy, H, measures not the information of the sender, but the ignorance of the receiver that is removed by the receipt of the message.

To me, this brings up the importance of the observer and circularity. As the great cybernetician Heinz von Foerster said:

“The essential contribution of cybernetics to epistemology is the ability to change an open system into a closed system, especially as regards the closing of a linear, open, infinite causal nexus into closed, finite, circular causality.”

Let’s go back to the example of a coin. If I am an alien and if I knew nothing about coins, should my maximum entropy model only include two possibilities of heads or tails? Why should it not include the coin landing on its edge? Or if a magician is tossing the coin, should I account for the coin vanishing into thin air? The assumption of just two possibilities (heads or tails) is the prior information that we are accounting for, by saying that the probability of heads or tails is 0.5. As we gain more knowledge about the coin toss, we can update the model to reflect it, and at the same time change the model to a new state of maximum entropy. This iterative, closed loop process is the backbone of scientific enquiry and skepticism. The use of the maximum entropy model is a stance that we are taking to state our knowledge. Perhaps a better way to explain the coin toss is that – given our lack of knowledge about the coin, we are saying that heads is not more likely to happen than tails until we find more evidence. Let’s look at another interesting example where I think the maximum entropy model comes up.

The Veil of Ignorance:

The veil of ignorance is an idea about ethics proposed by the great American political philosopher, John Rawls. Loosely put, in this thought experiment, Rawls is asking us what kind of society we should aim for. Rawls asks us to imagine that we are behind a veil of ignorance, where we are completely ignorant of our natural abilities, societal standing, family etc. We are then randomly assigned a role in society. The big question then is – what should society be like where this random assignment promotes fairness and equality? The random assignment is a maximum entropy model since any societal role is equally likely.

Final Words:

The maximum entropy principle is a way of saying not to put all of your eggs in one basket. It is a way to be aware of your biases and it is an ideal position for learning. It is similar to Epicurus’ principle of Multiple Explanations, which says – “Keep all the different hypotheses that are consistent with the facts.”

It is important to understand that “I don’t know,” is a valid and acceptable answer. It marks the boundary for learning.

Jaynes explained maximum entropy as follows:

The maximum entropy distribution may be asserted for the positive reason that it is uniquely determined as the one which is maximally noncommittal with regard to missing information, instead of the negative one that there was no reason to think otherwise… Mathematically, the maximum entropy distribution has the important property that no possibility is ignored; it assigns positive weight to every situation that is not absolutely excluded by the given information.

We learned that probability and entropy are dependent on the observer. I will finish off with the wise words from James Dyke and Axel Kleidon.

Probability can now be seen as assigning a value to our ignorance about a particular system or hypothesis. Rather than the entropy of a system being a particular property of a system, it is instead a measure of how much we know about a system.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was Destruction of Information/The Performance Paradox:

Destruction of Information/The Performance Paradox:

Ross Ashby was one of the pioneers of Cybernetics. His 1956 book, An Introduction to Cybernetics, is still one of the best introductions to Cybernetics. As I was researching his journals, I came across an interesting phrase – “destruction of information.” Ashby noted:

I am not sure whether I have stated before my thesis – that the business of living things is the destruction of information.

Ashby gave several examples to explain what he meant by this. For example:

Consider a thermostat controlling a room’s temperature. If it is working well, we can get no idea, from the temperature of the room whether it is hot or cold outside. The thermostat’s job is to stop this information from reaching the occupant.

He also gave the example of an antiaircraft gun and its predictor. Suppose we observe only the error made by each shell in succession. If the predictor is perfect, we shall get the sequence of 0,0,0,0 etc. By examining this sequence, we can get no information about how the aircraft maneuvered. Contrast this with the record of a poor predictor: 2, 1, 2, 3… -3, 0, 3 etc. By examining this, we can get quite a good idea of how the pilot maneuvered. In general, the better the predictor, the less the maneuvers show in the errors. The predictor’s job is to destroy this information.
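One rough way to put a number on this is to compute the Shannon entropy of each error record. The sketch below is my own illustration and not something Ashby wrote: the `empirical_entropy` function is a made-up name, it treats each record as a bag of symbols (ignoring their order), and the “poor” record is a set of illustrative values loosely modeled on the example, not Ashby’s actual sequence.

```python
import math
from collections import Counter


def empirical_entropy(sequence):
    """Shannon entropy, in bits, of the observed symbol frequencies in a sequence."""
    counts = Counter(sequence)
    n = len(sequence)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


perfect_errors = [0, 0, 0, 0, 0, 0, 0]  # a perfect predictor: always on target
poor_errors = [2, 1, 2, 3, -3, 0, 3]    # illustrative values only

print(f"perfect predictor: {empirical_entropy(perfect_errors):.3f} bits")
print(f"poor predictor:    {empirical_entropy(poor_errors):.3f} bits")
```

The perfect predictor’s record carries zero bits, while the poor predictor’s record carries more than two bits per shell about what the pilot was doing.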

As an observer, we learn about a living system or a phenomenon by the variety it displays. Here, variety can be loosely expressed as the number of distinct states a system has. Interestingly, the number of states or the variety is dependent upon the system demonstrating it, as well as the observer’s ability to distinguish the different states. If the observer is not able to make the needed number of distinctions, then less information is generated. On the other hand, if the system of interest is able to hide its different states, it minimizes the amount of information available for the observer. In this post, we are interested in the latter category. Ashby talks about an interesting example to further this idea:

An insect whose coloration makes it invisible will not show, by its survival or disappearance whether a predator has or has not seen it. An imperfectly colored one will reveal this fact by whether it has survived or not.

Another example Ashby gives is that of an expert boxer:

An expert boxer, when he comes home, will show no signs of whether he had a fight in the street or not. An imperfect boxer will carry the information.

Ashby’s idea can be further looked at from an adaptation standpoint. When you adapt very well to your ever-changing surroundings, you are destroying information or you are not demonstrating any information. Ashby also noted that adaptation means “destroying information.” In this manner, you know that you are adapting well when you don’t break a sweat. A master swordsman moves effortlessly while defeating an opponent. A good runner is not out of breath after a quick sprint.

The Performance Paradox:

My take on this idea from Ashby is to express it as a form of performance paradox – When something works really well, you will not notice it, or worse you will think that it’s wasteful. The most effective and highly efficient components stay the quietest. The best spy is the one you have not ever heard of. When you try to monitor a highly performing component, you may rarely get evidence of its performance. It is almost as if it is wasteful. Another way to view this is – the imperfect components lend themselves to be monitored, while the perfect components do not. The danger in not understanding regulation from a cybernetics standpoint is to completely misread the interactions, and assume that the perfect component has no value.

I encourage the reader to read further upon these ideas here:

Edit (12/1/2020): Adding more clarity on “destruction of information”.

The phrase “destruction of information” was used by Ashby in a Shannon entropy sense. He is indicating that the agent is purposefully reducing the information entropy that would have been otherwise available. Another example is that of a good poker player, who is difficult to read.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was Locard’s Exchange Principle at the Gemba:

Locard’s Exchange Principle at the Gemba:

In today’s post, I am looking at Locard’s Exchange Principle, named after the famous French criminologist, Edmond Locard. Succinctly put, the exchange principle can be stated as “every contact leaves a trace.” This is perhaps well explained by Paul L. Kirk in his 1953 book, Crime Investigation: Physical Evidence and the Police Laboratory:

Wherever he steps, whatever he touches, whatever he leaves, even unconsciously, will serve as a silent witness against him. Not only his fingerprints or his footprints, but his hair, the fibers from his clothes, the glass he breaks, the tool mark he leaves, the paint he scratches, the blood or semen he deposits or collects. All of these and more bear mute witness against him. This is evidence that does not forget. It is not confused by the excitement of the moment. It is not absent because human witnesses are. It is factual evidence. Physical evidence cannot be wrong, it cannot perjure itself, it cannot be wholly absent. Only human failure to find it, study and understand it can diminish its value.

In other words, the perpetrator involved in a crime brings something into the scene and at the same time takes something with them. Both can be used against the perpetrator as forensic evidence. As a huge fan of mystery stories and shows, I was very interested when I first heard about this principle. Rather than the applications in forensic science, I was thinking about it from a cybernetics standpoint. When two people converse with each other, their interactions can be viewed in the light of Locard’s exchange principle. Both of them bring something into the conversation, and in turn take something with them. There is a cross-transfer of ideas with successful conversations. To quote the late German philosopher, Hans-Georg Gadamer:

The true reality of human communication is such that a conversation doesn’t simply enforce one opinion over and against the other, nor does it simply add one opinion to another, as a kind of addition. Rather, true conversation transforms both viewpoints.

It may be objected that true conversations do not always take place. However, this is something that we can strive for. At the same time, we need to be mindful not to treat information as a commodity that can be passed around. Just because we convey a message by speaking it aloud, it does not mean that the message is conveyed. As the great cybernetician, Heinz von Foerster, would say – the hearer not the utterer determines the meaning of a message.

Claude Shannon, the father of Information Theory, looked in depth at the successful transmission of messages. He noted:

The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is, they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that they are selected from a set of possible messages.

Shannon’s model had a source (a sender of a message), a transmission medium (a channel with noise and distortion) and a receiver. The sender had to encode the message and send it through the medium. The receiver had to receive the message, decode it and reconstruct it. The receiver had to have a set of possible messages so that they were able to properly decode the message such that any distortion or noise introduced in the medium could be compensated for. Shannon came up with a quantitative measure for the amount of information in a message – entropy. This is also a measure of surprise. For a message with low entropy, there is little surprise. For a message with high entropy, there is a lot of surprise, and this requires redundancy to ensure that the message is properly conveyed. For example, if the sender is sending a message, “011”, then the sender can repeat the message three times, “011 011 011”. Thus, if the message gets distorted such as “011 001 011”, the receiver is still able to decode the message as “011”. Curiously, if the message has a full amount of surprise, then the receiver will not be able to decode the message. Thus, if the message was entirely new information, the message will not be decoded successfully, no matter how much redundancy is added. This is the whole point of cryptic messages.
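The repetition example can be sketched in Python as below. This is a simple illustration assuming a majority vote at each bit position; the `encode` and `decode` function names are my own, and real communication systems use far more efficient error-correcting codes than plain repetition.

```python
def encode(message: str, repeats: int = 3) -> list[str]:
    """Add redundancy by sending the same message several times."""
    return [message] * repeats


def decode(copies: list[str]) -> str:
    """Recover the message by majority vote at each bit position."""
    decoded = []
    for bits in zip(*copies):  # all received values for one position
        decoded.append(max("01", key=bits.count))
    return "".join(decoded)


sent = encode("011")               # ['011', '011', '011']
received = ["011", "001", "011"]   # the middle copy was distorted in the channel
print(decode(received))            # 011
```

The redundancy only helps while the distortion stays in the minority. If two of the three copies were distorted in the same position, the vote would settle on the wrong bit.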

We are autopoietic entities, which means that we are informationally closed. No information can come into our organization from the outside. Any information is generated from within when we are exposed to perturbations from the outside. I have talked about this before. See here and here. We generate the information based on the perceptual network evolved specifically for us. We cannot pass information around as a commodity. Autopoiesis is the brainchild of Humberto Maturana and Francisco Varela. They noted:

Autopoietic systems do not have inputs or outputs. They can be perturbated by independent events and undergo internal structural changes which compensate these perturbations.

When we are communicating as part of being at the gemba, we have to keep in mind that we may not completely understand the meaning in the way the utterer intended. In a similar way, the hearer, the other person, may not have understood the meaning as we intended it. Even though we both may have heard each other 100%, we may not have communicated 100% (the way we think at least). Instead, I am interpreting what the other person is saying, and trying to respond to what I think the other person has said. The same applies to the other person. We are both interpreting each other. We are both trying to perturb each other with the hope that the meaning that is being generated has some similarity to what we want to communicate. It is here that I appreciate Locard’s Exchange Principle. We are coming in and leaving something (not the entire thing) at the scene, and at the same time, we are taking something (again not the entire thing) with us as we leave the scene. When we communicate, we are hopefully inspiring each other. Communication is never achieved 100%, but some transfer of ideas takes place, resulting in the transformation of existing ideas. As Gadamer indicated, when we communicate, the ideas do not simply get stacked on top of each other. Rather, the ideas get transformed.

When we are at the gemba, we should be keen on listening with intent. We should be open to receiving ideas from others and be willing to transform. We should be mindful that what we are saying may not be understood the way we want it to be. We should also be mindful of our non-verbal communication. Most of the time, we can tell a lot by how a leader acts. A leader often talks the talk that we want to hear. However, their actions often talk the loudest.

I will stop with the great George Bernard Shaw’s wonderful quote on communication:

The biggest problem with communication is the illusion that it has occurred.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was The Truth About True Models:

The Truth About True Models:

I recently came across Dr. Donald Hoffman’s idea of Fitness-Beats-Truth or FBT Theorem. This is the idea that evolution stamps out true perceptions. In other words, an organism is more likely to survive if it does not have a true and accurate perception. As Hoffman explains it:

Suppose there is an objective reality of some kind. Then the FBT Theorem says that natural selection does not shape us to perceive the structure of that reality. It shapes us to perceive fitness points, and how to get them… The FBT Theorem has been tested and confirmed in many simulations. They reveal that Truth often goes extinct even if Fitness is far less complex.

Hoffman suggests that natural selection did not shape us to perceive the structure of an objective reality. Evolution gave us a less complex but efficient perceptual network that takes shortcuts to perceive “fitness points.” Evolution by natural selection does not favor true perceptions—it routinely drives them to extinction. Instead, natural selection favors perceptions that hide the truth and guide useful action.

An easy way to digest this idea is to consider our ancient ancestors. If they heard a rustling sound in the grass, it benefitted them not to analyze and capture the entire surrounding to get an accurate and true model of the reality. Instead, they would survive only if they got a “quick and dirty” or good-enough model of the surrounding. They did not gain anything by having an elaborate and accurate perception. Their quick and dirty heuristics such as “if you hear a rustling in the grass, then flee” allowed them to survive and pass on their genes. In other words, their fitter perception did not consist of a true and accurate perception of the world around them. They gained (they survived) based on fitness rather than truth. As Hoffman noted, having true perception would have been detrimental because it avoided shortcuts and heuristics that saved time. As complexity increases, heuristics work much better.

The idea of FBT aligns pretty well with the ideas of second order cybernetics (SOC) and radical constructivism. From an SOC standpoint, the emphasis for the representation of the world is not that of a model of causality, but of a model of constraints. As Ernst von Glasersfeld explains:

In the biological theory of evolution, we speak of variability and selection, of environmental constraints and of survival. If an organism survives individually or as a species it means that, so far at least, it has been viable in the environment in which it happens to live. To survive, however, does not mean that the organism must in any sense reflect the character or the qualities of his environment. Gregory Bateson (1967) was the first who noticed that this theory of evolution, Darwin’s theory, is really a cybernetic theory because it is based on the concept of constraint rather than on the concept of causation.

In order to remain among the survivors, an organism has to “get by” the constraints which the environment poses. It has to squeeze between the bars of the constraints, to coin a metaphor. The environment does not determine how that might be achieved. It does not cause certain organisms to have certain characteristics or capabilities or to be a certain way. The environment merely eliminates those organisms that knock against its constraints. Anyone who by any means manages to get by the constraints, survives… All the environment contributes is constraints that knock out some of the changed organisms while others are left to survive. Thus, we can say that the only indication we may get of the “real” structure of the environment is through the organisms and the species that have been extinguished; the viable ones that survive merely constitute a selection of solutions among an infinity of potential solutions that might be equally viable.

Nature prefers efficient solutions that do the work most of the time, rather than effective solutions that work all of the time. These are solutions that favor the least energy expenditure, the least number of parts, etc. This approach also resonates with Occam’s razor. It is always advisable to have the least number of assumptions in your model. Another way to look at this is that the design with the least number of moving parts is always preferred.

The idea that true perceptions are not always advantageous may be counterintuitive. As complexity increases, we lack the perceptual network to truly comprehend the complexity. How we perceive the world around us depends a lot on our perceptual network, which is unique to our species. Our reality is built by omitting most of the attributes of the world around us. As Hoffman explains – the reality becomes simply a species-specific representation of fitness points on offer, and how we can act to get those points. Evolution has shaped us with perceptions that allow us to survive. But part of that involves hiding from us the stuff we don’t need to know.

Complexity also favors this approach of viable solutions/fitter perceptions. Hoffman notes:

We find that increasing the complexity of objective reality, or perceptual systems, or the temporal dynamics of fitness functions, increases the selection pressures against veridical perceptions.

I will add more thoughts on the FBT theorem at a later time. I encourage the readers to check out Hoffman’s book, The Case Against Reality.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was Talking about Constraints in Cybernetics: