The Maximum Entropy Principle:

In today’s post, I am looking at the Maximum Entropy principle, a brainchild of the eminent physicist E. T. Jaynes. This idea is based on Claude Shannon’s Information Theory. The Maximum Entropy principle (an extension of the Principle of Insufficient Reason) is the ideal epistemic stance. Loosely put, we should model only what is known, and we should assign maximum uncertainty for what is unknown. To explain this further, let’s look at an example of a coin toss.

If we don’t know anything about the coin, our prior assumption should be that heads and tails are equally likely. This is a stance of maximum entropy. If we assumed that the coin was loaded, we would be “loading” our model and claiming unfair certainty.

Entropy is a measure proposed by Claude Shannon as part of his information theory. Low entropy messages have low information content or low surprise content. High entropy messages, on the other hand, have high information content or high surprise content. The information content of an event is also inversely related to its probability: low probability events have high information content. For example, an unlikely defeat of a reigning sports team generates more surprise than a likely win. Entropy is the average level of information when we consider all of the probabilities. In the case of the coin toss, the entropy is the average level of information when we consider the probabilities of heads and tails.

For discrete events, the entropy is maximum for equally likely events, or in other words for a uniform distribution. Thus, when we say that the probability of heads or tails is 0.5, we are assuming a maximum entropy model. In the case of a uniform distribution, the maximum entropy model is the same as Laplace’s principle of insufficient reason. If the coin always landed on heads, we would have a zero entropy case because there is no new information available. If it is a loaded coin that makes one side more likely to occur, then the entropy is lower than for a fair coin. If we plot the information entropy (Y-axis) against the probability of heads (X-axis), the entropy is zero when the probability of heads is 0 (no heads) or 1 (100% heads), and it reaches its highest value when the probability of heads is 0.5 or 50%. For those who are interested, John von Neumann had a great idea to make a loaded coin fair. You can check that out here.
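The entropy curve for the coin can be reproduced with a few lines of Python. This is only a minimal sketch: the function name binary_entropy and the list of probabilities are my own choices for illustration, and the entropy is measured in bits (logarithm to base 2).

```python
import math

def binary_entropy(p_heads: float) -> float:
    """Shannon entropy, in bits, of a coin that lands heads with probability p_heads."""
    total = 0.0
    for p in (p_heads, 1.0 - p_heads):
        if p > 0.0:  # by convention, an impossible outcome contributes nothing
            total += p * math.log2(1.0 / p)
    return total

# Illustrative probabilities of heads, from "never heads" to "always heads".
for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"P(heads) = {p:.1f} -> entropy = {binary_entropy(p):.3f} bits")
```

The printed values rise from zero, peak at one bit for the fair coin, and fall back to zero, which is the shape of the curve described above.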

From this standpoint, if we take a game, where one team is more favored to win, we could say that the most informative part of a game is sometimes the coin toss.

Let’s consider the case of a die. There are six possible events (1 through 6) when we roll a die. The maximum entropy model is to assume a uniform distribution, i.e., to assign a probability of 1/6 to each value from 1 through 6. Now suppose we somehow knew that 6 is more likely to happen; for example, the manufacturer of a loaded die says that the number 6 occurs 3/6 of the time. Per the maximum entropy model, we should divide the remaining 3/6 equally among the remaining 5 numbers, giving each a probability of 1/10. With each additional piece of information, we should change our model so that the entropy is at its maximum subject to what we now know. What I have discussed here is the basic information regarding maximum entropy. Each new piece of “valid” information that we need to incorporate into our model is called a constraint. The maximum entropy approach uses Lagrange multipliers to find the solutions. For discrete events, with no additional information, the maximum entropy model is the uniform distribution. In a similar vein, if you are looking at a continuous distribution and you know its mean and variance, the maximum entropy model is the normal distribution.
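The loaded die example can be checked numerically. The sketch below is an illustration under my own assumptions, not a general maximum entropy solver: it fixes the probability of a 6 at 1/2, compares the equal split of the remainder against one hand-picked lopsided split, and then tries a thousand random splits (with an arbitrary seed) to see whether any of them beats the equal split.

```python
import math
import random

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0.0)

uniform = [1 / 6] * 6                          # no information about the die
maxent_loaded = [0.1] * 5 + [0.5]              # P(6) = 1/2, remainder split equally
lopsided = [0.3, 0.05, 0.05, 0.05, 0.05, 0.5]  # P(6) = 1/2, remainder split unevenly

# Try many random ways of sharing the remaining 1/2 among faces 1 to 5.
random.seed(0)  # arbitrary seed, only so that the run is repeatable
best_random = 0.0
for _ in range(1000):
    cuts = sorted(random.random() for _ in range(4))
    shares = [b - a for a, b in zip([0.0] + cuts, cuts + [1.0])]
    candidate = [0.5 * s for s in shares] + [0.5]
    best_random = max(best_random, entropy(candidate))

print(f"uniform die:         {entropy(uniform):.3f} bits")
print(f"equal split:         {entropy(maxent_loaded):.3f} bits")
print(f"lopsided split:      {entropy(lopsided):.3f} bits")
print(f"best random split:   {best_random:.3f} bits")
```

The constraint lowers the entropy below that of the uniform die, and among the distributions that respect the constraint, the equal split stays on top.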

The Role of The Observer:

Jaynes asked a great question about the information content of a message. He noted:

In a communication process, the message m(i) is assigned probability p(i), and the entropy H, is a measure of information. But WHOSE information?… The probabilities assigned to individual messages are not measurable frequencies; they are only a means of describing a state of knowledge.

The general idea of probability in the frequentist’s version of statistics is that it is fixed. However, in the Bayesian version, the probability is not a fixed entity. It represents a state of knowledge. Jaynes continues:

Entropy, H, measures not the information of the sender, but the ignorance of the receiver that is removed by the receipt of the message.

To me, this brings up the importance of the observer and circularity. As the great cybernetician Heinz von Foerster said:

“The essential contribution of cybernetics to epistemology is the ability to change an open system into a closed system, especially as regards the closing of a linear, open, infinite causal nexus into closed, finite, circular causality.”

Let’s go back to the example of a coin. If I am an alien and I know nothing about coins, should my maximum entropy model include only the two possibilities of heads or tails? Why should it not include the coin landing on its edge? Or if a magician is tossing the coin, should I account for the coin vanishing into thin air? The assumption of just two possibilities (heads or tails) is the prior information that we are accounting for when we say that the probability of heads or tails is 0.5. As we gain more knowledge about the coin toss, we can update the model to reflect it, and at the same time change the model to a new state of maximum entropy. This iterative, closed loop process is the backbone of scientific enquiry and skepticism. The use of the maximum entropy model is a stance that we are taking to state our knowledge. Perhaps a better way to explain the coin toss is that – given our lack of knowledge about the coin, we are saying that heads is not more likely to happen than tails until we find more evidence. Let’s look at another interesting example where I think the maximum entropy model comes up.

The Veil of Ignorance:

The veil of ignorance is an idea about ethics proposed by the great American political philosopher, John Rawls. Loosely put, in this thought experiment, Rawls asks us what kind of society we should aim for. Rawls asks us to imagine that we are behind a veil of ignorance, where we are completely ignorant of our natural abilities, societal standing, family etc. We are then randomly assigned a role in society. The big question then is – what should society be like so that this random assignment promotes fairness and equality? The random assignment is a maximum entropy model since any societal role is equally likely.

Final Words:

The maximum entropy principle is a way of saying that you should not put all of your eggs in one basket. It is a way to be aware of your biases and it is an ideal position for learning. It is similar to Epicurus’ principle of Multiple Explanations, which says – “Keep all the different hypotheses that are consistent with the facts.”

It is important to understand that “I don’t know,” is a valid and acceptable answer. It marks the boundary for learning.

Jaynes explained maximum entropy as follows:

The maximum entropy distribution may be asserted for the positive reason that it is uniquely determined as the one which is maximally noncommittal with regard to missing information, instead of the negative one that there was no reason to think otherwise… Mathematically, the maximum entropy distribution has the important property that no possibility is ignored; it assigns positive weight to every situation that is not absolutely excluded by the given information.

We learned that probability and entropy are dependent on the observer. I will finish off with the wise words from James Dyke and Axel Kleidon.

Probability can now be seen as assigning a value to our ignorance about a particular system or hypothesis. Rather than the entropy of a system being a particular property of a system, it is instead a measure of how much we know about a system.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was Destruction of Information/The Performance Paradox:

Destruction of Information/The Performance Paradox:

Ross Ashby was one of the pioneers of Cybernetics. His 1956 book, An Introduction to Cybernetics, is still one of the best introductions to Cybernetics. As I was researching his journals, I came across an interesting phrase – “destruction of information.” Ashby noted:

I am not sure whether I have stated before my thesis – that the business of living things is the destruction of information.

Ashby gave several examples to explain what he meant by this. For example:

Consider a thermostat controlling a room’s temperature. If it is working well, we can get no idea, from the temperature of the room whether it is hot or cold outside. The thermostat’s job is to stop this information from reaching the occupant.

He also gave the example of an antiaircraft gun and its predictor. Suppose we observe only the error made by each shell in succession. If the predictor is perfect, we shall get the sequence of 0,0,0,0 etc. By examining this sequence, we can get no information about how the aircraft maneuvered. Contrast this with the record of a poor predictor: 2, 1, 2, 3… -3, 0, 3 etc. By examining this, we can get quite a good idea of how the pilot maneuvered. In general, the better the predictor, the less the maneuvers show in the errors. The predictor’s job is to destroy this information.
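One rough way to see the difference between the two error records is to estimate the Shannon entropy of each sequence from the frequencies of its values. This is my own illustration rather than anything Ashby computed: the function name empirical_entropy is hypothetical, and the poor predictor’s record below uses only the values listed above, leaving out the elided portion.

```python
import math
from collections import Counter

def empirical_entropy(sequence):
    """Entropy, in bits per observation, estimated from the value frequencies in a sequence."""
    counts = Counter(sequence)
    n = len(sequence)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

perfect_predictor_errors = [0, 0, 0, 0]           # every shell is on target
poor_predictor_errors = [2, 1, 2, 3, -3, 0, 3]    # illustrative values only

print(f"perfect predictor: {empirical_entropy(perfect_predictor_errors):.3f} bits")
print(f"poor predictor:    {empirical_entropy(poor_predictor_errors):.3f} bits")
```

The perfect predictor’s record carries zero bits per observation, so nothing about the maneuvers can be read from it, while the poor predictor’s record carries a positive amount.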

As an observer, we learn about a living system or a phenomenon by the variety it displays. Here, variety can be loosely expressed as the number of distinct states a system has. Interestingly, the number of states or the variety is dependent upon the system demonstrating it, as well as the observer’s ability to distinguish the different states. If the observer is not able to make the needed number of distinctions, then less information is generated. On the other hand, if the system of interest is able to hide its different states, it minimizes the amount of information available for the observer. In this post, we are interested in the latter category. Ashby talks about an interesting example to further this idea:

An insect whose coloration makes it invisible will not show, by its survival or disappearance whether a predator has or has not seen it. An imperfectly colored one will reveal this fact by whether it has survived or not.

Another example Ashby gives is that of an expert boxer:

An expert boxer, when he comes home, will show no signs of whether he had a fight in the street or not. An imperfect boxer will carry the information.
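Returning to the notion of variety as the number of distinct states an observer can tell apart, here is a small sketch of how the same system can show different variety to different observers. Everything in it is hypothetical and chosen for illustration: the temperature readings, the two observers, and the function name observed_variety.

```python
import math

def observed_variety(states, distinguish):
    """Count the distinct states left after the observer's way of distinguishing is applied."""
    return len({distinguish(state) for state in states})

# Hypothetical room temperature readings in degrees Celsius.
room_temperatures = [18.2, 18.9, 19.4, 20.1, 20.8, 21.5, 22.3]

def fine_observer(temperature):
    # Reads the thermometer to one decimal place.
    return round(temperature, 1)

def coarse_observer(temperature):
    # Only notices whether the room feels cold or warm.
    return "cold" if temperature < 20 else "warm"

for name, observer in (("fine", fine_observer), ("coarse", coarse_observer)):
    variety = observed_variety(room_temperatures, observer)
    print(f"{name} observer: variety = {variety}, log2(variety) = {math.log2(variety):.2f} bits")
```

The system is the same in both cases; the coarse observer simply makes fewer distinctions and so has less information available.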

Ashby’s idea can be further looked at from an adaptation standpoint. When you adapt very well to your ever-changing surroundings, you are destroying information, or in other words you are not revealing any information. Ashby also noted that adaptation means “destroying information.” In this manner, you know that you are adapting well when you don’t break a sweat. A master swordsman moves effortlessly while defeating an opponent. A good runner is not out of breath after a quick sprint.

The Performance Paradox:

My take on this idea from Ashby is to express it as a form of performance paradox – When something works really well, you will not notice it, or worse you will think that it’s wasteful. The most effective and highly efficient components stay the quietest. The best spy is the one you have not ever heard of. When you try to monitor a highly performing component, you may rarely get evidence of its performance. It is almost as if it is wasteful. Another way to view this is – the imperfect components lend themselves to be monitored, while the perfect components do not. The danger in not understanding regulation from a cybernetics standpoint is to completely misread the interactions, and assume that the perfect component has no value.

I encourage the reader to read further upon these ideas here:

Edit (12/1/2020): Adding more clarity on “destruction of information”.

The phrase “destruction of information” was used by Ashby in a Shannon entropy sense. He is indicating that the agent is purposefully reducing the information entropy that would have been otherwise available. Another example is that of a good poker player, who is difficult to read.


In case you missed it, my last post was Locard’s Exchange Principle at the Gemba:

The Truth About True Models:

I recently came across Dr. Donald Hoffman’s idea of Fitness-Beats-Truth or FBT Theorem. This is the idea that evolution stamps out true perceptions. In other words, an organism is more likely to survive if it does not have a true and accurate perception. As Hoffman explains it:

Suppose there is an objective reality of some kind. Then the FBT Theorem says that natural selection does not shape us to perceive the structure of that reality. It shapes us to perceive fitness points, and how to get them… The FBT Theorem has been tested and confirmed in many simulations. They reveal that Truth often goes extinct even if Fitness is far less complex.

Hoffman suggests that natural selection did not shape us to perceive the structure of an objective reality. Evolution gave us a less complex but efficient perceptual network that takes shortcuts to perceive “fitness points.” Evolution by natural selection does not favor true perceptions—it routinely drives them to extinction. Instead, natural selection favors perceptions that hide the truth and guide useful action.

An easy way to digest this idea is to consider our ancient ancestors. If they heard a rustling sound in the grass, it benefitted them to not analyze and capture the entire surrounding to get an accurate and true model of the reality. Instead, they would survive only if they got a “quick and dirty” or good-enough model of the surrounding. They did not gain anything by having an elaborate and accurate perception. Their quick and dirty heuristics such as “if you hear a rustling in the grass, then flee” allowed them to survive and pass on their genes. In other words, their fitter perception did not consist of a true and accurate perception of the world around them. They gained (they survived) based on fitness rather than truth. As Hoffman noted, having true perception would have been detrimental because it avoided shortcuts and heuristics that saved time. As complexity increases, heuristics work much better.

The idea of FBT aligns pretty well with the ideas of second order cybernetics (SOC) and radical constructivism. From an SOC standpoint, the emphasis for the representation of the world is not that of a model of causality, but of a model of constraints. As Ernst von Glasersfeld explains this:

In the biological theory of evolution, we speak of variability and selection, of environmental constraints and of survival. If an organism survives individually or as a species it means that, so far at least, it has been viable in the environment in which it happens to live. To survive, however, does not mean that the organism must in any sense reflect the character or the qualities of his environment. Gregory Bateson (1967) was the first who noticed that this theory of evolution, Darwin’s theory, is really a cybernetic theory because it is based on the concept of constraint rather than on the concept of causation.

In order to remain among the survivors, an organism has to “get by” the constraints which the environment poses. It has to squeeze between the bars of the constraints, to coin a metaphor. The environment does not determine how that might be achieved. It does not cause certain organisms to have certain characteristics or capabilities or to be a certain way. The environment merely eliminates those organisms that knock against its constraints. Anyone who by any means manages to get by the constraints, survives… All the environment contributes is constraints that knock out some of the changed organisms while others are left to survive. Thus, we can say that the only indication we may get of the “real” structure of the environment is through the organisms and the species that have been extinguished; the viable ones that survive merely constitute a selection of solutions among an infinity of potential solutions that might be equally viable.

Nature prefers efficient solutions that do the work most of the time (solutions with the least energy expenditure, the least number of parts etc.) rather than effective solutions that work all of the time. This approach also resonates with Occam’s razor. It is always advisable to have the least number of assumptions in your model. Another way to look at this is – the design with the least number of moving parts is always preferred.

The idea that true perceptions are not always advantageous may be counterintuitive. As complexity increases, we lack the perceptual network to truly comprehend the complexity. How we perceive our world around us depends a lot on our perceptual network, which is unique to our species. Our reality consists of omitting most of the attributes of the world around us. As Hoffman explains – the reality becomes simply a species-specific representation of fitness points on offer, and how we can act to get those points. Evolution has shaped us with perceptions that allow us to survive. But part of that involves hiding from us the stuff we don’t need to know.

Complexity also favors this approach of viable solutions/fitter perceptions. Hoffman notes:

We find that increasing the complexity of objective reality, or perceptual systems, or the temporal dynamics of fitness functions, increases the selection pressures against veridical perceptions.

I will add more thoughts on the FBT theorem at a later time. I encourage the readers to check out Hoffman’s book, The Case Against Reality.


In case you missed it, my last post was Talking about Constraints in Cybernetics:

Talking about Constraints in Cybernetics:

This is available as part of a book offering that is free for community members of Cyb3rSynLabs. Please check here (https://www.cyb3rsynlabs.com/c/books/) for Second Order Cybernetics Essays for Silicon Valley. The e-book version is available here (https://www.cyb3rsyn.com/products/soc-book)

In case you missed it, my last post was Deconstructing Systems – There is Nothing Outside the Text:

Deconstructing Systems – There is Nothing Outside the Text:

In today’s post, I am looking at ideas of the famous Algerian-French philosopher, Jacques Derrida. Derrida is often described as a post-structuralist philosopher. His most famous idea is deconstruction. Deconstruction is often associated with analyzing literary works. The basic notion of deconstruction can be loosely explained as when a text is produced, the author dies, and the reader is born. A text is presented as a coherent whole with a basic idea in the center. The language in the text is all about the idea in the center. The assumption is that the central idea has a fixed meaning. The point of deconstruction is then to disturb this coherent whole, and challenge the hierarchy of the coherent whole. The intent of deconstruction is discovery; the discovery of what is hidden behind the elaborate plot to stage the central idea. It is an attempt to subvert the dominant theme.

Deconstruction is taking the text apart to understand the structure of the text as it is written, and to determine the meaning in several different ways by challenging the hierarchy put in focus by the author. Derrida believed that in language we always prefer hierarchies. We prefer good over bad, or day over night etc. Most often this behavior of focusing on hierarchies results in believing them to be the ultimate truth. We tend to think in terms of false dichotomies. It has to be “this” or “that”. If I don’t do “this”, I am “bad”. Deconstruction always pushes us to look at it from another side or perspective. Deconstruction challenges the notion that language is a closed system – that the meaning is fixed. Derrida viewed language to be an open system, where meaning is not fixed and can depend on the context, the culture and the social realm in which it was constructed. Every perspective is an attempt to focus on certain ideas. But in the act of doing this, we are forced to ignore certain other ideas. The act of deconstruction is an attempt to look at the ideas that lay concealed in the text.

Another important idea that Derrida put forward was differance. Derrida came up with this as a play on words, putting two different ideas together into one word. The two ideas are that of difference (how one word gets its meaning by being different from another), and deferral (how the meaning of a word is provided in terms of yet more words). The idea of differance is that the complete meaning is always deferred (postponed) and is also differential. The dictionary is a great example to explain differance. The meaning of a word is given in terms of other words. The meaning of those words is given in terms of yet another set of words, and so on.

Derrida’s most famous quotation is – Il n’y a pas de hors-texte. This is often translated as “There is nothing outside the text.” This idea is misrepresented as all ideas are contained in language and that you cannot go outside the language. Derrida was not saying this. A better translation is – There is no outside-text. Here the outside-text refers to an inset in a book, something that is provided in a book as a supplement to provide clarity. We can see this as an outside authority trying to shed light on the book. Derrida is saying that there is no such thing. The meaning is not fixed, and what is presented as a closed system is actually an open system. We have to understand the historicity and context of the text to gain better understanding. Derrida is inviting us to feel the texture of text. As Alex Callinicos explained it:

Derrida wasn’t, like some ultra-idealist, reducing everything to language (in the French original he actually wrote ‘Il n’y a pas de hors-texte’ – ‘There is no outside-text’). Rather he was saying that once you see language as a constant movement of differences in which there is no stable resting point, you can no longer appeal to reality as a refuge independent of language. Everything acquires the instability and ambiguity that Derrida claimed to be inherent in language.

Derrida says that every text deconstructs itself. Every text has contradictions, and the author has written the text in a forceful manner to stay away from the internal contradictions. Derrida is inviting us to challenge the coherence of the text by pulling on the central idea and supplementing it to distort the balance. Paul Ricoeur wonderfully explained deconstruction as an act that uncovers the question behind the answers already provided in the text. The answers are already there, and our job then is to find the questions. We cannot assume that we have understood the entire meaning of the text. We have to undo what we have learned and try to feel the texture of the relations of the words to each other in the text.

Derrida was influenced by the ideas of Ferdinand de Saussure, who was a pioneer of a movement called Structuralism. Structuralism presents language as a self-enclosed system in which the important relationships are not those between words and the real objects to which they refer, but rather those internal to language and consisting in the interrelations of signifiers. Ferdinand de Saussure stated that in language, there are only differences. Derrida went a step further than this. He challenged the idea of the continuous movement of differences and postponement of meaning that came as a result of structuralism. Callinicos explained this beautifully:

There is no stable halting point in language, but only what Derrida called ‘infinite play’, the endless slippages through which meaning is sought but never found. The only way to stop this play of difference would be if there were what Derrida called a ‘transcendental signified’ – a meaning that exists outside language and that therefore isn’t liable to this constant process of subversion inherent in signification. But the transcendental signified is nothing but an illusion, sustained by the ‘metaphysics of presence’, the belief at the heart of the western philosophical tradition that we can gain direct access to the world independently of the different ways in which we talk about and act on it…

He (Derrida) believed that it was impossible to escape the metaphysics of presence. Meaning in the shape of the ‘transcendental signified’ may be an illusion, but it is a necessary illusion. Derrida summed this tension up by inventing the word ‘differance’, which combines the meanings of ‘differ’ and ‘defer’. Language is a play of differences in which meaning is endlessly deferred, but constantly posed. The idea of differance informed Derrida’s particular practice of philosophy, which he called deconstruction. The idea was to scrutinize texts – particularly philosophical classics – to expose both how they participated in the metaphysics of presence and also the flaws and tensions through which the limitations of this way of thinking were revealed. As a result, these texts would end up very different from how they had seemed when Derrida started on them: they would have been dismantled – deconstructed.

Deconstructing Systems:

At this point, I will look at deconstructing Systems. The idea of a System is very much aligned to the ideas of Structuralism. A system is viewed as a whole with interconnected parts working together. The focus is on the benefit of the whole. The whole is the central idea of Systems Thinking. The whole is said to be more than the sum of its parts. The parts must be subservient to the whole.

When we approach systems with the ideas of deconstruction, we realize that every system is contingent on who is observing the system. There is no system without an observer. This makes all systems human systems. We have to consider the role of the observer and the impossibility of an objective world. As the famous Cybernetician, Klaus Krippendorff said – whatever is outside our nervous system is accessible only through our nervous system, and cannot be observed directly and separated from how that nervous system operates. We may refer to and talk about the same “system.” However, what constitutes the system, its complexity and what we desire its purpose to be are all dependent on the observer. All systems are constructed in a social realm. After all, meaning is assigned in the social realm, where we bring forth the world together through “languaging.” What the whole is and whether a part should be subservient to the whole depends upon who constructs the system as a mental construct to make sense of the world. If you consider the healthcare system, what it means and what it should do depends on who you talk to. If you talk to the healthcare provider or the insurance company or the patient, you would get different answers as to what the healthcare system means and what it should be doing. There is no one objective healthcare system. We can all identify the parts, but what the “system” means cannot be objectively identified. We must look at this from different perspectives to challenge the metanarratives. We should welcome multiple perspectives. Every perspective reveals certain attributes that were hidden before; the process of which knowingly or unknowingly requires hiding certain other attributes. From the discussion, we might say that – The center does not hold in systems.

There are many similarities between the hard systems approach of Systems Thinking and Structuralism. We talk of systems as if they are real and as if everyone can objectively view and understand them. Gavin P. Hendricks sheds some light on this:

Structuralism argues that the structure of language itself produces ‘reality’. That homo sapiens (humans) can think only through language and, therefore, our perceptions of reality are determined by the structure of language. The source of meaning is not an individual’s experiences or being but signs and grammar that govern language. Rather than seeing the individual as the center of meaning, structuralism places the structure at the center. It is the structure that originates or produces meaning, not the individual self. Meaning does not come from individuals but from the socially constructed system that governs what any individual can do.

Derrida’s ideas obviously rejected the notions put forth by Structuralism. Derrida’s ideas support pluralism. There is no outside-text doesn’t mean that there is no text for us to process. It means that the text can be interpreted in multiple meaningful ways. And of course, this does not mean that all of them are valid. This would be the idea of relativism. As Derrida said, meaning is made possible by relations of words to other words within the network of structures that language is. The different meanings generated through deconstruction (pluralism) are meaningful to those who generated them. This idea is something that we need to bring back into “the front” of Systems Thinking. Derrida invites us to dissolve the hierarchy of the whole in the system that you have created, and look at the part that you have marginalized in your system. When we view the part from another perspective, we suddenly realize that the center of our system does not align with the center of the new different view.

I will finish with wise words from Richard Rorty:

There is nothing deep down inside us except what we have put there ourselves.

The corollary of course is- there is nothing out there giving us meaning or purpose, except that which we have constructed ourselves.


In case you missed it, my last post was When a Machine Breaks…:

Notes on The Good Regulator Theorem:

In case you missed it, my last post was Pluralism and Systems Thinking:

The Contingency and Irony of Systems and Cybernetics Thinking:

In today’s post, I am using the ideas of the great American pragmatist philosopher, Richard Rorty. Rorty’s most famous work is Contingency, Irony and Solidarity. Rorty as a pragmatist follows the idea of an anti-essentialist. This basically means that there is no intrinsic essence to a phenomenon. Take for example, the idea of “Truth”. The general notion of Truth is that it can be found independent of human cognition. Rorty points out that this idea is not at all useful.

Rorty states:

Truth cannot be out there – cannot exist independently of the human mind – because sentences cannot so exist, or be out there. The world is out there, but descriptions of the world are not. Only descriptions of the world can be true or false. The world on its own – unaided by the describing activities of human beings – cannot.

The suggestion that truth, as well as the world, is out there is a legacy of an age in which the world was seen as the creation of a being who had a language of his own.

A key idea that Rorty brings up is the contingency of language. We may see language as this wonderful thing that enables us to communicate. Rorty describes language as contingent. This means that language is actually something we invented rather than discovered. And that language is really a tool we use to describe what is around us and our ideas. It is contingent because it is historically and geographically based. It is also contingent because we are engaged in language games, and meaning is an emergent phenomenon from our language games. This idea of language games is inspired by Ludwig Wittgenstein. If we see language as contingent, then we can prepare ourselves to not fall prey to the idea that truth is out there in the world, and that it is something that we can find. When we realize that language is contingent, we stop believing in dogmas and doctrines stipulated to us. We stop asking questions such as “What is it to be a human being?” Instead we ask, “What is it to inhabit a twenty first century democratic society?”

The idea of contingency slowly reveals to us that sentences are no longer important. We should focus on vocabularies. Rorty explains that vocabularies allow us to describe and re-describe the world. It is a holistic notion. When the notion of a “description of the world” is moved from the level of criterion-governed sentences within language games to language games as wholes, games which we do not choose between by reference to criteria, the idea that the world decides which descriptions are true can no longer be given a clear sense. It becomes hard to think that that vocabulary is somehow already out there in the world, waiting for us to discover it. Languages are made rather than found, and truth is a property of linguistic entities (sentences).

As a pragmatist, Rorty’s view is that language, and in turn vocabulary, is a tool that is useful in a particular context. It does not have an intrinsic nature on its own because it is contingent on us, the users. Rorty wonderfully explains this as – the fact that Newton’s vocabulary lets us predict the world more easily than Aristotle’s does not mean that the world speaks Newtonian.

Another idea that Rorty proposes is that of the final vocabulary. Rorty says that we all have final vocabularies. All human beings carry about a set of words which they employ to justify their actions, their beliefs, and their lives. These are the words in which we formulate praise for our friends and contempt for our enemies, our long-term projects, our deepest self-doubts and our highest hopes… It is “final” in the sense that if doubt is cast on the worth of these words, their user has no noncircular argumentative recourse. Those words are as far as he can go with language; beyond them there is only helpless passivity or a resort to force. A small part of a final vocabulary is made up of thin, flexible, and ubiquitous terms such as “true,” “good,” “right,” and “beautiful.” The larger part contains thicker, more rigid, and more parochial terms, for example, “Christ,” “England,” “professional standards,” “decency,” “kindness,” “the Revolution,” “the Church,” “progressive,” “rigorous,” “creative.” The more parochial terms do most of the work.

Let’s look at what we have discussed so far and look at systems thinking. Pragmatism is not foreign to systems thinking. The pioneer of the soft systems approach, C. West Churchman, was a pragmatist. He advised us that the systems approach starts when we view the world through the eyes of another. The general commonsense view of systems is that they are real, and everyone sees the “system” objectively which helps to address the problem. The “system” can be drawn and described accurately. The system can be optimized to achieve maximum performance. This is the “hard systems” approach which utilizes a mechanistic view. However, as we start applying the pragmatist ideas we have looked at, we start to challenge this. “Systems” are not real entities but mental constructs by an observer to aid in understanding of a phenomenon of interest. “Systems” no longer become a necessity, but become contingent on the observer constructing it. When one says that the “healthcare system” is broken, we no longer look at the sentence in isolation, but rather we start looking at the vocabularies. The idea of contingency brings the non-objective nature of reality to the front. How one sees or experiences something depends on their contingency and their final vocabulary. From this standpoint, a system has nothing that the observer does not put into it. The intrinsic nature of a system is actually the properties assigned by the observer and contingent on his or her final vocabulary.

Similar ideas are present in Cybernetics and Systems Thinking:

We exist in language using language for our explanations. – Humberto Maturana

The environment as we perceive it is our invention. – Heinz von Foerster

If contingency of language is an issue, then how does one do systems thinking? Here I will introduce another idea from Rorty. This is the idea of an “ironist”. Rorty said:

I shall define an “ironist” as someone who fulfills three conditions: (1) She has radical and continuing doubts about the final vocabulary she currently uses, because she has been impressed by other vocabularies, vocabularies taken as final by people or books she has encountered; (2) she realizes that argument phrased in her present vocabulary can neither underwrite nor dissolve these doubts; (3) insofar as she philosophizes about her situation, she does not think that her vocabulary is closer to reality than others, that it is in touch with a power not herself. Ironists who are inclined to philosophize see the choice between vocabularies as made neither within a neutral and universal metavocabulary nor by an attempt to fight one’s way past appearances to the real, but simply by playing the new off against the old.

Rorty adds:

The ironist spends her time worrying about the possibility that she has been initiated into the wrong tribe, taught to play the wrong language game. She worries that the process of socialization which turned her into a human being by giving her a language may have given her the wrong language, and so turned her into the wrong kind of human being. But she cannot give a criterion of wrongness. So, the more she is driven to articulate her situation in philosophical terms, the more she reminds herself of her rootlessness by constantly using terms like “Weltanschauung,” “perspective,” “dialectic,” “conceptual framework,” “historical epoch,” “language game,” “redescription,” “vocabulary,” and “irony.”

From a second order Cybernetics standpoint, the idea of an ironist is self-referential. The observer is aware of their final vocabulary. Moreover, they are aware that their final vocabulary is perhaps incomplete or incorrect. They are historicist in the sense that they understand that their language is contingent on the time, place and society they were born into. They are also aware that others do not share their vocabulary. From this standpoint, what they can do is to seek understanding and ask leading questions to expose others to the contingencies of their vocabulary. They understand that truth is a function of agreement within language games. They don’t look at sentences in isolation, but at vocabularies in a holistic fashion. They realize that ideas are dynamic and do not have a fixed essence because vocabularies themselves are dynamic. They are open to changing their vocabularies without the fear of going against ideas they once held on to. They understand in a pragmatist sense that all models are wrong but the practical question is how wrong do they have to be to not be useful. (George Box)

I will finish with a quote from Friedrich Nietzsche:

“Truths are illusions about which one has forgotten that this is what they are; metaphors which are worn out and without sensuous power; coins which have lost their pictures and now matter only as metal, no longer as coins.”

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was Cybernetic Explanation, Purpose and AI:

Cybernetic Explanation, Purpose and AI:

In today’s post, I am following the theme of cybernetic explanation that I talked about in my last post – The Monkey’s Prose – Cybernetic Explanation. I recently listened to the talks given as part of the Tenth International Conference on Complex Systems. I really enjoyed the keynote speech by the Herbert A. Simon Award winner, Melanie Mitchell. She told the story of a project that her student did where the AI was able to recognize whether there was an animal in a picture or not with good accuracy. Her student dug deep into the AI’s model. The AI is taught to identify a characteristic by showing it a large dataset (in this case pictures with and without animals). The AI is shown which picture has an animal and which picture does not. The AI comes up with an algorithm based on the large dataset. The correct answers reinforce the algorithm, and the wrong answers tweak the algorithm as needed with the assigned weights to the “incorrectness”. This is very much like how we learn. What Mitchell’s student found was that the AI is assigning probabilities based on whether the background is blurry or not. When the background is blurry, it is more likely that there is an animal in the picture. In other words, it is not looking for an animal, it is just looking to see whether the background is blurry or not. Depending upon the statistical probability, the AI will answer that there is or there is not an animal in the picture.
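To make the blurry-background story concrete, here is a minimal Python sketch. Everything in it is my own assumption for illustration and not the model Mitchell’s student studied: the two made-up features (`blurry` and `shape`), the percentages, and the one-feature “stump” learner that stands in for a trained network.

```python
import random

def make_photo(rng, has_animal):
    """Return a toy 'photo' described by two yes/no features (synthetic data)."""
    # Assumption: animal photos usually have a blurry background,
    # while photos without animals rarely do.
    blurry = rng.random() < (0.9 if has_animal else 0.1)
    # Assumption: a noisy 'animal shape' cue that is only slightly informative.
    shape = rng.random() < (0.6 if has_animal else 0.4)
    return {"blurry": blurry, "shape": shape}

def accuracy_of_feature(data, feature):
    """Accuracy of the rule 'say animal exactly when this feature is present'."""
    hits = sum(1 for photo, label in data if photo[feature] == label)
    return hits / len(data)

def train_stump(data):
    """Pick the single feature that best matches the labels in the training set."""
    return max(("blurry", "shape"), key=lambda f: accuracy_of_feature(data, f))

rng = random.Random(0)
train = [(make_photo(rng, label), label) for label in [True, False] * 1000]

chosen = train_stump(train)
print("feature the learner relies on:", chosen)
print("training accuracy:", accuracy_of_feature(train, chosen))

# A sharp studio photo of an animal: the learned rule looks only at the blur.
studio_animal = {"blurry": False, "shape": True}
print("says there is an animal:", studio_animal[chosen])
```

The learner scores well on its training pictures and still misses the studio photo, because the rule it found was never about animals. We are the ones who read “animal detector” into the output.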

We, humans, assign the meaning to the AI’s output, and believe that the AI is able to differentiate whether there is an animal in the picture or not. In actuality, the AI is merely using statistical probabilities of whether the background is blurry or not. We cannot help but assign meanings to things. We say that nature has a purpose, or that evolution has a purpose. We assign causality to phenomena. It is interesting to think about whether it truly matters that the AI is not really identifying the animal in the picture. The outcome still has the appearance that the AI is able to tell whether there is an animal or not in the picture. We are able to bring in more concepts that the AI cannot. Mitchell discusses the difference between concepts and perceptual categories. What the AI is doing is constructing perceptual categories that are limited in nature, whereas what we construct are concepts that may be linked to other concepts. The example that Mitchell provided was that of a bridge. For us, a bridge can mean many things based on the linguistic application. We can say that a person is able to “bridge the gap” or that our nose has a bridge. The capacity for AI, at this time at least, is to stick to the bridge being a perceptual category based on the context of the data it has. We can talk in metaphors that the AI cannot understand. A bridge can be a concept or an actual physical thing for us. A simple task such as the question of an animal in the picture carries no risk. However, as we up the ante to a task such as autonomous driving, we can no longer rely on the appearances of the AI being able to carry out the task. This is demonstrated in the morality or ethics debate with regard to AI, and how it should carry out probability calculations in the event of a hazard. This involves questions such as the ones in the trolley problem.

This also leads to another idea that has the cybernetic explanation embedded in it. This is the idea of “do no harm”. The requirement is not specifically to do good deeds, but to not do things that will cause harm to others. As the English philosopher, John Stuart Mill put it:

That the only purpose for which power can be rightfully exercised over any member of a civilized community, against his will, is to prevent harm to others.

This is also what Isaac Asimov referred to as the first of the three laws of robotics in his 1942 short story, Runaround:

A robot may not injure a human being or, through inaction, allow a human being to come to harm.

The other two laws refer back to the first law:

2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.

3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

The idea of cybernetic explanation gives us another perspective on purpose and meaning. Our natural disposition is to assign meaning and purpose, as I indicated earlier. We tend to believe that Truth is out there or that there is an objective reality. As the great Cybernetician Heinz von Foerster put it – “The environment contains no information; the environment is as it is”. Truth, or a description of reality, is our creation, made with our vocabulary. And most importantly, there are other beings describing realities with their vocabularies as well. I will finish with some wise words from Friedrich Nietzsche.

“It is we alone who have devised cause, sequence, for-each-other, relativity, constraint, number, law, freedom, motive, and purpose; and when we project and mix this symbol world into things as if it existed ‘in itself’, we act once more as we have always acted—mythologically.”

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was The Monkey’s Prose – Cybernetic Explanation:

The Monkey’s Prose – Cybernetic Explanation:

Imagine that you are on your daily walk in the park. You see a monkey on a park bench, busily typing away. You become curious as to what is happening. You slowly approach him from behind, and try to see what is being typed on the paper. Strangely enough, what you see typed on the paper so far is legible prose, complete with grammar and semantics. What could be an explanation for this phenomenon?

This example was given by the great anthropologist cybernetician, Gregory Bateson. He used the example to explain “cybernetic explanation”, as he termed it. He said:

Causal explanation is usually positive. We say that billiard ball B moved in such and such a direction because billiard ball A hit it at such and such an angle. In contrast to this, cybernetic explanation is always negative… In cybernetic language, the course of events is said to be subject to restraints, and it is assumed that, apart from such restraints, the pathways of change would be governed only by equality of probability. In fact, the “restraints” upon which cybernetic explanation depends can in all cases be regarded as factors which determine inequality of probability. If we find a monkey striking a typewriter apparently at random but in fact writing a meaningful prose, we shall look for restraints, either inside the monkey or inside the typewriter… Somewhere there must have been a circuit which could identify error and eliminate it.

Bateson’s use of the word “restraints” is comparable to “constraints”. Larry Richards notes that Bateson used the term “restraint” referring to the approach of Cybernetics as “negative explanation”, focusing on what is not desirable, rather than what is. When there are no constraints, we can say that all events are equally likely. If we have enough chances, we will see at least one event where a monkey types out a work of Shakespeare (sometimes referred to as the infinite monkey theorem). But here, we are looking at a cybernetic phenomenon where constraints are present, and they guide the outcome. In the case of the monkey’s prose, one possibility could be that the typewriter is programmed in such a fashion that no matter what key is pressed, a preprogrammed prose is generated. This would be an example of a circuit that Bateson referred to.
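A little arithmetic shows why we go looking for restraints. The Python sketch below is my own illustration under simple assumptions: a typewriter with 27 equally likely keys (26 letters and a space), independent keystrokes, and a sample phrase that I picked. With no restraints, the chance of typing a phrase of n characters in one go is (1/27)^n. A typewriter rigged so that every key produces the next character of a stored text behaves like a one-key typewriter, and the chance becomes 1.

```python
import math

def chance_of_phrase(phrase, keys=27):
    """Chance that len(phrase) independent, equally likely keystrokes spell the phrase."""
    return (1.0 / keys) ** len(phrase)

def chance_within(attempts, phrase, keys=27):
    """Chance of at least one success in a number of independent attempts."""
    p = chance_of_phrase(phrase, keys)
    if p >= 1.0:
        return 1.0
    # 1 - (1 - p)**attempts, written with log1p/expm1 so that a tiny p
    # is not lost to floating point rounding.
    return -math.expm1(attempts * math.log1p(-p))

phrase = "to be or not to be"

# No restraints: every key is equally likely on every stroke.
print(chance_of_phrase(phrase))
print(chance_within(10**6, phrase))

# A rigged typewriter: whatever key is pressed, the stored text comes out.
print(chance_within(1, phrase, keys=1))
```

Even a million unrestrained attempts leave the chance vanishingly small, so a monkey producing prose points to a restraint somewhere rather than to luck.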

Let’s consider another example. Let’s say that every hour you take two measurements, measurement A and measurement B. What you find is that measurement A goes up and down, while measurement B remains fairly steady. From this dataset, what correlation can you determine between A and B? Without any additional knowledge, the general consensus would be that there is no correlation between the two measurements. What if we consider the mechanism of a thermostat? The thermostat does not turn ON until the temperature goes outside a tight range. Only when the temperature goes outside the range does the thermostat turn ON. It maintains the internal temperature of the house based on how the external temperature impacts the internal temperature. In the example above, the external temperature was A and the internal temperature was B. Without knowledge of the thermostat, if we were given just the two datasets, we would not be able to see any connection between the two datasets. This idea is sometimes referred to as Friedman’s Thermostat after the American economist, Milton Friedman.
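Here is a toy simulation of the two measurements, written in Python. It is only a sketch, and every number in it is my assumption: a daily outside temperature swing between 20 and 50 degrees F, a house that leaks heat in proportion to the temperature difference, a heater that adds 0.2 degrees per minute, and a thermostat band of 69.5 to 70.5 degrees F. Measurement A is the outside temperature, measurement B is the inside temperature read once an hour, and I also log how many minutes per hour the heater ran.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def simulate(hours=240, low=69.5, high=70.5, leak=0.002, heater_gain=0.2):
    """Minute-by-minute house model with an ON/OFF thermostat (all values assumed)."""
    inside, heater_on = 70.0, False
    a_log, b_log, on_log = [], [], []
    for hour in range(hours):
        outside = 35.0 + 15.0 * math.sin(2 * math.pi * hour / 24)
        on_minutes = 0
        for _ in range(60):
            # The thermostat reacts only when the band is violated.
            if inside < low:
                heater_on = True
            elif inside > high:
                heater_on = False
            inside += leak * (outside - inside)
            if heater_on:
                inside += heater_gain
                on_minutes += 1
        a_log.append(outside)      # measurement A
        b_log.append(inside)       # measurement B
        on_log.append(on_minutes)  # hidden from someone who sees only A and B
    return a_log, b_log, on_log

a, b, heater = simulate()
print("range of A:", max(a) - min(a))
print("range of B:", max(b) - min(b))
print("correlation of A with B:", pearson(a, b))
print("correlation of A with heater minutes:", pearson(a, heater))
```

In this model B stays within a band a little over one degree wide while A swings by 30 degrees, so B shows almost nothing of what A is doing. The dependence is visible in the heater’s running time, which is the part of the circuit that an observer holding only the two datasets never sees.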

The thermostat is a very basic example of cybernetic explanation. Even though we may perceive that the thermostat’s goal is to maintain the room temperature at a constant value, the thermostat does not have a goal per se. It does not stay ON to ensure that the temperature is maintained at a constant value. Instead, it turns ON when the temperature goes outside a limit. The thermostat negatively “moves away” from the outside range value of the temperature and stays ON until it is inside a determined range. The thermostat acts only when it hits a constraint or it is guided by the restraint, to use Bateson’s language. It is not a movement towards a goal temperature of say 70 degrees F, but rather a movement away from a current temperature of say 68 degrees F. Larry Richards explained this wonderfully:

Any system with constraints appears to have a purpose as there are outcomes precluded from the set of possibilities.

Another example we can consider is that of driving a car. When you drive a car, you apply gas or brake only when needed. You don’t steer the car to try to keep it running in a straight line. You engage when the car is moving towards the edges of your lane. To continuously work towards a goal requires high energy, and a person driving is not suitable for this.

This idea of cybernetic explanation brings forth valuable insights when we look at social systems such as an organization. Richards proposes that assigning or designing a purpose for a social system can lead to problems.

I suggest avoiding or suspending… the idea of purpose. The idea of teleological systems – that systems have a purpose first, with structure following – implies that systems are created or evolve to achieve a goal or objective.

The problem in Second Order Cybernetics arises when the observers/designers specify the purpose of their designs, giving conscious intent to their actions. Gregory Bateson (1972a, 1972b) warned of the dysfunctions of conscious purpose when the actions taken do not and cannot account for all the ecological circularities of the situation and the unanticipated consequences inherent in taking such actions. Yet, humans have needs, desires, preferences and values; we are self-aware of our actions and alternatives; and, we can act with intent to satisfy our needs and desires. To act without self-awareness of our desires and the possible consequences of our actions would be irresponsible.

Richards advises us to look for present constraints that guide actions.

Specifying a set of constraints treats desires as a spatial concept, focusing attention on the states we wish to exclude from happening, leaving open a space of possible outcomes deemed currently acceptable. This approach is present-oriented, merging ends and means: the set of constraints that represent our desires and the actions we take to avoid what we do not want are here and now, and our evaluation of possible consequences is based on current best available knowledge. Our desires, actions and evaluations can change as we experiment, learn and change, making it important to be careful about excluding outcomes that could become useful as circumstances change. Treating desires as constraints and intention as an awareness of desires as constraints opens the door for an alternative to the consciousness of purpose about which Bateson was concerned.

The idea of cybernetic explanation and constraints raises the importance of dialogue amongst the coparticipants of the social realm. Rather than going after a narrow purpose, we may be better served if we can explore the space of constraints to identify conditions that promote outcomes that we desire. When we utilize a constancy of purpose, we are utilizing a narrow view that is not able to accommodate the various interpretations and desires of the many coparticipants of our social realm. Bateson viewed the pursuit of conscious purpose as being damaging to the very ecology that supports being human (Klaus Krippendorff). Krippendorff came out with an Empirical Imperative to support this idea:

Empirical Imperative: Invent as many alternative constructions as you can and actively explore the constraints on their affordances.

I will finish with more wise words from Richards that provide further insights about cybernetic explanation:

If I know what I want and I know it is possible to achieve it, I do not need cybernetics—I just go and do what I need to do to achieve the outcome. However, when I only have a vague idea about what I want or do not want and I do not know how to pursue or avoid it in the current society, the vocabulary of cybernetics can be useful. Cybernetics is not about success and the achievement of goals; it is about the reconfiguration of constraints (resources) in order to make possible what was not previously possible, including the avoidance of what was previously inevitable.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was Complexity – Only When You Realize You Are Blind, Can You See:

Complexity – Only When You Realize You Are Blind, Can You See:

In today’s post, I am looking at the idea of complexity from a second order Cybernetics standpoint. The phrase “only when you realize you are blind, can you see”, is a paraphrase of a statement from the great Heinz von Foerster. I have talked about von Foerster in many of my posts, and he is one of my heroes in Cybernetics. There is no one universally accepted definition for complexity. Haridimos Tsoukas and Mary Jo Hatch wrote a very insightful paper called “Complex Thinking, Complex Practice”. In the paper, they try to address how to explain complexity. They refer to the works of John Casti and C. H. Waddington to further their ideas:

Waddington notes that complexity has something to do with the number of components of a system as well as with the number of ways in which they can be related… Casti defines complexity as being ‘directly proportional to the length of the shortest possible description of [a system]’.
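Casti’s “shortest possible description” can be given a rough feel in Python. This is a sketch under assumptions of my own: I use the size of a zlib-compressed string as a stand-in for description length, which is only an upper bound because the true shortest description cannot be computed in general. The two sample strings and the `coarse_view` helper are made up for the illustration.

```python
import random
import zlib

def description_length(text):
    """Bytes needed for a zlib-compressed copy of the text (a crude proxy)."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def coarse_view(text):
    """An observer who only distinguishes how many of each symbol there are."""
    return f"{text.count('a')} a, {text.count('b')} b"

rng = random.Random(1)
regular = "ab" * 500                                          # 'repeat ab 500 times'
irregular = "".join(rng.choice("ab") for _ in range(1000))    # no short recipe

print(description_length(regular), description_length(irregular))

# Describe the same irregular string at a coarser level of distinction
# and the description becomes short again.
print(description_length(coarse_view(irregular)))
```

Both strings have the same length and the same alphabet, yet the irregular one needs a much longer description. When the observer chooses to draw fewer distinctions, the same string gets a short description, which is one way to see that the number belongs to the description and not to the thing described.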

Casti’s views on complexity are particularly interesting because complexity is not viewed as being intrinsic to the phenomenon. This is a common idea in Cybernetics, mainly second order cybernetics. There are two ‘classifications’ of cybernetics – first order and second order cybernetics. As von Foerster explained it, first order cybernetics is the study of observed systems, where the basic assumption is that the system is objectively knowable. The second order cybernetics is the study of observing systems, where the basic assumption is that the observer is included in the act of observing, and thus the observer is part of the observed system. This leads to second order thinking such as understanding understanding or observing observing. It is interesting because, as I am typing, Microsoft Word is telling me that “understanding understanding” is syntactically incorrect. This obviously would be a first order viewpoint. The second order cybernetics is a meta discipline and one that generates wisdom.

When we take the observer into consideration, we realize that complexity is in the eyes of the beholder. Complexity is observer-dependent; that is, it depends upon how the system is described and interpreted. If the observer is able to make more varying distinctions in their description, we can say that the phenomenon or the system is being interpreted as complex. In their paper, Tsoukas and Hatch bring up the ideas of language in describing and thus interpreting complexity. They note that:

Chaos and complexity are metaphors that posit new connections, draw our attention to new phenomena, and help us see what we could not see before (Rorty).

This is quite interesting. When we learn the language of complexity, we are able to understand complexity better, and we become better at describing it, in a reflexive manner.

What complexity science has done is to draw our attention to certain features of systems’ behaviors which were hitherto unremarked, such as non-linearity, scale-dependence, recursiveness, sensitivity to initial conditions, emergence (etc.)

From this standpoint, we can say that complexity lies in the interactions we have with the system, and depending on our perspectives (vantage point) and the interaction we can come away with a different interpretation for complexity.

Heinz von Foerster remarked that complexity is not in the world but rather in the language we use to describe the world. Paraphrasing von Foerster, cognition is computation of descriptions of reality. Managing complexity then becomes a cognitive task. How well you can interact or manage interactions depends on how effective your description is and how well it aligns with others’ descriptions. The complexity of a system lies in the description of that system, which entirely rests on the observer/sensemaker. The idea that complexity is in the eyes of the beholder is to point out the importance of second order cybernetics/thinking. The world is as it is; it gets meaning only when we assign meaning to it through how we describe/interpret it. To put it differently, “the logic of the world is the logic of the descriptions of the world” (Heinz von Foerster).

The idea of complexity not being intrinsic to a system is also echoed by one of the pioneers of cybernetics, Ross Ashby. He noted – “a system’s complexity is purely relative to a given observer; I reject the attempt to measure an absolute, or intrinsic, complexity; but this acceptance of complexity as something in the eye of the beholder is, in my opinion, the only workable way of measuring complexity”.

The ideas of second order cybernetics emphasize the importance of observers. The “system” is a mental construct by an observer to make sense of a phenomenon. The observer, based on their needs, draws boundaries to separate a “system” from its environment. This allows the observer to understand the system in the context of its environment. At the same time, the observer has to understand that there are other observers in the same social realm who may draw different boundaries and come out with different understandings based on their own needs, biases, perspectives etc.

A phenomenon can have multiple identities or meanings depending on who is describing the phenomenon. Let’s use the COVID-19 pandemic as an example. For some people, this has become a problem of economics rather than a healthcare problem, while for some others it has become a problem of freedom or ethics. If we are to attempt tackling the complexity of such an issue, the worst thing we can do is to attempt first order thinking – the idea that the phenomenon can be observed objectively. Issues requiring a second order approach get worse by the application of first order methodologies. The danger in this is that we can fall prey to our own narrative being the whole Truth.

As the pragmatic philosopher Richard Rorty points out:

The world does not speak. Only we do. The world can, once we have programmed ourselves with a language, cause us to hold beliefs. But it cannot propose a language for us to speak. Only other human beings can do that.

If we are to understand complexity of a phenomenon, we need to start with realizing that our version of complexity is only one of the many. Our ability to understand complexity depends on our ability to describe it. We lack the ability to completely describe a phenomenon. The different descriptions that come about from the different participants may be contradictory and can point out apparent paradoxes in our social realm.

In complexity, if we are to tackle it, we need to have coherence of multiple interpretations. As Karl Weick points out, we need to complicate ourselves. By generating and accommodating multiple inequivalent descriptions, practitioners will increase the complexity of their understanding and, therefore, will be more likely to match the complexity of the situation they attempt to manage. In complexity, coherence – the idea of connecting ideas together – is important since it helps to create a clearer picture and affords avoiding blind spots. This co-constructed description itself is an emergent phenomenon.

In second order Cybernetics, there are two statements that might shed more light on everything we have discussed so far:

Anything said is said BY an observer. (Maturana)

Anything said is said TO an observer. (von Foerster)

A lot can be said between these two statements. The first points out the importance of the observer, and the second points out that there are other observers, and we coconstruct our social reality.

Our descriptions are abstractions since we are limited by our languages. All our biases, fears, misunderstandings, ignorance etc. lie hidden in the “systems” we construct. We get into trouble when we assume that these abstractions are real things. This is the first order approach, where we are not aware that we do not see due to our cognitive blind spots. When we realize that all we have are abstractions, we get to the second order approach. We include ourselves in our observation and we start looking at how we make these abstractions. We also become aware of other autonomous participants of our social reality engaging in similar constructions of narratives. As we seek their understanding, we become aware of our cognitive blind spots. We realize that everything is on a spectrum, and our thinking of “either/or” is actually a false dichotomy.

At this point, Heinz von Foerster would say that we start to see when we realize that we are blind.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was Causality and Purpose in Systems:

Harish's Notebook – My notes… Lean, Cybernetics, Quality & Data Science.
