In today’s post, I am looking at the ethical imperative of Heinz von Foerster, the Socrates of Cybernetics:
“Always act so as to increase the number of choices.”
I see this as the recursive humanist commandment. It is very much applicable to ethics and to how we should treat each other. Von Foerster said the following about ethics:
Whenever we speak about something that has to do with ethics, the other is involved. If I live alone in the jungle or in the desert, the problem of ethics does not exist. It only comes to exist through our being together. Only our togetherness, our being together, gives rise to the question, How do I behave toward the other so that we can really always be one?
Von Foerster’s views align with those of constructivism, the idea that we construct our knowledge about our reality. We construct our knowledge to “re-cognize” a reality through the intercorrelation of the activities of the various sense organs. It is through these computed correlations that we recognize a reality. No findings exist independently of observers. Observing systems can only correlate their sense experiences with themselves and each other.
Paul Pangaro reminded me that von Foerster did not mean “options” or “possibilities”. Von Foerster specifically chose the word “choices”. By choices, he meant those selections among options that you might “actually take” depending on who “you are” right now. Here choices narrow down to the few that apply most to what you are now, in this moment and in this context, down to a decision that makes you who you are. As von Foerster said, “Don’t make the decision, let the decision make you.” You and the choice you take are indistinguishable.
Since we are the ones doing the construction, we are also ultimately responsible for what we construct. No one should take this away from us. Ernst von Glasersfeld, the father of radical constructivism, explained this well:
The moment you begin to think that you are the author of your knowledge, you have to consider that you are responsible for it. You are responsible for what you are thinking, because it’s you who’s doing the thinking and you are responsible for what you have put together because it’s you who’s putting it all together. It’s a disagreeable idea and it has serious consequences, because it makes you truly responsible for everything you do. You can no longer say “well, that’s how the world is”, or “sono così” [Italian: “that’s how I am”]; you know, that’s not good enough.
Cybernetics is about communication and control in the animal and machine, as Norbert Wiener viewed it. When we view control in terms of von Foerster’s ethical imperative, interesting insights emerge. Control is about reducing the number of choices so that only certain pre-selected activities are available to the one being controlled. For example, a steersman has to control their ship such that it maintains a specific course; here the ship’s “available options” to move are drastically reduced. When we use this view of control and apply it to human beings, we should do so in light of von Foerster’s ethical imperative.
Von Foerster also said that “A is better off when B is better off.” This provides further clarity on the recursiveness. If I act so as to increase the number of choices for B, then B in turn does the same for me. How I act impacts how others (re)act, which in turn impacts how I act back… on and on. This might remind the reader of the golden rule: treat others as you would like others to treat you. However, that misses the point about constructivism and the ongoing interaction that leads to the construction of a social reality. I see this as part of a social contract. As Jean-Jacques Rousseau noted, “Man is born free, and everywhere he is in chains.” The social contract comes about from the ongoing interactions and the contexts we are in with our fellow human beings as part of being in a society or social groups. This also means that it is dynamic and contingent in nature. What was “good” before may not be “good” today. This requires an ongoing framing and reframing through interactions.
John Boyd, the father of the OODA loop, shed more light on this:
Studies of human behavior reveal that the actions we undertake as individuals are closely related to survival, more importantly, survival on our own terms. Naturally, such a notion implies that we should be able to act relatively free or independent of any debilitating external influences — otherwise that very survival might be in jeopardy. In viewing the instinct for survival in this manner we imply that a basic aim or goal, as individuals, is to improve our capacity for independent action. The degree to which we cooperate, or compete, with others is driven by the need to satisfy this basic goal. If we believe that it is not possible to satisfy it alone, without help from others, history shows us that we will agree to constraints upon our independent action — in order to collectively pool skills and talents in the form of nations, corporations, labor unions, mafias, etc — so that obstacles standing in the way of the basic goal can either be removed or overcome. On the other hand, if the group cannot or does not attempt to overcome obstacles deemed important to many (or possibly any) of its individual members, the group must risk losing these alienated members. Under these circumstances, the alienated members may dissolve their relationship and remain independent, form a group of their own, or join another collective body in order to improve their capacity for independent action.
In a similar fashion, Dirk Baecker also noted the following:
Control means to establish causality ensured by communication. Control consists in reducing degrees of freedom in the self-selection of events. This is why the notion of “conditionality” is certainly one of the most important notions in the field of systems theory. Conditionality exists as soon as we introduce a distinction which separates subsets of possibilities and an observer who is forced to choose, yet who can only choose depending on the “product space” he is able to see. If we assume observers on both sides of the control relationship, we end up with subsets of possibilities selecting each other and thereby experiencing, and solving, the problem of “double contingency” so much cherished by sociologists. In other words, communication is needed to entice observers into a self-selection and into the reduction of degrees of freedom that goes with it. This means there must be a certain gain in the reduction of degrees of freedom, which for instance may be a greater certainty in the expectation of specific things happening or not happening.
Ultimately, this is all about what we value for ourselves and for the society we are part of. Our personal freedom makes sense only in light of others’ personal freedoms. That is the context – in relation to another human being, one who may be less fortunate than us. Making the world easier for those less fortunate than us makes the world better for every one of us. I will finish with a great quote from one of my favorite science fiction characters, Doctor Who:
“Human progress isn’t measured by industry. It’s measured by the value you place on a life. An unimportant life. A life without privilege. The boy who died on the river, that boy’s value is your value. That’s what defines an age. That’s what defines a species.”
Please maintain social distance, wear masks and take vaccination, if able. Stay safe and always keep on learning…
I have written a lot about the problem of induction before. This was explained very well by the great Scottish philosopher, David Hume. Hume looked at the basis of beliefs that we hold such as:
The sun will rise tomorrow; or
If I drop this ball, it will fall to the ground
Hume noted that we cannot assume uniformity in nature. In other words, it is not rational to believe that what has happened in the past will happen again in the future. Just because we have seen the sun rise every single day of our lives, it does not guarantee that it will rise again tomorrow. We are using our experience of the sun rising to believe that it will rise again tomorrow. Even though this might be irrational, Hume does not deny that we may see the belief of the sun rising as a sensible proposition. He notes:
None but a fool or madman will ever pretend to dispute the authority of experience, or to reject that great guide of human life.
It’s just that we cannot use logic to back this proposition up. We cannot conclude that the future is going to resemble the past, no matter how many examples of the past we have. We cannot simply use experience of the past because the only experience we have is of the past, and not of the future. Hume noted that to propose that the next future event will resemble the past because our most recent “future event” (the last experienced event) resembled the past is circular:
All our experimental conclusions proceed upon the supposition that the future will be conformable to the past. To endeavor, therefore, the proof of this last supposition by probable arguments, or arguments regarding existence, must be evidently going in a circle, and taking that for granted, which is the very point in question.
Hume concluded that we fall prey to the problem of induction because we are creatures of habit:
For wherever the repetition of any act or operation produces a propensity to renew the same act or operation, without being impelled by any reasoning or process of the understanding, we always say, that this propensity is the effect of Custom. By employing this word, we pretend not to have given the ultimate reason of such a propensity. We only point out a principle of human nature, which is universally acknowledged, and which is well known by its effects.
In other words, it is our human nature to identify and seek patterns and to use them to make predictions about the future. This is just how we are wired. We do this unconsciously. Our brains are prediction engines. We cannot help but do this. I will go further with this idea by utilizing a brilliant example from the wonderful American philosopher Charles Sanders Peirce. Peirce in 1868 wrote about an experiment to reveal the blind spot in the retina:
Does the reader know of the blind spot on the retina? Take a number of this journal, turn over the cover so as to expose the white paper, lay it sideways upon the table before which you must sit, and put two cents upon it, one near the left-hand edge, and the other to the right. Put your left hand over your left eye, and with the right eye look steadily at the left-hand cent. Then, with your right hand, move the right-hand cent (which is now plainly seen) towards the left hand. When it comes to a place near the middle of the page it will disappear—you cannot see it without turning your eye. Bring it nearer to the other cent, or carry it further away, and it will reappear; but at that particular spot it cannot be seen. Thus, it appears that there is a blind spot nearly in the middle of the retina; and this is confirmed by anatomy. It follows that the space we immediately see (when one eye is closed) is not, as we had imagined, a continuous oval, but is a ring, the filling up of which must be the work of the intellect. What more striking example could be desired of the impossibility of distinguishing intellectual results from intuitional data, by mere contemplation?
I highly encourage the reader to try this out, if they have not heard of this experiment. In fact, I welcome the reader to draw a line and then place the coin on the line. Doing so, the reader will see that the coin vanishes; however, the line still remains visible in the periphery. This means that even though our eye “sees” a ring, the brain actually fills it in and makes us see a “whole” picture. To add to this wonderful capability of our interpretative framework, the image that falls on our retina is actually upside down. Yet our brain makes it “right-side” up. This would mean that newborn babies may actually see the world upside down and with voids, but at some point the interpretative framework changes to correct it so that we see the world “correctly”.
How does our brain know to do this? The answer is that it was evolutionarily beneficial for our ancestors to do this, just like our custom of looking for patterns. This is what Lila Gatlin would refer to as a D1 constraint: a constraint that was evolutionarily passed down from generation to generation and that acts in any situation. In other words, to quote Alicia Juarrero, it is context-free.
To go past this constraint, we have to use second order thinking. In other words, we have to think about thinking; we have to learn about learning; we have to look at understanding understanding. I welcome the reader to look at the posts I have written on this matter. I will finish with two quotes to further meditate on this:
Only when you realize you are blind, can you see. (Paraphrasing Heinz von Foerster)
The quieter you become, the more you can hear. – Ram Dass
Please maintain social distance, wear masks and take vaccination, if able. Stay safe and always keep on learning…
I have had some good conversations recently about epistemology, and today’s post is influenced by them. I am looking at Bayesian epistemology, something I am very influenced by. As the readers of my blog may know, I am a student of Cybernetics. One of the main starting points in Cybernetics is that we are informationally closed. This means that information cannot enter into us from outside. This may be evident to any teachers among my readers. You are not able to open up a student’s brain, pour information in as a commodity, and then seal it back up. What happens instead is that the teacher perturbs the student, and the student in turn generates meaning out of the perturbation. This would also mean that all knowledge is personal. This is something that was taught by Michael Polanyi.
How we know something is based on what we already know. The obvious question at this juncture is: what about the first knowledge? Ross Ashby, one of the pioneers of Cybernetics, has written that there are two main forms of regulation. One is the gene pattern, something that developed over generations through the evolutionary process. An example of this is the impulse of a baby to grasp or to breastfeed without any training. The second is the ability to learn. The ability to learn amplifies the chance of survival of the organism. In our species, this allows us to literally reach for the celestial bodies.
If one accepts that we are informationally closed, then one has to also accept that we do not have direct access to the external reality. What we have access to is what we make sense of from experiencing the external perturbations. Cybernetics aligns with constructivism, the philosophy that we construct a reality from our experience. Heinz von Foerster, one of my favorite Cyberneticians, postulated that our nervous system as a whole is organized in such a way (organizes itself in such a way) that it computes a stable reality. All we have is what we can perceive through our perception framework. The famous philosopher Immanuel Kant referred to this as the noumena (the reality that we don’t have direct access to) and the phenomena (the perceived representation of the external reality). We compute a reality based on our interpretive framework. This is just a version of the reality, and each one of us computes such a reality that is unique to each one of us. The stability comes from repeated interactions with the external reality, as well as from interactions with others. We do not exist in isolation from others. The more interactions we have, the more chances we get to “calibrate” our realities against each other.
With this framework, one does not start from ontology, instead one starts from epistemology. Epistemology deals with the theory of knowledge and ontology deals with being (what is out there). What I can talk about is what I know about rather than what is really out there.
Bayesian epistemology is based on induction. Induction is a process of reasoning where one makes a generalization from a series of observations. For example, if all the swans you have seen so far in your life are white swans, then induction would direct you to generalize that all swans are white. Induction assumes uniformity of nature, to quote the famous Scottish philosopher David Hume. This means that you assume that the future will resemble the past. Hume pointed out that induction is faulty because no matter how many observations one makes, one cannot assume that the future will resemble the past. We seek patterns in the world, and we make generalizations from them. Hume pointed out that we do this out of habit. While many people have tried to solve the problem of induction, nobody has really solved it.
All of this discussion lays the background for Bayesian epistemology. I will not go into the math of Bayesian statistics in this post. I will provide a general explanation instead. Bayesian epistemology puts forth that probability is not a characteristic of a phenomenon, but a statement about our epistemology. The probabilities we assign are not for THE reality but for the constructed reality. It is a statement about OUR uncertainty, and not about the uncertainty associated with the phenomenon itself. The Bayesian approach requires that we start with what we know. We start with stating our prior belief, and based on the evidence presented, we modify our belief. This modified belief is termed the “posterior”. Today’s posterior becomes tomorrow’s prior because “what we know now” is the posterior.
Another important thing to keep in mind is that one does not assign 0% or 100% to one’s belief. Even if you see a coin land heads 10,000 times in a row, you should not assume with certainty that the coin is double-headed. This would be jumping into the pit of the problem of induction. We can keep updating our prior based on evidence without ever reaching 100%.
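To make this concrete, here is a minimal sketch of Bayesian updating for the coin example, using the standard Beta-Bernoulli model. The code and numbers are my own illustration, not from any of the authors discussed here:

```python
# Minimal sketch of Bayesian updating for a coin, using the standard
# Beta-Bernoulli model. Beta(a, b) encodes our belief about the
# probability of heads; each observation nudges a or b.

def update(a, b, heads):
    """Return the posterior Beta parameters after one coin flip."""
    return (a + 1, b) if heads else (a, b + 1)

# Start with a uniform prior: Beta(1, 1) treats every bias as equally plausible.
a, b = 1, 1

# Observe 10,000 heads in a row; each posterior becomes the next prior.
for _ in range(10_000):
    a, b = update(a, b, heads=True)

# Posterior mean: our belief that the next flip lands heads.
print(a / (a + b))  # ~0.9999 -- very high, but never exactly 1.0
```

Note how the belief approaches certainty without ever reaching it; the prior keeps a sliver of probability on the coin not being double-headed, which is exactly the guard against the problem of induction described above.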
I will write more on this topic. I wanted to start off with an introductory post and follow up with additional discussions. I will finish with some of the appealing features of Bayesian epistemology:
Bayesian epistemology is self-correcting – Bayesian statistics tends to cut down your overconfidence or underconfidence. New evidence presented over several iterations corrects your confidence in either direction.
Bayesian epistemology is observer dependent and context sensitive – As noted above, probability in Bayesian epistemology is a statement of the observer’s belief. The framework is entirely dependent on the observer and the context around sensemaking. You do not remove the observer from the observation. In this regard, the Bayesian framework is hermeneutical. We bring our biases to the equation, and we put our money where our mouth is by assigning a probability value to it.
Circularity – There is an aspect of circularity in the Bayesian framework in that today’s posterior becomes tomorrow’s prior, as noted before.
Second Order Nature – The Bayesian framework requires that you be open to changing your beliefs. It requires you to challenge your assumptions and remain open to correcting your belief system. There is an aspect of error correction in this. You realize that you have cognitive blind spots. Knowing this allows us to better our sensemaking ability. We try to be “less wrong” rather than “more right”.
Conditionality – The Bayesian framework utilizes conditional probability. Phenomena or events do not exist in isolation; they are connected to each other and therefore require us to take a holistic viewpoint (see the sketch after this list).
Coherence not Correspondence – The use of priors forces us to use what we know. To use Willard Van Orman Quine’s phrase, we have a “web of belief”. Our priors must make sense with all the other beliefs we already have in place. This supports the coherence theory of truth instead of the realist’s favorite, the correspondence theory of truth. I welcome the reader to pursue this idea further.
Consistency not completeness – The idea of consistency over completeness is quite fascinating. It stems mainly from the limitation of our nervous system: we cannot have a true representation of reality. We live with uncertainty, but our nervous system strives to provide us a stable version of reality, one that is devoid of uncertainties. We are able to think about this only from a second order standpoint. We are able to ponder our cognitive blind spots because we are able to do second order cybernetics. We are able to think about thinking. We are able to put ourselves into the observed.
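Since conditional probability is the engine behind all of these points, here is a small, self-contained sketch of Bayes’ rule in action. The scenario and numbers are invented purely for illustration:

```python
# Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E).
# The hypothesis and numbers below are made up for illustration.

def bayes(prior, likelihood, false_alarm):
    """Posterior P(H|E) given P(H), P(E|H), and P(E|not H)."""
    evidence = likelihood * prior + false_alarm * (1 - prior)
    return likelihood * prior / evidence

# Prior belief in a hypothesis, say "this machine is miscalibrated".
prior = 0.01

# A warning light fires 95% of the time when the machine is miscalibrated,
# but also 5% of the time when it is fine.
posterior = bayes(prior, likelihood=0.95, false_alarm=0.05)
print(round(posterior, 3))  # ~0.161: evidence shifts belief, it does not settle it

# Today's posterior becomes tomorrow's prior: a second, independent warning.
posterior = bayes(posterior, likelihood=0.95, false_alarm=0.05)
print(round(posterior, 3))  # ~0.785: beliefs update recursively
```

Notice how the observer’s prior enters every calculation: the same evidence moves different observers to different posteriors.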
I will finish with an excellent quote from Albert Einstein:
“As far as the laws of mathematics refer to reality, they are not certain; as far as they are certain, they do not refer to reality”.
Please maintain social distance, wear masks and take vaccination, if able. Stay safe and always keep on learning…
In case you missed it, my last post was The Open Concept of Systems.
In today’s post, I am following on the theme of Lila Gatlin’s work on constraints and tying it up with cybernetics. Please refer to my previous posts here and here for additional background. As I discussed in the last post, Lila Gatlin used the analogy of language to explain the emergence of complexity in evolution. She postulated that less complex organisms such as invertebrates focused on D1 constraints to ensure that the genetic material is passed on accurately over generations, while vertebrates maintained a constant level of D1 constraints and utilized D2 constraints to introduce novelty, leading to the complexification of the species. Gatlin noted that this is similar to Shannon’s second theorem, which points out that if a message is encoded properly, then it can be sent over a noisy medium in a reliable manner. As Jeremy Campbell notes:
In Shannon’s theory, the essence of successful communication is that the message must be properly encoded before it is sent, so that it arrives at its destination just as it left the transmitter, intact and free from errors caused by the randomizing effects of noise. This means that a certain amount of redundancy must be built into the message at the source… In Gatlin’s new kind of natural selection, “second-theorem selection,” fitness is defined in terms very different and abstract than in classical theory of evolution. Fitness here is not a matter of strong bodies and prolific reproduction, but of genetic information coded according to Shannon’s principles.
The codes that made possible the so-called higher organisms, Gatlin suggests, were redundant enough to ensure transmission along the channel from DNA to protein without error, yet at the same time they possessed an entropy, in Shannon’s sense of “amount of potential information,” high enough to generate a large variety of possible messages.
Gatlin’s view was that complexity arose from the ability to introduce more variety while at the same time maintaining accuracy in an optimal mix, similar to human language, where new ideas constantly emerge while the underlying grammar, syntax, etc. are maintained. As Campbell continues:
In the course of evolution, certain living organisms acquired DNA messages which were coded in this optimum way, giving them a highly successful balance between variety and accuracy, a property also displayed by human languages. These winning creatures were the vertebrates, immensely innovative and versatile forms of life, whose arrival led to a speeding-up of evolution.
As Campbell puts it, vertebrates were agents of novelty. They were able to revolutionize their anatomy and body chemistry. They were able to evolve more rapidly and adapt to their surroundings. The first known vertebrate is a bottom-dwelling fish that lived over 350 million years ago. It had a heavy external skeleton that anchored it to the floor of the water body. Vertebrates evolved such that some of the spiny parts of the skeleton grew into fins. They also developed skulls with openings for sense organs such as eyes, nose, and ears. Later on, some of them developed limbs from the bony supports of fins, leading to the rise of amphibians.
What kind of error-correcting redundancy did the DNA of these evolutionary prize winners, the vertebrates, possess? It had to give them the freedom to be creative, to become something markedly different, for their emergence was made possible not merely by changes in the shape of a common skeleton, but rather by developing whole new parts and organs of the body. Yet this redundancy also had to provide them with the constraints needed to keep their genetic messages undistorted.
Gatlin defined the first type of redundancy, the deviation from equiprobability, as the ‘D1 constraint’. This is also referred to as a ‘governing constraint’. The second type of redundancy, the deviation from independence, was termed by Gatlin the ‘D2 constraint’, and this is also referred to as an ‘enabling constraint’. Gatlin’s speculation was that vertebrates were able to use both D1 and D2 constraints to increase their complexification, ultimately leading to a highly cognitive being such as our species, Homo sapiens.
One of the pioneers in Cybernetics, Ross Ashby, also looked at a similar question. He was looking at the biological learning mechanisms of “advanced” organisms. Ashby identified that for less complex organisms, the main source of regulation is their gene pattern. For Ashby, regulation is linked to viability or survival. He noted that less complex organisms can rely just on their gene pattern to continue to survive in their environment. Ashby noted that they are adapted because their conditions have been constant over many generations. In other words, a simple organism such as a hunting wasp can hunt and survive based on its genetic information alone. It does not need to learn to adapt; it can adapt with what it has. Ashby referred to this as direct regulation. With direct regulation, there is a limit to the adaptation. If the regularities of the environment change, the hunting wasp will not be able to survive; it relies on the regularities of the environment for its survival. Ashby contrasted this with indirect regulation. With indirect regulation, one is able to amplify adaptation. Indirect regulation is the learning mechanism that allows the organism to adapt. A great example of this is a kitten. As Ashby notes:
This (indirect regulation) is the learning mechanism. Its peculiarity is that the gene-pattern delegates part of its control over the organism to the environment. Thus, it does not specify in detail how a kitten shall catch a mouse, but provides a learning mechanism and a tendency to play, so that it is the mouse which teaches the kitten the finer points of how to catch mice.
The learning mechanism in its gene pattern does not directly teach the kitten to hunt for mice. However, chasing mice and interacting with them trains the kitten how to catch them. As Ashby notes, the gene pattern is supplemented by the information supplied by the environment. Part of the regulation is delegated to the environment.
In the same way the gene-pattern, when it determines the growth of a learning animal, expends part of its resources in forming a brain that is adapted not only by details in the gene-pattern but also by details in the environment. The environment acts as the dictionary, while the hunting wasp, as it attacks its prey, is guided in detail by its genetic inheritance, the kitten is taught how to catch mice by the mice themselves. Thus, in the learning organism the information that comes to it by the gene-pattern is much supplemented by information supplied by the environment; so, the total adaptation possible, after learning, can exceed the quantity transmitted directly through the gene-pattern.
Ashby further notes:
As a channel of communication, it has a definite, finite capacity, Q say. If this capacity is used directly, then, by the law of requisite variety, the amount of regulation that the organism can use as defense against the environment cannot exceed Q. To this limit, the non-learning organisms must conform. If, however, the regulation is done indirectly, then the quantity Q, used appropriately, may enable the organism to achieve, against its environment, an amount of regulation much greater than Q. Thus, the learning organisms are no longer restricted by the limit.
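For reference, the law of requisite variety that Ashby invokes here is commonly written in Shannon-entropy form as follows (a standard textbook statement of the law, not a quotation from this passage):

```latex
% Law of requisite variety (entropy form): the uncertainty left in the
% outcomes O cannot fall below the uncertainty of the disturbances D
% minus the capacity of the regulator R.
H(O) \geq H(D) - H(R)
```

On this reading, direct regulation is capped by the capacity Q that the gene pattern can carry, while indirect regulation lets the environment supply part of the regulator’s entropy, which is how the limit is exceeded.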
As I look at Ashby’s ideas, I cannot help but see similarities between the D1/D2 constraints and direct/indirect regulation, respectively. Indirect regulation, similar to enabling constraints, helps the organism adapt to its environment by connecting things together. Indirect regulation has a second order nature to it, such as learning how to learn. It works by being open to possibilities when interacting with the environment. It brings novelty into the situation. Similar to governing constraints, direct regulation focuses only on the accuracy of the ‘message’. Nothing additional, and no form of amplification, is possible. Direct regulation is hardwired, whereas indirect regulation is enabling. Direct regulation is context-free, whereas indirect regulation is context-sensitive. What the hunting wasp does is entirely reliant on its gene pattern, no matter the situation, whereas what a kitten does is entirely dependent on the context of the situation.
Final Words:
Cybernetics can be looked at as the study of possibilities, especially of why, out of all the possibilities, only certain outcomes occur. There are strong undercurrents of information theory in Cybernetics. For example, in information theory entropy is a measure of how many messages might have been sent, but were not. If there are a lot of possible messages available and only one message is selected, the selection eliminates a lot of uncertainty; this represents a high-information scenario. Indirect regulation allows us to look at the different possibilities and adapt as needed. Additionally, indirect regulation allows retaining the successes and failures and the lessons learned from them.
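This relationship between the size of the message set and the information resolved by a selection can be shown in a few lines. A minimal sketch, with message sets I made up for illustration:

```python
from math import log2

# Entropy measures how many messages *could* have been sent: picking one
# message out of many resolves more uncertainty than picking one out of few.

def entropy(probabilities):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

two_messages = [0.5, 0.5]          # e.g. a plain "yes" / "no"
many_messages = [1 / 256] * 256    # 256 equally likely messages

print(entropy(two_messages))   # 1.0 bit resolved per selection
print(entropy(many_messages))  # 8.0 bits: a high-information scenario
```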
I will finish with a great lesson from Ashby to explain the idea of indirect regulation:
If a child wanted to discover the meanings of English words, and his father had only ten minutes available for instruction, the father would have two possible modes of action. One is to use the ten minutes in telling the child the meanings of as many words as can be described in that time. Clearly there is a limit to the number of words that can be so explained. This is the direct method. The indirect method is for the father to spend the ten minutes showing the child how to use a dictionary. At the end of the ten minutes the child is, in one sense, no better off; for not a single word has been added to his vocabulary. Nevertheless, the second method has a fundamental advantage; for in the future the number of words that the child can understand is no longer bounded by the limit imposed by the ten minutes. The reason is that if the information about meanings has to come through the father directly, it is limited to ten-minutes’ worth; in the indirect method the information comes partly through the father and partly through another channel (the dictionary) that the father’s ten-minute act has made available.
Please maintain social distance, wear masks and take vaccination, if able. Stay safe and always keep on learning…
In today’s post, I am following up from my last post and looking further at the idea of constraints as proposed by Dr. Lila Gatlin. Gatlin was an American biophysicist who used information theory to propose an information-processing view of life. In information theory, the ‘constraints’ are the ‘redundancies’ utilized for the transmission of the message. Gatlin’s use of this idea from an evolutionary standpoint is quite remarkable. I will explain the idea of redundancies in language using an example I have used before here. This is the famous idea that if a monkey had infinite time on its hands and a typewriter, it would, at some point, type out the entire works of Shakespeare just by randomly hitting the typewriter keys. It is obviously highly unlikely that a monkey could actually do this. In fact, this was investigated further by William R. Bennett, Jr., a Yale professor of Engineering. As Jeremy Campbell notes in his wonderful book, Grammatical Man:
Bennett… using computers, has calculated that if a trillion monkeys were to type ten keys a second at random, it would take more than a trillion times as long as the universe has been in existence merely to produce the sentence “To be, or not to be: that is the question.”
This is mainly because the keyboard of a typewriter does not truly reflect the alphabet as it is used in English. The typewriter keyboard has only one key for each letter, which means that every letter has the same chance of being struck. From an information theory standpoint, this represents a maximum entropy scenario: any letter can come next, since all letters have the same probability of being struck. In English, however, the distribution of letters is not uniform. Some letters, such as “E”, are more likely to occur than, say, “Q”. This is a form of “redundancy” in language. Here redundancy refers to regularities, something that occurs on a regular basis. Gatlin referred to this redundancy as “D1”, which she described as divergence from equiprobability. Bennett used this redundancy next in his experiment. It is like saying that some letters now have many more keys on the typewriter, so that they are more likely to be struck. Campbell continues:
Bennett has shown that by applying certain quite simple rules of probability, so that the typewriter keys were not struck completely at random, imaginary monkeys could, in a matter of minutes, turn out passages which contain striking resemblances to lines from Shakespeare’s plays. He supplied his computers with the twenty-six letters of the alphabet, a space and an apostrophe. Then, using Act Three of Hamlet as his statistical model, Bennett wrote a program arranging for certain letters to appear more frequently than others, on the average, just as they do in the play, where the four most common letters are e, o, t, and a, and the four least common letters are j, n, q, and z. Given these instructions, the computer monkeys still wrote gibberish, but now it had a slight hint of structure.
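Before moving to the second kind of redundancy, here is a minimal sketch of how D1 can be measured: the gap between the entropy of a maximum-entropy typewriter and the entropy of the actual letter distribution. The sample text is a placeholder of my own, not Gatlin’s or Bennett’s data:

```python
from collections import Counter
from math import log2

# Gatlin's D1: divergence from equiprobability, measured as the gap
# between maximum entropy and the entropy of the observed letters.

def entropy(counts):
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

text = "to be or not to be that is the question".replace(" ", "")
counts = Counter(text)

h_max = log2(26)         # every one of 26 letters equally likely to be struck
h_one = entropy(counts)  # entropy of the actual letter frequencies
d1 = h_max - h_one       # D1: how far the letters are from equiprobable

print(round(h_max, 3), round(h_one, 3), round(d1, 3))
```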
The next type of redundancy in English is the divergence from independence. In English, we know that certain letters are more likely to come together, for example “ing”, “qu”, or “ion”. If we see an “i” and an “o”, then there is a high chance that the next letter is going to be an “n”. If we see a “q”, we can be fairly sure that the next letter is going to be a “u”. The occurrence of one letter makes the occurrence of another letter highly likely. In other words, this type of redundancy makes the letters interdependent rather than independent. Gatlin referred to this as “D2”. Bennett utilized this redundancy for his experiment:
Next, Bennett programmed in some statistical rules about which letters are likely to appear at the beginning and end of words, and which pairs of letters, such as th, he, qu, and ex, are used most often. This improved the monkey’s copy somewhat, although it still fell short of the Bard’s standards. At this second stage of programming, a large number of indelicate words and expletives appeared, leading Bennett to suspect that one-syllable obscenities are among the most probable sequences of letters used in normal language. Swearing has a low information content! When Bennett then programmed the computer to take into account triplets of letters, in which the probability of one letter is affected by the two letters which come before it, half the words were correct English ones and the proportion of obscenities increased. At a fourth level of programming, where groups of four letters were considered, only 10 percent of the words produced were gibberish and one sentence, the fruit of an all-night computer run, bore a certain ghostly resemblance to Hamlet’s soliloquy:
TO DEA NOW NAT TO BE WILL AND THEM BE DOES
DOESORNS CALAWROUTOULD
We can see that as Bennett’s experiment used more and more of the redundancies found in English, a certain structure began to emerge. With the use of redundancies, even though it might appear that the monkeys were free to choose any key, the program made it such that certain events were more likely to happen than others. This is the basic premise of constraints: constraints make certain things more likely to happen than others. This is different from a cause-and-effect phenomenon like a billiard ball hitting another billiard ball. Gatlin’s brilliance was to use this analogy with evolution. She pondered why some species were able to evolve to be more complex than others. She concluded that this has to do with the two types of redundancies, D1 and D2. She considered the transmission of genetic material to be similar to how a message is transmitted from a source to a receiver. She determined that some species were able to evolve differently because they were able to use the two redundancies in an optimal fashion.
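A toy version of Bennett’s procedure fits in a few lines. This is my own sketch, with a tiny placeholder training text rather than Bennett’s statistical model of Act Three of Hamlet:

```python
import random
from collections import Counter, defaultdict

# A toy Bennett-style monkey: instead of striking keys at random, each
# letter is drawn conditioned on the previous one (a D2-style constraint).
# The training text is a tiny placeholder, not Bennett's Hamlet model.

training = "to be or not to be that is the question whether tis nobler"

# Bigram statistics: count which character follows which.
followers = defaultdict(Counter)
for a, b in zip(training, training[1:]):
    followers[a][b] += 1

random.seed(42)  # make the run repeatable

def monkey(length=40):
    out = random.choice(training)
    while len(out) < length:
        options = followers.get(out[-1])
        if not options:                      # dead end: start fresh
            out += random.choice(training)
            continue
        chars, weights = zip(*options.items())
        out += random.choices(chars, weights=weights)[0]
    return out

print(monkey())  # gibberish, but with a hint of English structure
```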
If we come back to the analogy with language, if we were to use only the D1 redundancy, then we would have a very high success rate of repeating certain letters again and again. Eventually, the strings we generated would become monotonous, without any variety – something like EEEAAEEEAAAEEEO. Novelty is introduced when we utilize the second type of redundancy, D2. Using D2 introduces a greater likelihood of emergence, since there are more connections present. As Campbell explains the two redundancies further:
Both kinds lower the entropy, but not in the same way, and the distinction is a critical one. The first kind of redundancy, which she calls D1, is the statistical rule that some letters are likely to appear more often than others, on the average, in a passage of text. D1, which is context-free, measures the extent to which a sequence of symbols generated by a message source departs from the completely random state where each symbol is just as likely to appear as any other symbol. The second kind of redundancy, D2, which is context-sensitive, measures the extent to which the individual symbols have departed from a state of perfect independence from one another, departed from a state in which context does not exist. These two types of redundancy apply as much to a sequence of chemical bases strung out along a molecule of DNA as to the letters and words of a language.
Campbell suggests that D2 is a richer version of redundancy because it permits greater variety while at the same time controlling errors. Campbell also notes that Bennett held the D1 constraint constant, whereas he kept increasing the D2 constraints to the limit of his equipment until he saw something roughly similar to sensible English. Using this analogy to evolution, Gatlin notes:
Let us assume that the first DNA molecules assembled in the primordial soup were random sequences, that is, D2 was zero, and possibly also D1. One of the primary requisites of a living system is that it reproduces itself accurately. If this reproduction is highly inaccurate, the system has not survived. Therefore, any device for increasing the fidelity of information processing would be extremely valuable in the emergence of living forms, particularly higher forms… Lower organisms first attempted to increase the fidelity of the genetic message by increasing redundancy primarily by increasing D1, the divergence from equiprobability of the symbols. This is a very unsuccessful and naive technique because as D1 increases, the potential message variety, the number of different words that can be formed per unit message length, declines. Gatlin determined that this was the reason why invertebrates remained “lower organisms”.
A much more sophisticated technique for increasing the accuracy of the genetic message without paying such a high price for it was first achieved by vertebrates. First, they fixed D1. This is a fundamental prerequisite to the formulation of any language, particularly more complex languages… The vertebrates were the first living organisms to achieve the stabilization of D1, thus laying the foundation for the formulation of a genetic language. Then they increased D2 at relatively constant D1. Hence, they increased the reliability of the genetic message without loss of potential message variety. They achieved a reduction in error probability without paying too great a price for it… It is possible, within limits, to increase the fidelity of the genetic message without loss of potential message variety provided that the entropy variables change in just the right way, namely, by increasing D2 at relatively constant D1. This is what the vertebrates have done. This is why we are “higher” organisms.
Final Words:
I have always wondered about the exponential advancement of technology and how we as a species were able to achieve it. Gatlin’s ideas made me wonder if they are applicable to our species’ tremendous technological advancement. We started off with stone tools, and now we are on the brink of visiting Mars. It is quite likely that we first came across a sharp stone, cut ourselves on it, and then thought of using it for cutting things. From there, we realized that we could sharpen certain stones to get the same result. Gatlin puts forth that during the initial stages, it is extremely important that errors are kept to a minimum. We had to first get better at the stone tools before we could proceed to higher and more complex tools. The complexification happened when we were able to make connections – by increasing D2 redundancy. As Gatlin states, D2 endows the structure. The more tools and ideas we could connect, the faster and better we could invent new technologies. The exponential growth came about only when we were able to connect more things to each other.
I was introduced to Gatlin’s ideas through Campbell and Alicia Juarrero. As far as I can tell, Gatlin did not use the terms “context-free” or “context-sensitive”; they seem to have been used by Campbell. Juarrero refers to “context-free constraints” as “governing constraints” and “context-sensitive constraints” as “enabling constraints”. I will be writing about these in a future post. I will finish with a neat observation about the ever-present redundancies in the English language from Claude Shannon, the father of Information Theory:
The redundancy of ordinary English, not considering statistical structure over greater distances than about eight letters, is roughly 50%. This means that when we write English half of what we write is determined by the structure of the language and half is chosen freely.
In other words, if you follow the basic rules of the English language, you could make sense of at least 50% of what you have written, as long as you use short words!
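Shannon’s 50% figure corresponds to a simple ratio: redundancy is one minus the ratio of the actual per-letter entropy to the maximum possible entropy. A sketch with rough illustrative numbers, not Shannon’s full n-gram analysis:

```python
from math import log2

# Redundancy R = 1 - H / H_max. The per-letter entropy figure below is a
# rough illustrative value, not Shannon's full statistical analysis.

h_max = log2(27)   # 26 letters plus the space, all equally likely: ~4.75 bits
h_english = 2.3    # rough per-letter entropy of English once letter
                   # statistics over short distances are accounted for

redundancy = 1 - h_english / h_max
print(round(redundancy, 2))  # ~0.52: roughly the 50% Shannon describes
```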
Please maintain social distance, wear masks and take vaccination, if able. Stay safe and always keep on learning… In case you missed it, my last post was More Notes on Constraints in Cybernetics.