On the Monty Hall Problem:

The Monty Hall problem has to be one of the most fascinating probability problems. The problem was first posed to Marilyn vos Savant in her column, “Ask Marilyn,” in Parade magazine:

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

Her response was that the player should switch. This caused an uproar among her readers. Several readers, including PhDs in Mathematics, wrote back to her saying that she was absolutely wrong. One response read:

“You are utterly incorrect about the game-show question, and I hope this controversy will bring some public attention to the serious national crisis in mathematical education. If you can admit your error, you will have contributed constructively toward the solution of a deplorable situation. How many irate mathematicians are needed to get you to change your mind?”

Another response said:

“You’re wrong, but look at the positive side. If all those Ph.D.’s were wrong, the country would be in very serious trouble.”

The common intuition is to focus on the two remaining doors and assume that they are equally likely, concluding that the probability of winning is 50%, or ½. However, this is incorrect.

Let’s look at this in another way. If you have three doors (A, B, C), and there is a car behind one of the doors, the probability that you would choose that door at random is 1/3. Let’s say that you chose Door A:

p(A) = 1/3

The probability that the car is behind one of the other two doors is (1/3 + 1/3) = 2/3. This can also be viewed as the car being behind door B or door C.

p(B) + p(C) = 2/3

Now, the host knows which door has the car. Therefore, the host can always open a door without the car. Let’s say he opens door B and shows that it contains a goat. Once the host opens door B and reveals the goat, p(B) = 0. Therefore, p(C) is now 2/3. Thus, it is logical for you to switch, increasing your probability of winning from 1/3 to 2/3.

Here is another example to explain this. Let’s say that a “bad” magician shuffles a deck of playing cards, spreads the cards out, and asks you to pick the Ace of Spades from the spread-out cards. The cards are all facing down. You then focus on the cards and choose one at random, placing it in your pocket without looking. The probability that you chose the Ace of Spades is 1/52 (assuming there are 52 cards). The probability that the Ace of Spades is in the remainder of the deck is 51/52. Now the magician slowly turns over each card and shows that it is not the Ace of Spades. The magician is using a marked deck, so he knows the card by looking at the back. Finally, one card remains face down. Should you switch?

Of course, you should. The probability that the remaining card is the Ace of Spades is 51/52. Note that I started this problem by saying it is a “bad” magician. If it is a good magician, you should stick to your original choice.

I have created a simulator program that the reader is welcome to play around with. It is an executable file coded in Python; please do verify that it is virus-free before running it. I ran 10 billion simulations, and the player won about 66.67% of the time when switching. This aligns with the theoretical probability of 2/3.
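For readers who prefer to inspect source code rather than run an executable, here is a minimal sketch of such a simulator (the function and variable names are my own illustration, not the original program):

```python
import random

def play_monty_hall(switch):
    """Simulate one round of the classic game; return True if the player wins."""
    car = random.randrange(3)      # door hiding the car
    choice = random.randrange(3)   # player's initial pick
    # The host, who knows where the car is, opens a goat door
    # that is not the player's pick.
    opened = random.choice([d for d in range(3) if d != choice and d != car])
    if switch:
        # Switch to the one remaining closed door.
        choice = next(d for d in range(3) if d not in (choice, opened))
    return choice == car

trials = 100_000
wins = sum(play_monty_hall(switch=True) for _ in range(trials))
print(f"Win rate when switching: {wins / trials:.4f}")
```

Switching should win roughly 66.67% of the time, while passing `switch=False` wins only about a third of the time.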

The ‘Monty’ in the problem is based on Monty Hall, the host of the TV game show Let’s Make a Deal. He never actually offered players the chance to switch doors; that formulation came from vos Savant’s reader. An earlier version of this problem, the Three Prisoners problem, was posed by Martin Gardner.

The Monty Hall problem can be generalized for N doors, where Monty opens M doors. The probability of winning by switching is given by the formula:

p(win by switching) = (N-1)/(N*(N-M-1))

Where:

N = total number of doors

M = number of doors Monty opens

In the classic problem, we have 3 doors in total, and Monty opens 1 door.

p(win by switching) = (3-1)/(3*(3-1-1)) = 2/3
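The general formula can be checked against a simulation for arbitrary N and M (a sketch; the player is assumed to switch to a random remaining closed door, which is what the formula describes):

```python
import random

def switch_win_theory(n, m):
    """Theoretical win probability by switching: (N-1) / (N * (N-M-1))."""
    return (n - 1) / (n * (n - m - 1))

def switch_win_simulated(n, m, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(n)
        choice = random.randrange(n)
        # Monty opens m goat doors, never the player's pick or the car.
        goats = [d for d in range(n) if d not in (choice, car)]
        opened = set(random.sample(goats, m))
        # The player switches to a random remaining closed door.
        remaining = [d for d in range(n) if d != choice and d not in opened]
        if random.choice(remaining) == car:
            wins += 1
    return wins / trials

print(switch_win_theory(3, 1))     # classic problem: 2/3
print(switch_win_theory(100, 98))  # 100 doors, 98 opened: 99/100
```

With 100 doors and 98 opened, only one other door remains, so switching wins unless your first pick (1/100) was the car.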

All probabilities are conditional probabilities:

Now let’s get back to the classic problem and say that Monty does not know which door has the car. This means Monty randomly opens one of the two remaining doors. Further, let’s say that the door Monty opens does not contain the car. In this scenario, should the player switch?

In this scenario, the player gains nothing by switching, since the probability is 1/2. What gives? Let’s look at this further:

  1. Initial Setup: As in the classic problem, the player has a 1/3 chance of initially picking the car and a 2/3 chance of picking a goat.
  2. Host’s Random Choice: Unlike the classic problem, the host doesn’t know what’s behind the doors. This is the critical piece of information.
  3. Possible Outcomes:
    • If the player picked the car (1/3 chance), the host will always reveal a goat.
    • If the player picked a goat (2/3 chance), there’s a 50% chance the host reveals the car (ending the game) and a 50% chance he reveals the other goat.
  4. Conditional Probability: We’re only considering the scenario where the game continues (i.e., a goat was revealed). This happens in two ways:
    • The player picked the car (1/3 chance) and the host revealed a goat (100% chance given the player’s choice)
    • The player picked a goat (2/3 chance) and the host revealed the other goat (50% chance given the player’s choice)
  5. Probability Calculation:
    • P(car behind player’s door | goat revealed) = (1/3) / (1/3 + 1/3) = 1/2
    • P(car behind other closed door | goat revealed) = (1/3) / (1/3 + 1/3) = 1/2
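The steps above can be sketched as a simulation that conditions on the rounds where the ignorant host happens to reveal a goat (names are illustrative):

```python
import random

def ignorant_host_round():
    """One round with a host who opens a random unchosen door.
    Returns 'car_revealed', 'win_by_switching', or 'win_by_staying'."""
    car = random.randrange(3)
    choice = random.randrange(3)
    # The host picks blindly among the two doors the player didn't choose.
    opened = random.choice([d for d in range(3) if d != choice])
    if opened == car:
        return "car_revealed"  # the game ends early
    other = next(d for d in range(3) if d not in (choice, opened))
    return "win_by_switching" if other == car else "win_by_staying"

results = [ignorant_host_round() for _ in range(100_000)]
continued = [r for r in results if r != "car_revealed"]
switch_rate = continued.count("win_by_switching") / len(continued)
print(f"P(win by switching | goat revealed) = {switch_rate:.3f}")
```

Among the rounds that continue, switching wins only about half the time, and about a third of all rounds end early with the car revealed.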

The key difference from the classic problem is that the host’s lack of knowledge introduces the possibility of the game ending early (if he reveals the car). This changes the conditional probabilities once we know the game has continued.

In essence, the host’s random choice acts as a filter that equalizes the probabilities. If a goat is revealed, it’s equally likely that it happened because the player initially chose the car or because the player chose a goat and got lucky with the host’s random choice.

This scenario demonstrates how crucial the host’s knowledge and behavior are to the probabilities in the Monty Hall problem. This leads to the following core ideas of Bayesian statistics:

  • All probabilities are conditional probabilities. It is always in the form of p(an event | the information we have on hand).
  • In light of new information, we should update our prior probability.

Always keep on learning…

The Magical “All Possibilities”:

When you have eliminated all which is impossible, then whatever remains, however improbable, must be the truth. – Holmes

Imagine that you have a coin in your hand, and you are throwing it up in the air. How would you assign probabilities for the outcome? Generally, we are taught that a coin flip has a 50% chance of tails and 50% chance of heads, assuming that we are using a fair coin. The reasoning is that there are only two possible outcomes (heads, tails). Therefore, the probability of either one happening is 50%.

I have written about Bayesian epistemology before. If we evaluate the coin flip example, there is more going on here than meets the eye. The basis of all this is: from whose perspective? In Bayesian epistemology, probability is not a feature of the phenomenon, such as the coin flip. The coin is not aware of the probabilities with which it should fall. The probabilities that we assign are a feature of our uncertainty; they have nothing to do with the coin. In the example, only two outcomes were considered. Depending on the observer, this could be expanded. For example, we could consider the coin falling on its edge. Or perhaps the coin may not land at all: a bird might catch it in midair and swallow it, or the coin might be thrown in space. Based on our experience, we may conclude that the last two scenarios are unlikely. But the key points here are:

  1. Every description requires a describer. Every observation requires an observer. In science and in general language, we ignore the describer/observer. We engage in conversations or studies as if we have access to objectivity. The science we have is a human science, in the sense that it is a version we have generated based on what our human interpretative framework affords.
  2. We need to be aware of how we made our observation, and be open to modifying it. Whatever we say or do is based on the current state of our knowledge/belief system. This needs to be updated based on feedback from the environment.
  3. Any attempt at an experiment or study is to reduce our uncertainty about something. Going back to Bayesian epistemology, any expression in probability is an expression of our uncertainty. The phenomena that we study are not following any rules. They do not have a mind of their own. We are projecting our “certainties” as rules onto them. A great example is the often-quoted scenario of birds flocking together to explain complexity. The birds do not know these rules. They exhibit a behavior that got reinforced through natural selection. The rules are merely a projection of what we think is going on. In other words, the complexity of the flight of birds arising from simple rules is just our construction.

The idea of “all the possibilities” is made quite clear in the Arthur Conan Doyle quote at the start of this post. This quote is often touted in TV shows and movies alike. However, the quote represents a fallacious idea, the root of which stems from an incorrect assumption: that one can eliminate ALL that is impossible. As with the coin toss example, this depends on the observer and their ability to know ALL that can happen, which requires omniscience. Additionally, one has to disprove every one of those possible outcomes. Only after this can one truly look at whatever remains. Aptly, this fallacy is termed the “Holmesian fallacy”. We simply do not have access to ALL possibilities.

In Cybernetics, a key idea that is relevant here is variety. Variety is the number of possible states. This was put forward by one of the pioneers in Cybernetics, Ross Ashby. For example, we could say that a coin has a variety of 2 – heads or tails. Or we could say that a coin has a variety of 3 – heads, tails, or its edge. As we can see, variety depends on the observer. Being aware of this dependency is part of second order cybernetics. If we were to restate the definition of variety in second order cybernetics, it would be: variety is the number of possible states as perceived by an observer. Variety is tightly linked to the concept of entropy.
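One way to make that link concrete (assuming, for simplicity, that the observer treats the perceived states as equally likely): the maximum Shannon entropy of a source is the logarithm of its variety.

```python
import math

def max_entropy_bits(variety):
    """Maximum Shannon entropy, in bits, of a source with `variety`
    equally likely states: H = log2(variety)."""
    return math.log2(variety)

print(max_entropy_bits(2))  # coin seen as heads/tails: 1.0 bit
print(max_entropy_bits(3))  # coin seen as heads/tails/edge: about 1.585 bits
```

Perceiving an extra possible state literally increases the information the observer can extract from the same coin.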

Ashby noted that the initial variety that we have perceived will tend to decay over time if nothing changes. A great example Ashby gives is that of a wife visiting a prisoner. Let’s say the wife wishes to convey a message to the prisoner using a cup of coffee that she can send to him. The warden is smart: he tells the wife in advance that he will add cream and sweetener to the coffee and remove the spoon, and that the cup will always be filled to the brim. The warden has removed a lot of variety from the cup of coffee. The wife realizes that the only variety available to her is how hot the coffee is. She perceives the variety as 3 – HOT, TEPID, or COLD. However, the warden can block this with time: if he delays giving the coffee to the prisoner, this variety is also lost. As Ashby put it, as time progresses the variety in the set cannot increase and will usually diminish.

On a similar note, Ashby also spoke of the law of experience. He noted that when we impose a change in a ‘system’, we tend to reduce its knowledge of its initial state or variety. The example he gave is that of a group of boys who have been to the same school – it is found that a number of boys of marked individuality, having all been through the same school, develop ways that are more characteristic of the school they attended than of their original individualities.

If we include the idea of the observer here, we see the “system” as one that also includes the observer. This brings a self-referential nature to it. If nothing changes, the useful variety we have perceived will remain constant or decay over time. In addition, as observers, we ourselves tend to fall in line, conforming to whichever tribe or community we belong to. We lose our original variety with time. The first step in overcoming these tendencies is to be aware. Be aware of our blindness; be aware of our limitations and biases; be aware of our shortcomings. We have to be aware that we do not have knowledge of “ALL possibilities”. We have to be open to challenging our worldviews. We have to evaluate and error-correct our beliefs on a regular basis. We do not perform error-correction on a continuous basis, but on a discontinuous one.

I will finish with an anecdote on the apparent randomness of quantum mechanics, which prompted Einstein to say that God does not play dice. As the noted Italian physicist Carlo Rovelli wrote:

When Einstein objected to quantum mechanics by remarking that “God does not play dice,” Bohr responded by admonishing him, “Stop telling God what to do.” Which means: Nature is richer than our metaphysical prejudices. It has more imagination than we do.

Einstein was worried about the uncertainties he faced with quantum mechanics, and he noted that the metaphorical God does not play dice like that. In a similar vein, the late Stephen Hawking noted:

So God does play dice with the universe. All the evidence points to him being an inveterate gambler, who throws the dice on every possible occasion… Not only does God definitely play dice, but He sometimes confuses us by throwing them where they can’t be seen. 

Stay safe and always keep on learning… In case you missed it, my last post was The “Mind Projection Fallacy” in Systems Thinking:

The Cybernetics of Bayesian Epistemology:

I have had some good conversations recently about epistemology. Today’s post is influenced by those conversations. In today’s post, I am looking at Bayesian epistemology, something that I am very influenced by. As readers of my blog may know, I am a student of Cybernetics. One of the main starting points in Cybernetics is that we are informationally closed: information cannot enter us from outside. This may be evident to any teachers among my readership. You are not able to open up a student’s brain, pour information in as a commodity, and then seal it back up. What happens instead is that the teacher perturbs the student, and the student in turn generates meaning out of the perturbation. This also means that all knowledge is personal, something that was taught by Michael Polanyi.

How we know something is based on what we already know. The obvious question at this juncture is: what about the first knowledge? Ross Ashby, one of the pioneers of Cybernetics, wrote that there are two main forms of regulation. One is the gene pattern, developed over generations through the evolutionary process. An example of this is a baby’s impulse to grab or to breastfeed without any training. The second is the ability to learn, which amplifies the organism’s chance of survival. In our species, this allows us to literally reach for the celestial bodies.

If one accepts that we are informationally closed, then one has to also accept that we do not have direct access to the external reality. What we have access to is what we make sense of from experiencing the external perturbations. Cybernetics aligns with constructivism, the philosophy that we construct a reality from our experience. Heinz von Foerster, one of my favorite Cyberneticians, postulated that our nervous system as a whole is organized in such a way (organizes itself in such a way) that it computes a stable reality. All we have is what we can perceive through our perception framework. The famous philosopher Immanuel Kant referred to this distinction as the noumena (the reality that we don’t have direct access to) and the phenomena (the perceived representation of the external reality). We compute a reality based on our interpretive framework. This is just a version of the reality, and each one of us computes such a reality that is unique to each of us. The stability comes from repeated interactions with the external reality, as well as interactions with others. We do not exist in isolation from others. The more interactions we have, the more chances we have to “calibrate” our realities against each other.

With this framework, one does not start from ontology, instead one starts from epistemology. Epistemology deals with the theory of knowledge and ontology deals with being (what is out there). What I can talk about is what I know about rather than what is really out there.

Bayesian epistemology is based on induction. Induction is a process of reasoning where one makes a generalization from a series of observations. For example, if all the swans you have seen so far in your life are white swans, then induction would direct you to generalize that all swans are white. Induction assumes uniformity of nature, to quote the famous Scottish philosopher David Hume. This means that you assume that the future will resemble the past. Hume pointed out that induction is faulty because no matter how many observations one makes, one cannot assume that the future will resemble the past. We seek patterns in the world, and we make generalizations from them. Hume pointed out that we do this out of habit. While many people have tried to solve the problem of induction, nobody has really solved it.

All of this discussion lays the background for Bayesian epistemology. I will not go into the math of Bayesian statistics in this post. I will provide a general explanation instead. Bayesian epistemology puts forth that probability is not a characteristic of a phenomenon, but a statement about our epistemology. The probabilities we assign are not for THE reality but for the constructed reality. It is a statement about OUR uncertainty, and not about the uncertainty associated with the phenomenon itself. The Bayesian approach requires that we start with what we know. We start by stating our prior belief and, based on the evidence presented, we modify our belief. The updated belief is termed the “posterior” in Bayesian terms. Today’s posterior becomes tomorrow’s prior, because “what we know now” is the posterior.

Another important thing to keep in mind is that one should not assign 0% or 100% to a belief. Even if you see a coin come up heads 10,000 times in a row, you should not conclude that the coin is certainly double-headed. That would be jumping into the pit of the problem of induction. We can keep updating our prior based on evidence without ever reaching 100%.
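As a sketch of this updating process (using a conjugate Beta prior for the coin’s bias; this is a modeling assumption on my part, not something specified above):

```python
def update_beta(alpha, beta, heads, tails):
    """Beta-Binomial conjugate update: observing `heads` heads and `tails`
    tails turns a Beta(alpha, beta) prior into Beta(alpha+heads, beta+tails)."""
    return alpha + heads, beta + tails

# Start with a uniform prior, Beta(1, 1), over the coin's bias.
alpha, beta = 1, 1
# Observe 10,000 heads in a row; today's posterior is tomorrow's prior.
alpha, beta = update_beta(alpha, beta, heads=10_000, tails=0)
# Posterior mean probability that the next flip is heads:
print(alpha / (alpha + beta))  # 10001/10002, about 0.9999; close to, never exactly, 1
```

No finite run of heads drives the belief to exactly 100%, which is precisely the point about avoiding certainty.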

I will write more on this topic. I wanted to start off with an introductory post and follow up with additional discussions. I will finish with some appealing points of Bayesian epistemology.

Bayesian epistemology is self-correcting – Bayesian statistics tends to cut down your overconfidence or underconfidence. The new evidence presented over several iterations corrects it.

Bayesian epistemology is observer dependent and context sensitive – As noted above, probability in Bayesian epistemology is a statement of the observer’s belief. The framework is entirely dependent on the observer and the context around sensemaking. You do not remove the observer from the observation. In this regard, the Bayesian framework is hermeneutical. We bring our biases to the equation, and we put our money where our mouth is by assigning a probability value.

Circularity – There is an aspect of circularity in the Bayesian framework, in that today’s posterior becomes tomorrow’s prior, as noted before.

Second Order Nature – The Bayesian framework requires that you be open to changing your beliefs. It requires you to challenge your assumptions and remain open to correcting your belief system. There is an aspect of error correction in this. You realize that you have cognitive blind spots. Knowing this allows us to better our sensemaking ability. We try to be “less wrong” rather than “more right”.

Conditionality – The Bayesian framework utilizes conditional probability. You see that phenomena or events do not exist in isolation. They are connected to each other and therefore require us to take a holistic viewpoint.

Coherence not Correspondence – The use of priors forces us to use what we know. To use Willard Van Orman Quine’s phrase, we have a “web of belief”. Our priors must cohere with all the other beliefs we already have in place. This supports the coherence theory of truth rather than the realist’s favorite, the correspondence theory of truth. I welcome the reader to pursue this further.

Consistency not completeness – The idea of consistency over completeness is quite fascinating. This is mainly due to the limitation of our nervous system: it cannot hold a true representation of reality. We live with uncertainty, yet our nervous system strives to provide us a stable version of reality, one devoid of uncertainties. We are able to think about this only from a second order standpoint. We are able to ponder our cognitive blind spots because we can do second order cybernetics. We are able to think about thinking. We are able to put ourselves into the observed.

I will finish with an excellent quote from Albert Einstein:

“As far as the laws of mathematics refer to reality, they are not certain; as far as they are certain, they do not refer to reality”.

Please maintain social distance, wear masks, and get vaccinated, if able. Stay safe and always keep on learning…

In case you missed it, my last post was Error Correction of Error Correction: