The ‘day of the week boy or girl’ paradox explained

There is a well-known result in probability theory that seems to defy common sense. Note that in the problem statement we assume that the probability of a random child being a boy is exactly the same as being a girl (which is not completely true, as there are slightly more boys than girls, but taking that into account would only complicate the problem numerically and would not change the logic in any way).

Let’s assume that someone tells us that a certain person, let’s call him Peter, has exactly two children, but we are not told anything about them, apart from one of the following pieces of information:

Case A: Peter has a son.
(That is, at least one of Peter’s children is a boy.)
Case B: Peter has a son born on a Tuesday.
(That is, at least one of Peter’s children is a boy born on a Tuesday.)

Now we are asked the following question, for Case A and Case B separately:

What is the probability that Peter has a daughter?
(That is, what is the probability that one of Peter’s children is a girl?)

Note: Another version of the problem asks: “What is the probability that Peter has two sons?” This is equivalent to our problem; the resulting probability is just one minus the answer to the question above.

I think it is in human nature to say that the answer is 50% in both cases. Counterintuitive as it may seem, if we assume the interpretation below, neither of the answers is 50%, and the answers are not the same.

Before I explain why the probabilities are different, I should clarify what is meant by “probability”. Peter already has those children, after all, so their sexes have already been determined; hence, strictly speaking, it does not make sense to speak about the “probability” that Peter has a daughter.

There is a controversy regarding the correct interpretation of what is meant by “probability” in the statement of the problem. The interpretation intended by the author of the problem when it was first published is as follows (where “son” is understood to mean “son born on a Tuesday” in Case B):

Suppose we randomly choose a person with two children and ask them the question: “Do you have a son?” The randomly chosen person was Peter, and he answered “yes”.

Since everyone would answer either “yes” or “no” to the question “Do you have a son?”, there are two groups of people—those who have a son and those who don’t. Since we know that Peter answered “yes”, the interpretation above is equivalent to the following one:

What is the probability that a randomly selected person from the group of all people who have two children, at least one of whom is a boy, has a daughter?

However, there is also another interpretation, which does lead to the answer 50%. I will examine that interpretation at the end of this article and explain why I believe the interpretation stated above is more natural (although it gives counterintuitive results).

Case A: Son born on any day of the week

Let’s deal with Case A first because it is the easier one. We know that Peter has at least one son, so there are 3 different possibilities: 1) Both children are boys. 2) The older child is a boy and the younger child is a girl. 3) The older child is a girl and the younger child is a boy. Symbolically, Peter’s children could have been born in any of the following orders:

$$ BB\\BG\\GB $$

The order $GG$ is not possible because Peter has at least one boy. Since a boy being born has the same probability as a girl being born (both as the first child and as the second child), all three variants have the same probability and cover 100% of all possible cases, so each has probability $÷{1}{3} \approx 33.3\%$. There is a girl in 2 out of the 3 cases, so the probability that Peter has a daughter is $÷{2}{3} \approx 66.7\%$.

Readers familiar with probability theory can derive the same result easily using conditional probability ($B$ denotes “has a son” and $G$ denotes “has a daughter”):

$$ ℙ(G|B) = ÷{ℙ(G ∩ B)}{ℙ(B)} = ÷{ℙ(GB\,\text{or}\,BG)}{ℙ(BB\,\text{or}\,BG\,\text{or}\,GB)} = ÷{2/4}{3/4} = ÷{2}{3} $$

Yet another way to look at this is to look at the whole population of people with 2 children. It consists of 4 groups of (in real life only approximately) the same size: those who have children $BB$, $BG$, $GB$, and $GG$. “Peter has a son” is equivalent to “Peter does not belong to the $GG$ group”, which leaves us with a population consisting of three groups ($BB$, $BG$, and $GB$) of equal sizes. A randomly selected person from this population would have a daughter 2/3 of the time.

Note that in the real world, the probability would likely not be exactly 2/3, since the 4 groups would not have exactly the same size, but it would be very close.
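
The counting argument for Case A can be checked by brute force. The following is only a minimal sketch under the article's idealized assumptions (boys and girls equally likely, the two births independent); the function name `prob_daughter_given_son` is made up for this illustration.

```python
from fractions import Fraction
from itertools import product


def prob_daughter_given_son():
    """P(has a daughter | has at least one son) under the idealized model."""
    # The four equally likely (older, younger) combinations: BB, BG, GB, GG.
    families = list(product("BG", repeat=2))
    # Condition on "Peter has a son": drop the GG group.
    with_son = [f for f in families if "B" in f]
    # Among the remaining groups, count those that contain a girl.
    with_daughter = [f for f in with_son if "G" in f]
    return Fraction(len(with_daughter), len(with_son))


print(prob_daughter_given_son())  # 2/3
```

Using `Fraction` keeps the result exact, so it can be compared directly with the $÷{2}{3}$ derived above.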

Case B: Son born on Tuesday

This is where things start to get counterintuitive. Just like in the previous case, we can write down all equally likely possibilities. Let’s denote days of the week by numbers $1, 2, \dots, 7$, with $1$ being Monday, $2$ being Tuesday, and so on. Now we can denote the event “a boy was born on day $n$” as $B_n$, and similarly for $G_n$. For example, $B_3$ means a boy born on a Wednesday, $G_1$ means that a girl was born on a Monday. Using this notation, we can write events like $B_3G_1$, which means “the first child was a boy born on a Wednesday and the second child was a girl born on a Monday”.

Let’s assume that it is equally likely for a child to be born on any day of the week (which, just like with the child’s sex, is not completely true, but it is a reasonable assumption to keep the problem simple). This leads to the following $27$ equally likely ways Peter’s children could have been born (each of them must contain $B_2$, representing a boy born on Tuesday):

$$ \begin{matrix} B_2G_1 & G_1B_2 & B_1B_2 & B_2B_1 \\ B_2G_2 & G_2B_2 & B_2B_2 & \\ B_2G_3 & G_3B_2 & B_3B_2 & B_2B_3\\ B_2G_4 & G_4B_2 & B_4B_2 & B_2B_4\\ B_2G_5 & G_5B_2 & B_5B_2 & B_2B_5\\ B_2G_6 & G_6B_2 & B_6B_2 & B_2B_6\\ B_2G_7 & G_7B_2 & B_7B_2 & B_2B_7 \end{matrix} $$

Wait a minute… Where did the $B_2B_2$ in the last column go? Well, $B_2B_2$ is already there, in the third column. It describes the situation that Peter has two sons, both born on a Tuesday. The table above is a complete list of all the ways two children can be born when at least one of them is a boy born on a Tuesday.

Of course, under our assumptions, all these events are equally likely. There are $27$ possibilities, out of which $14$ (those in the first two columns) include a girl. The probability that Peter has a daughter is $÷{14}{27} \approx 51.9\%$.

Again, this can be reformulated in terms of the population of all people who have two children, which consists of $4⋅7⋅7=196$ groups like $B_1G_1$, $B_1G_2$, $B_2G_1$, etc., of approximately the same size. By removing the groups that do not satisfy the condition of having a boy born on a Tuesday, we are left with the groups listed above. Since they are of approximately equal sizes, the ratio of the number of people who have a daughter to the total number of people in the group would be approximately $÷{14}{27}$.
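
The same filtering of the 196 groups can be done mechanically. This is a sketch under the assumptions stated above (sex and day of the week uniform and independent); the names `TUESDAY` and `count_tuesday_son_families` are hypothetical and exist only for this example.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 2  # days are numbered 1 (Monday) to 7 (Sunday), as in the text


def count_tuesday_son_families():
    """Return (families with a daughter, all families) given a Tuesday-born son."""
    # Each child is a (sex, day) pair: 2 * 7 = 14 equally likely kinds of child.
    children = list(product("BG", range(1, 8)))
    # Ordered pairs (first child, second child): 14 * 14 = 196 groups.
    families = list(product(children, repeat=2))
    # Keep only the groups containing at least one boy born on a Tuesday.
    matching = [f for f in families if ("B", TUESDAY) in f]
    # Of those, count the groups that include a girl.
    with_daughter = [f for f in matching if any(sex == "G" for sex, _ in f)]
    return len(with_daughter), len(matching)


daughters, total = count_tuesday_son_families()
print(daughters, total)            # 14 27
print(Fraction(daughters, total))  # 14/27
```

The group $B_2B_2$ is counted once here because it is a single ordered pair, which is why the total is $27$ and not $28$.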

Explanation of the results

I know what you are thinking. The first result with $66.7\%$ already looks iffy when you first see the problem, and the idea that the additional bit of information that the son was born on a Tuesday changes the outcome sounds completely ridiculous.

However, it is important to realize, as I have stated at the beginning of the article, that the information we got tells us that Peter is a random member of a specific group: people with two children, at least one of whom is a son (in Case B, a son born on a Tuesday).

Most people would agree that this is the natural way to interpret probability, but it is actually not what most people imagine when they try to solve this particular problem. They imagine that there is a guy named Peter who has a son, let’s say his name is Adam, and we ask about the probability that the other child is a girl. Of course, this probability is $50\%$, since “the other child” is just a random child.

What’s the problem? People think that we speak about one particular son of Peter’s, and the question is about the other child. However, Peter is a random person with two children, at least one of whom is a boy. If Peter happens to have two sons, it does not make sense to talk about his “other child”, i.e. “the child other than his son”, and this is where the “common sense” reasoning stops working—and where the other interpretation comes into play.

The question of interpretation

As I mentioned at the beginning of the article, there is a controversy over the correct interpretation of the problem. Another way to interpret it is as follows:

Suppose that we randomly choose a person with two children. The piece of information we are given is a true statement about the sex of a randomly chosen child of this person.

That is, this interpretation assumes that the person who gives us the information picked one of the children at random with probability 50% (or, for example, randomly saw one of Peter’s children when visiting him) and told us what its gender was. For example, if Peter happened to have two daughters, the piece of information we would have received would have necessarily been that he has a daughter.

Now, it is quite clear, assuming this interpretation, that the answer to the original question should be 50%; the fact that a random child of Peter is a boy does not influence the chance of the other child being a girl in any way. To formalize this interpretation, it is necessary to assume that not only are the genders of Peter’s children random, but also that the information we receive (that is, whether we are told “Peter has a son” or “Peter has a daughter”) is based on a random event.

If $T_B$ denotes the event “we are told that a randomly chosen child of Peter is a boy”, i.e. $ℙ(T_B|BB) = 1$, $ℙ(T_B|BG) = ℙ(T_B|GB) = ÷{1}{2}$, and $ℙ(T_B|GG) = 0$, we get:

$$ ℙ(G|T_B) = ÷{ℙ(G ∩ T_B)}{ℙ(T_B)} = ÷{ℙ((BG∪GB∪GG)∩T_B)}{ℙ(T_B ∩ BB)+…+ℙ(T_B ∩ GG)} = ÷{ℙ(T_B|BG)ℙ(BG)+ℙ(T_B|GB)ℙ(GB)+ℙ(T_B|GG)ℙ(GG)}{ℙ(T_B|BB)ℙ(BB)+…+ℙ(T_B|GG)ℙ(GG)} = ÷{÷{1}{2}⋅÷{1}{4}+÷{1}{2}⋅÷{1}{4}+0⋅÷{1}{4}}{1⋅÷{1}{4}+÷{1}{2}⋅÷{1}{4}+÷{1}{2}⋅÷{1}{4}+0⋅÷{1}{4}} = ÷{1}{2} $$
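
The calculation for this second interpretation can also be reproduced exactly. The sketch below assumes, as above, that each of the four family types has probability $÷{1}{4}$ and that the reported child is chosen uniformly at random from the two; the function name `prob_daughter_given_told_boy` is invented for this illustration.

```python
from fractions import Fraction
from itertools import product


def prob_daughter_given_told_boy():
    """P(has a daughter | a randomly chosen child turned out to be a boy)."""
    p_told_boy = Fraction(0)               # P(T_B)
    p_told_boy_and_daughter = Fraction(0)  # P(G and T_B)
    for family in product("BG", repeat=2):
        p_family = Fraction(1, 4)
        # P(T_B | family): share of boys among the two children (1, 1/2 or 0).
        p_report = Fraction(family.count("B"), 2)
        p_told_boy += p_report * p_family
        if "G" in family:
            p_told_boy_and_daughter += p_report * p_family
    return p_told_boy_and_daughter / p_told_boy


print(prob_daughter_given_told_boy())  # 1/2
```

The only difference from the Case A enumeration is the weight `p_report`: the mixed families $BG$ and $GB$ produce the statement “Peter has a son” only half of the time, which is what brings the answer back to 50%.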

I personally find this interpretation rather odd because it assumes the existence of an ad hoc process that randomly generates a piece of information about the subject. This is not the usual way to interpret information. We expect only the subject itself to be random, but not which piece of information we get about the subject to be random as well, unless this further randomness is specifically pointed out in the problem statement.

For example, if someone tells you: “Peter is Russian. What is the probability that …”, you would not even think of interpretations like “the fact we are told was chosen randomly from the data on Peter’s ID”; you would just look up statistical data about Russians.

Nevertheless, neither interpretation is “right” or “wrong”. The statement of the problem is ambiguous (unless an interpretation is specified), and different people may understand it in different ways. It all ultimately depends on whether the possible pieces of information we could receive are “Peter has a son” and “Peter does not have a son” or “Peter has a son” and “Peter has a daughter”.
