I'm looking over Casella and Berger's Statistical Inference and reviewing some of the concepts. For the record, this book is exactly what you want if you need to take a statistics qualifying exam as a graduate student and not at all what a generalist will want. I actually enjoyed the material for the relatively clean overview it gave. There's depth, but not so much as to deter someone without a degree in mathematics. That said, I would not recommend this book for beginners. If you do want to slog through, get a supplementary book.

I'm currently looking at a passage on page 4 of the second edition. It begins:

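The operations of union and intersection can be extended to infinite collections of sets as well. If $A_1, A_2, A_3, \ldots$ is a collection of sets, all defined on a sample space $S$, then

$$\bigcup_{i=1}^{\infty} A_i = \{x \in S : x \in A_i \text{ for some } i\},$$

$$\bigcap_{i=1}^{\infty} A_i = \{x \in S : x \in A_i \text{ for all } i\}.$$

For example, let $S = (0, 1]$ and define $A_i = [1/i, 1]$. Then

$$\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} [1/i, 1] = \{x \in (0, 1] : x \in [1/i, 1] \text{ for some } i\} = \{x \in (0, 1]\} = (0, 1],$$

$$\bigcap_{i=1}^{\infty} A_i = \bigcap_{i=1}^{\infty} [1/i, 1] = \{x \in (0, 1] : x \in [1/i, 1] \text{ for all } i\} = \{x \in (0, 1] : x \in [1, 1]\} = \{1\}$$

(the point 1)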

The above occurred after a discussion about countable and uncountable sets and proving theorems about sets from first principles (rather than Venn diagrams). If your eyes kind of glazed over while reading the above, no worries -- mine did, too. Actually, my reaction was worse than glazing over: I skimmed and thought I understood. However, as I started to tease apart what was written, I realized that there was much more going on here than I had first thought.

First of all, the preceding section describes reasoning about events that can be represented as finite sets, over finite sample spaces. This section builds on what we know to discuss the infinite case.

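In our example, the sample space is the interval $S = (0, 1]$, which is a subset of the reals. We are defining a *countably infinite* collection of intervals, each denoted by some $A_i$. How do we know it's countably infinite? This is implied by the notation: the index $i$ starts at 1 and goes to infinity through the integers. Therefore, there is a 1-1 correspondence between the intervals and the natural numbers, and thus the collection of intervals is countably infinite. (It is the collection that is countable; every interval other than $A_1$ is itself an uncountable set of reals.) Let's take a look at some of the intervals: $A_1 = [1, 1] = \{1\}$, $A_2 = [1/2, 1]$, $A_3 = [1/3, 1]$, $A_4 = [1/4, 1]$, and so on. They are nested: each interval contains all of the ones before it, and they all share the right endpoint 1.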
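To picture how these intervals line up, here is a minimal Python sketch that draws the first few $A_i = [1/i, 1]$ as text bars on a common ruler for $(0, 1]$. The helper names `interval` and `draw` and the ruler width of 60 characters are my own choices for illustration, not anything from the book, and the endpoints are rounded to the nearest character column.

```python
from fractions import Fraction


def interval(i):
    """Return the endpoints (1/i, 1) of A_i = [1/i, 1] as exact fractions."""
    return Fraction(1, i), Fraction(1)


def draw(i, width=60):
    """Render A_i as a text bar; column 0 stands for 0 and column `width` for 1."""
    lo, hi = interval(i)
    start = round(lo * width)  # column of the lower endpoint 1/i (rounded)
    end = round(hi * width)    # column of the upper endpoint 1
    cells = [" "] * (width + 1)
    for col in range(start, end + 1):
        cells[col] = "-"
    # Mark both endpoints; for A_1 they fall on the same column.
    cells[start] = "|"
    cells[end] = "|"
    return f"A_{i:<3}" + "".join(cells)


if __name__ == "__main__":
    for i in (1, 2, 3, 4, 10):
        print(draw(i))
```

Every bar ends in the same column, which is the shared endpoint 1, while the left end creeps toward 0 without ever reaching it.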

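Now, the first statement says that, in this example, the union of the infinitely many subsets $A_i$ is equal to the sample space: $\bigcup_{i=1}^{\infty} A_i = \{x \in S : x \in A_i \text{ for some } i\} = (0, 1]$. That is a property of this particular collection, not of infinite unions in general. The statement on the far left of the set notation, $x \in S$, gives us $x$'s domain: it is defined over $S$, which we have defined to be $(0, 1]$. The right side of the "such that" states that every $x$ in the domain that is in at least one interval $A_i$ is in this set. (The $A_i$ overlap, so they do not form a partition.) Recall that $A_i$ was defined as $[1/i, 1]$.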

The above statement seems self-evident for rational numbers: for any rational number $x = p/q$ in $(0, 1]$, we know that there is a coarse-grained bound $1/q$ such that $1/q$ is less than or equal to $x$ and therefore $x$ is in $A_q$. But what about irrational numbers? For this to work, we need a theorem that gives rational bounds on irrational numbers. Real analysis has exactly that: the Archimedean property of the reals says that for any real $x > 0$ there is a natural number $i$ with $i \geq 1/x$, which is the same as $1/i \leq x$. So $x$ is in $A_i$ whether or not $x$ is rational, and nothing as exotic as PL theory or number theory is required.

In any case, we are trying to make the argument that for every number on the real interval $(0, 1]$, there exists at least one sub-interval $A_i$ with rational endpoints that contains it, and that the union of these sub-intervals gives us back the interval exactly. We don't miss any numbers on the interval $(0, 1]$, and we don't pick up anything outside it, because every $A_i$ is a subset of $(0, 1]$. No compactness argument is needed; in fact $(0, 1]$ is not compact, since it is not closed.
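The claim that every $x$ in $(0, 1]$ lands in some $A_i$ can be made concrete: by the Archimedean property of the reals, the smallest index that works is $i = \lceil 1/x \rceil$. Below is a small Python sketch of that calculation. The function names `witness_index` and `in_A` are hypothetical, and a float such as `math.sqrt(2) / 100` is only a rational approximation of an irrational number, so this illustrates the argument rather than proving it.

```python
import math
from fractions import Fraction


def in_A(i, x):
    """Return True if x lies in A_i = [1/i, 1]."""
    return Fraction(1, i) <= Fraction(x) <= 1


def witness_index(x):
    """Return the smallest positive integer i with 1/i <= x, for x in (0, 1]."""
    xf = Fraction(x)  # exact conversion, also for floats
    if not 0 < xf <= 1:
        raise ValueError("x must lie in the sample space (0, 1]")
    # 1/i <= x  is equivalent to  i >= 1/x, so take the ceiling of 1/x.
    return math.ceil(1 / xf)


if __name__ == "__main__":
    for x in (1, Fraction(2, 5), math.sqrt(2) / 100):
        i = witness_index(x)
        print(f"x = {float(x):.6f} first appears in A_{i}; in_A = {in_A(i, x)}")
```

Because the intervals are nested, once $x$ is in $A_i$ it is also in every later interval, so the union never loses a point it has already covered.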

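The second statement defines intersection in this context. Here it is tempting to make an argument about the uniqueness of the intervals. That is, if $i \neq j$, then $A_i \neq A_j$: every sub-interval is unique by virtue of its lower bound. Distinctness alone does not pin down the intersection, though. What does is that the upper bound (1) is included in every interval, while $A_1 = [1, 1]$ contains nothing else. A point in the intersection has to belong to every $A_i$, including $A_1$, so the only thing all sub-intervals share is the point 1.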
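As a sanity check on the intersection, here is a short Python sketch that intersects finitely many of the intervals $A_i = [1/i, 1]$ exactly. The helpers `A` and `intersect` are names I made up for illustration, and a finite truncation is only evidence for the infinite statement, not a proof of it.

```python
from fractions import Fraction


def A(i):
    """Endpoints of the closed interval A_i = [1/i, 1]."""
    return Fraction(1, i), Fraction(1)


def intersect(intervals):
    """Intersect closed intervals given as (lo, hi) pairs; None means empty."""
    lo = max(a for a, _ in intervals)  # largest lower endpoint
    hi = min(b for _, b in intervals)  # smallest upper endpoint
    return (lo, hi) if lo <= hi else None


if __name__ == "__main__":
    with_A1 = intersect([A(i) for i in range(1, 101)])
    without_A1 = intersect([A(i) for i in range(2, 101)])
    print("A_1 through A_100:", with_A1)
    print("A_2 through A_100:", without_A1)
```

With $A_1$ included the result is the degenerate interval $[1, 1]$, the single point 1. Leaving $A_1$ out gives $[1/2, 1]$ instead, which shows that it is $A_1$, not the distinctness of the lower bounds, that forces the intersection down to one point.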

It seems the point of the example was to show that countably infinite collections of sets do occur and that the principles of set theory still hold. Defining the sample space as an interval of the reals appeals to practical considerations not fully explored in the text: we often model phenomena we measure as having infinite precision, but we know that our instruments can only be finitely precise. A countably infinite partition seems like it would introduce less error into a calculation than a finite one, such as a histogram.