What is the use of the cumulative distribution function $F_X$ of a continuous random variable?

We previously defined a continuous random variable to be one that can take on a continuum of values. For example, we can define a continuous random variable that can take on any value in the interval [1,2].

To be more precise, we recall the definition of a cumulative distribution function (CDF) for a random variable that was introduced in the previous lesson on discrete random variables.

Definition: The Cumulative Distribution Function

The cumulative distribution function for a random variable X, denoted by F(x), is the probability that X assumes a value less than or equal to x:

F(x) = P(X ≤ x)

The cumulative distribution function has the following properties:

0 ≤ F(x) ≤ 1 for all values of x
F(x) is a nondecreasing function of x

Additionally, for continuous random variables, F(x) is a continuous function.

From this, we can define a continuous random variable to be any random variable X whose CDF is a continuous function. Notice that this is in contrast to discrete random variables, whose CDF is always a discontinuous step function.
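As a numerical illustration of these properties, here is a minimal Python sketch that assumes X is uniformly distributed on [1,2]. The uniform distribution is an assumption chosen for illustration (the text only says X can take any value in that interval), and the helper name `uniform_cdf` is hypothetical. The grid check below is a sanity check on sampled points, not a proof of continuity.

```python
# A minimal sketch, assuming X ~ Uniform(1, 2) purely for illustration.

def uniform_cdf(x: float, a: float = 1.0, b: float = 2.0) -> float:
    """CDF F(x) = P(X <= x) of a Uniform(a, b) random variable."""
    if x < a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Evaluate F on a fine grid covering [0.5, 2.5] and check the listed properties.
grid = [0.5 + i * 0.001 for i in range(2001)]
values = [uniform_cdf(x) for x in grid]

assert all(0.0 <= v <= 1.0 for v in values)             # 0 <= F(x) <= 1
assert all(u <= v for u, v in zip(values, values[1:]))  # nondecreasing
# No jumps on the grid: consecutive values differ by at most slope * step.
assert max(v - u for u, v in zip(values, values[1:])) <= 0.001 + 1e-12

print(uniform_cdf(1.5))  # 0.5
```

A discrete CDF would fail the last check, since it jumps by a positive amount at each point of its range.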

Recall the cumulative distribution function we had for the test scores example in the previous lesson. The cumulative distribution function was graphed at the end of the example.

Observe that

from 0 to 30, F is constant because there are no test scores before 30
from 30 to 60, F is constant because there are no scores between 30 and 60.

The graph of F "increases in steps" at 30, 60, 80, 90, and 100. CDFs for discrete random variables are always step functions. A discrete random variable cannot assume a continuum of values; thus, its CDF can only increase at a finite or countably infinite set of points.

For another example, consider the function whose graph is given below.

This function cannot represent a CDF for a continuous random variable because the function F is not continuous for all values of x. However, F could represent a cumulative distribution function for a discrete random variable since it satisfies our definition from the previous lesson on discrete random variables.

source: http://wiki.ubc.ca/Science:MATH105_Probability/Lesson_2_CRV/2.01_The_Cumulative_Distribution_Function

3.2.1 Cumulative Distribution Function

The PMF is one way to describe the distribution of a discrete random variable. As we will see later on, the PMF cannot be defined for continuous random variables. The cumulative distribution function (CDF) of a random variable is another way to describe its distribution. The advantage of the CDF is that it can be defined for any kind of random variable (discrete, continuous, and mixed).

Definition
The cumulative distribution function (CDF) of random variable $X$ is defined as $$F_X(x) = P(X \leq x), \textrm{ for all }x \in \mathbb{R}.$$

Note that the subscript $X$ indicates that this is the CDF of the random variable $X$. Also, note that the CDF is defined for all $x \in \mathbb{R}$. Let us look at an example.

Example

I toss a coin twice. Let $X$ be the number of observed heads. Find the CDF of $X$.

Solution
- Note that here $X \sim \textrm{Binomial}(2, \frac{1}{2})$. The range of $X$ is $R_X=\{0,1,2\}$ and its PMF is given by $$P_X(0)=P(X=0)=\frac{1}{4},$$ $$P_X(1) =P(X=1)=\frac{1}{2},$$ $$P_X(2)=P(X=2)=\frac{1}{4}.$$
  
  To find the CDF, we argue as follows. First, note that if $x < 0$, then $$F_X(x)=P(X \leq x)=0, \textrm{ for } x < 0.$$ Next, if $x\geq 2$, $$F_X(x)=P(X \leq x)=1, \textrm{ for } x\geq 2.$$ Next, if $0 \leq x < 1$, $$F_X(x)=P(X \leq x)=P(X=0)=\frac{1}{4}, \textrm{ for } 0 \leq x < 1.$$ Finally, if $1 \leq x < 2$, $$F_X(x)=P(X \leq x)=P(X=0)+P(X=1)=\frac{1}{4}+\frac{1}{2}=\frac{3}{4}, \textrm{ for } 1 \leq x < 2.$$
  
  Thus, to summarize, we have \begin{equation} \nonumber F_X(x) = \left\{ \begin{array}{l l} 0 & \quad \text{for } x < 0\\ \frac{1}{4} & \quad \text{for } 0 \leq x < 1\\ \frac{3}{4} & \quad \text{for } 1 \leq x < 2 \\ 1 & \quad \text{for } x \geq 2\\ \end{array} \right. \end{equation}
  Note that when you are asked to find the CDF of a random variable, you need to find the function for the entire real line. Also, for discrete random variables, we must be careful about when to use "$ < $" and when to use "$\leq$". Figure 3.3 shows the graph of $F_X(x)$. Note that the CDF is flat between the points in $R_X$ and jumps at each value in the range. The size of the jump at each point is equal to the probability at that point. For example, at point $x=1$, the CDF jumps from $\frac{1}{4}$ to $\frac{3}{4}$. The size of the jump here is $\frac{3}{4}-\frac{1}{4}=\frac{1}{2}$, which is equal to $P_X(1)$. Also, note that the open and closed circles at point $x=1$ indicate that $F_X(1)=\frac{3}{4}$ and not $\frac{1}{4}$.
  
  Fig.3.3 - CDF for Example 3.9.
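The piecewise CDF above can be checked with a short Python sketch. It encodes the four cases exactly with `fractions.Fraction` and, as an independent check, estimates $P(X \leq x)$ by simulating pairs of fair coin tosses. The function names, the seed, and the number of trials are illustrative assumptions, and the simulated values are only approximate.

```python
import random
from fractions import Fraction

# PMF of X ~ Binomial(2, 1/2), the number of heads in two fair tosses.
PMF = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def coin_cdf(x: float) -> Fraction:
    """Piecewise F_X(x) from the solution; the '<' vs '<=' boundaries matter."""
    if x < 0:
        return Fraction(0)
    if x < 1:
        return Fraction(1, 4)
    if x < 2:
        return Fraction(3, 4)
    return Fraction(1)


def estimate_cdf(x: float, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(X <= x) from simulated pairs of fair coin tosses."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        heads = rng.getrandbits(1) + rng.getrandbits(1)  # each toss: 1 = heads
        if heads <= x:
            hits += 1
    return hits / trials


# The jump at x = 1 equals P_X(1) = 1/2.
assert coin_cdf(1) - coin_cdf(0.999) == PMF[1]

for x in (-0.5, 0, 0.5, 1, 1.5, 2, 3):
    print(x, coin_cdf(x), round(estimate_cdf(x), 3))
```

The exact column reproduces the summary table; the simulated column should land close to it.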

In general, let $X$ be a discrete random variable with range $R_X=\{x_1,x_2,x_3,...\}$, such that $x_1 < x_2 < x_3 < \cdots$. Here, for simplicity, we assume that the range $R_X$ is bounded from below, i.e., $x_1$ is the smallest value in $R_X$. If this is not the case, then $F_X(x)$ approaches zero as $x \rightarrow -\infty$ rather than hitting zero. Figure 3.4 shows the general form of the CDF, $F_X(x)$, for such a random variable. We see that the CDF is in the form of a staircase. It starts at $0$, i.e., $F_X(-\infty)=0$, and then jumps at each point in the range. Between jumps it stays flat: the CDF is constant between $x_k$ and $x_{k+1}$, so we can write $$F_X(x)=F_X(x_k), \textrm{ for }x_k \leq x < x_{k+1}.$$

The CDF jumps at each $x_k$. In particular, we can write $$F_X(x_k)-F_X(x_k-\epsilon)=P_X(x_k), \textrm{ for $\epsilon>0$ small enough.}$$ Since the CDF is flat between consecutive points and each jump $P_X(x_k)$ is nonnegative, the CDF is a non-decreasing function, i.e., if $y \geq x$ then $F_X(y)\geq F_X(x)$. Finally, the CDF approaches $1$ as $x$ becomes large. We can write $$\lim_{x \rightarrow \infty} F_X(x)=1.$$

Fig.3.4 - CDF of a discrete random variable.
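The staircase picture translates directly into code. Below is a minimal Python sketch, assuming the range $R_X$ is finite so the PMF can be stored as a dictionary; the helper names `make_cdf` and `pmf_from_jumps` are hypothetical. `make_cdf` accumulates the PMF over the sorted points, so the result is flat between consecutive $x_k$, and `pmf_from_jumps` recovers $P_X(x_k)$ as the jump $F_X(x_k)-F_X(x_k-\epsilon)$.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate


def make_cdf(pmf):
    """Build F_X from a finite PMF {x_k: P_X(x_k)} as a staircase function."""
    points = sorted(pmf)                                # x_1 < x_2 < ... < x_n
    heights = list(accumulate(pmf[p] for p in points))  # F_X(x_1), ..., F_X(x_n)

    def cdf(x):
        i = bisect_right(points, x)  # how many x_k satisfy x_k <= x
        return Fraction(0) if i == 0 else heights[i - 1]  # flat on [x_k, x_{k+1})

    return cdf


def pmf_from_jumps(cdf, points, eps=Fraction(1, 10**6)):
    """Recover P_X(x_k) = F_X(x_k) - F_X(x_k - eps).

    eps must be smaller than the gap between consecutive points of the range.
    """
    return {p: cdf(p) - cdf(p - eps) for p in points}


pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
F = make_cdf(pmf)
print(F(-1), F(0), F(1.5), F(2))  # 0 1/4 3/4 1
assert pmf_from_jumps(F, sorted(pmf)) == pmf
```

The round trip PMF to CDF to PMF illustrates why the CDF carries the same information as the PMF for a discrete random variable.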

Note that the CDF completely describes the distribution of a discrete random variable. In particular, we can find the PMF values from the sizes of the jumps in the CDF. Conversely, if we have the PMF, we can find the CDF from it: if $R_X=\{x_1,x_2,x_3,...\}$, we can write $$F_X(x)=\sum_{x_k \leq x} P_X(x_k).$$ Now, let us prove a useful formula.

For all $a \leq b$, we have $$P(a < X \leq b)=F_X(b)-F_X(a). \tag{3.1}$$

To see this, note that for $a \leq b$ the events $\{X \leq a\}$ and $\{a < X \leq b\}$ are disjoint and their union is $\{X \leq b\}$, so $$P(X \leq b)=P(X \leq a) + P(a < X \leq b).$$ Thus, $$F_X(b)=F_X(a) + P(a < X \leq b).$$ Again, pay attention to the use of "$ < $" and "$\leq$", as they can make a difference for discrete random variables. We will see later that Equation 3.1 is true for all types of random variables (discrete, continuous, and mixed). Note that the CDF gives us $P(X \leq x)$. To find $P(X < x)$ for a discrete random variable, we can simply write $$P(X < x)=P(X \leq x)-P(X=x)=F_X(x)-P_X(x).$$
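Equation 3.1 and the formula for $P(X < x)$ can be spot-checked on the two-coin-toss example (Example 3.9). This Python sketch, with hypothetical helper names, compares $P(a < X \leq b)$ summed directly from the PMF against $F_X(b)-F_X(a)$ on a grid of half-integer values of $a$ and $b$, and shows that $P(X < 1)$ and $P(X \leq 1)$ differ. A finite grid check is an illustration, not a proof.

```python
from fractions import Fraction

# PMF of the number of heads in two fair coin tosses (Example 3.9).
PMF = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def F(x):
    """F_X(x) = P(X <= x), summed directly from the PMF."""
    return sum((p for k, p in PMF.items() if k <= x), Fraction(0))


def prob_interval(a, b):
    """P(a < X <= b), summed directly from the PMF."""
    return sum((p for k, p in PMF.items() if a < k <= b), Fraction(0))


def prob_less_than(x):
    """P(X < x) = F_X(x) - P_X(x) for a discrete X."""
    return F(x) - PMF.get(x, Fraction(0))


# Equation (3.1): P(a < X <= b) = F_X(b) - F_X(a) for all a <= b on the grid.
grid = [Fraction(n, 2) for n in range(-2, 7)]  # -1, -1/2, ..., 3
for a in grid:
    for b in grid:
        if a <= b:
            assert prob_interval(a, b) == F(b) - F(a)

# '<' versus '<=' matters at points of the range:
print(F(1), prob_less_than(1))  # 3/4 1/4
```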

Example
Let $X$ be a discrete random variable with range $R_X=\{1,2,3,...\}$. Suppose the PMF of $X$ is given by $$P_X(k)=\frac{1}{2^k} \textrm{ for } k=1,2,3,...$$

1. Find and plot the CDF of $X$, $F_X(x)$.
2. Find $P(2 < X \leq 5)$.
3. Find $P(X > 4)$.

Solution
- First, note that this is a valid PMF. In particular, $$\sum_{k=1}^{\infty} P_X(k)=\sum_{k=1}^{\infty} \frac{1}{2^k}=1 \textrm{ (geometric sum)}$$
  1. To find the CDF, note that
    
    $\textrm{For } x < 1,$ $F_X(x)=0$.
    
    $\textrm{For } 1\leq x < 2,$ $F_X(x)=P_X(1)=\frac{1}{2}$.
    
    $\textrm{For } 2\leq x < 3,$ $F_X(x)=P_X(1)+P_X(2)=\frac{1}{2}+ \frac{1}{4}=\frac{3}{4}$.
    
    In general, for $k=1,2,3,...$ and $k \leq x < k+1$, we have $$F_X(x) =P_X(1)+P_X(2)+...+P_X(k)$$ $$=\frac{1}{2}+ \frac{1}{4}+...+\frac{1}{2^k}=1-\frac{1}{2^k}=\frac{2^k-1}{2^k}.$$
    Figure 3.5 shows the CDF of $X$.
    
    Fig.3.5 - CDF of random variable given in Example 3.10.
  2. To find $P(2 < X \leq 5)$, we can write $$P(2 < X \leq 5)=F_X(5)-F_X(2)=\frac{31}{32}-\frac{3}{4}=\frac{7}{32}.$$ Or equivalently, we can write $$P(2 < X \leq 5)=P_X(3)+P_X(4)+P_X(5)=\frac{1}{8}+\frac{1}{16}+\frac{1}{32}=\frac{7}{32},$$ which gives the same answer.
  3. To find $P(X > 4)$, we can write $$P(X > 4)=1-P(X \leq 4)=1-F_X(4)=1-\frac{15}{16}=\frac{1}{16}.$$
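The closed form $F_X(x)=\frac{2^k-1}{2^k}=1-\frac{1}{2^k}$ for $k \leq x < k+1$ (with $k \geq 1$) makes these answers easy to reproduce. The following Python sketch uses exact fractions and takes $k=\lfloor x \rfloor$; the function name `F` is chosen for illustration.

```python
import math
from fractions import Fraction


def F(x) -> Fraction:
    """CDF of X with P_X(k) = 1/2^k, k = 1, 2, 3, ...

    For x >= 1, F_X(x) = 1 - 1/2^floor(x); for x < 1, F_X(x) = 0.
    """
    if x < 1:
        return Fraction(0)
    k = math.floor(x)
    return 1 - Fraction(1, 2**k)


# The closed form agrees with summing the PMF term by term, e.g. at x = 5.
assert F(5) == sum(Fraction(1, 2**j) for j in range(1, 6))

print(F(5) - F(2))  # P(2 < X <= 5) = 7/32
print(1 - F(4))     # P(X > 4) = 1/16
```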