6 Exercise 1

  • Due date: April 22nd (Monday), 11pm

Rules for Problem Sets

6.1 Update (as of 10am, April 18th)

  • Please calculate the standard deviation, not the variance, in your simulation. The parameters you set when drawing the random numbers are the mean \(\mu\) and the standard deviation \(\sigma\) of the normal distribution.
  • Use the seq function to create the sequence of sample sizes that you use in the simulation.

6.2 Question: Examine the law of large numbers through numerical simulations

Consider a random sample \(\{x_{i}\}_{i=1}^{N}\) drawn from the distribution of a random variable \(X\). The law of large numbers implies that

\[ \frac{1}{N}\sum_{i=1}^{N}x_{i}\overset{p}{\longrightarrow}E[X] \]

In other words, the sample mean converges to the population mean in probability as the sample size goes to infinity (i.e., \(N \rightarrow \infty\)).

Similarly, the sample variance converges in probability to the population variance:

\[ \frac{1}{N}\sum_{i=1}^{N}(x_{i}-\bar{x})^{2}\overset{p}{\longrightarrow}V[X] \]

(Note on 4/18) This implies that the sample standard deviation also converges in probability to the population standard deviation.

(This is an application of the law of large numbers, though the proof is somewhat involved.)
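
A sketch of the argument, assuming the draws are independent and identically distributed and that \(E[X^{2}]\) is finite: expanding the square shows that the sample variance is a continuous function of two sample averages, each of which obeys the law of large numbers.

\[ \frac{1}{N}\sum_{i=1}^{N}(x_{i}-\bar{x})^{2}=\frac{1}{N}\sum_{i=1}^{N}x_{i}^{2}-\bar{x}^{2}\overset{p}{\longrightarrow}E[X^{2}]-(E[X])^{2}=V[X] \]

Because the square root is continuous on \([0,\infty)\), the continuous mapping theorem then gives the statement for the standard deviation:

\[ \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_{i}-\bar{x})^{2}}\overset{p}{\longrightarrow}\sqrt{V[X]} \]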

The goal of this problem set is to demonstrate these two properties through numerical simulations. Here is what we are going to do:

  1. For a certain sample size \(N\), draw \(N\) random numbers from the normal distribution with known mean and standard deviation.
  2. Calculate the sample mean and the sample standard deviation for the “data” you drew.
  3. Repeat this for many different sample sizes.
  4. Examine whether the sample mean and the sample standard deviation get closer to the true values, which you set when drawing the random numbers, as the sample size gets larger.

6.2.1 How to implement

I explain how to implement this in R step by step below.

  1. Prepare a function like the following:
    1. There are two inputs: (1) a vector that contains the data \(\{x_{i}\}_{i=1}^{N}\) and (2) an indicator of whether to calculate the mean or the standard deviation.
    fun_something = function(firstinput, secondinput){
    
      # Two inputs: firstinput, secondinput
      # One output: output
    
      # Do something.
    
      return(output)
    
    }
    2. Use an if/else statement. Example:
    # "secondinput" is the name of the input variable in your function
    if ( secondinput == "mean"){
      # calculate mean of the data (firstinput)
    
    } else if ( secondinput == "sd"){
      # Calculate standard deviation of the data (firstinput) 
    }    
    3. Use the return function to define the output of the function.
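
The three pieces above (two inputs, an if/else branch, and return) fit together as in the following minimal sketch. It deliberately uses a different pair of statistics so that it illustrates the pattern only; the name fun_extreme, the "min"/"max" labels, and the error branch are assumptions made for illustration and are not part of the assignment.

```r
# Hypothetical helper (illustration only): returns the smallest or the
# largest element of a numeric vector, depending on the second input.
fun_extreme <- function(firstinput, secondinput) {

  if (secondinput == "min") {
    output <- min(firstinput)
  } else if (secondinput == "max") {
    output <- max(firstinput)
  } else {
    # Fail loudly on an unknown indicator instead of returning nothing
    stop("secondinput must be \"min\" or \"max\"")
  }

  return(output)
}

fun_extreme(c(3, 1, 4, 1, 5), "min")
fun_extreme(c(3, 1, 4, 1, 5), "max")
```

The same skeleton applies with "mean" and "sd" in the branches. Note that R's built-in sd function divides by \(N-1\) rather than \(N\); the difference vanishes as \(N\) grows, so either version converges to the same limit.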
  2. Construct a vector that contains the sample sizes you want to use in your simulation. For example:
samplesize_vec = seq(from = 100, to = 100000, by = 100)

Here, we try 1,000 different sample sizes ranging from 100 to 100,000 in steps of 100.
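
A quick way to confirm what seq produced is to inspect the length and the endpoints of the vector. This is only a sanity check on the vector defined above.

```r
samplesize_vec <- seq(from = 100, to = 100000, by = 100)

# Number of sample sizes in the vector: (100000 - 100) / 100 + 1 = 1000
length(samplesize_vec)

# First few and last sample sizes
head(samplesize_vec, 3)
tail(samplesize_vec, 1)
```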

  3. Prepare two vectors to store the results of the for loop below. Each vector needs one element per sample size, so its length should equal the length of samplesize_vec.
# Hint:
# numeric(k) returns a zero vector with the length of k 
# length( vector) returns the length of `vector`

# result_mean = .... 
# result_sd = ....
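
As a small illustration of the hint, the sketch below preallocates a zero vector with numeric and then fills it element by element. The names k and store are hypothetical and unrelated to the assignment's variables.

```r
k <- 5

# Preallocate a zero vector of length k
store <- numeric(k)

# Fill each slot inside a loop; here with the square of the index
for (j in seq_len(k)) {
  store[j] <- j^2
}

store
length(store)
```

Preallocating the full vector before the loop avoids growing it one element at a time, which gets slow for long vectors.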
  4. To draw random numbers from the normal distribution, use rnorm:
# You can choose the mean and the standard deviation as you like.
rnorm(n = 100, mean = 2, sd = 5)
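
Because rnorm returns different numbers on every call, it can help to fix the random seed while developing so that results are reproducible. The sketch below assumes an arbitrary seed of 123; the assignment does not require any particular seed.

```r
# Fix the seed so the same draws come back on every run (123 is arbitrary)
set.seed(123)
draws <- rnorm(n = 100, mean = 2, sd = 5)

length(draws)  # 100 draws
mean(draws)    # close to, but not exactly, the true mean 2
sd(draws)      # close to, but not exactly, the true standard deviation 5

# Resetting the seed reproduces exactly the same draws
set.seed(123)
draws_again <- rnorm(n = 100, mean = 2, sd = 5)
identical(draws, draws_again)
```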
  5. Use a for loop to calculate both the mean and the standard deviation for each sample size. For example:
for (i in 1:length(samplesize_vec)){
  
  # Draw the random number
  
  # Calculate the mean using the function you construct. 
  
  # Calculate the standard deviation using the function you construct.
  
}
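
The loop pattern can be sketched with a different statistic to show how the pieces connect: draw, compute, store at position i. The example below tracks the sample median over a short, hypothetical grid of sample sizes; the names sizes and result_median, the seed, and the choice of the median are assumptions for illustration only.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical, much shorter grid than the one used in the assignment
sizes <- seq(from = 100, to = 1000, by = 100)

# One slot per sample size
result_median <- numeric(length(sizes))

for (i in seq_along(sizes)) {

  # Draw sizes[i] random numbers with true mean 2 and true sd 5
  draws <- rnorm(n = sizes[i], mean = 2, sd = 5)

  # Store the statistic for this sample size in slot i
  result_median[i] <- median(draws)
}

result_median
```

seq_along(sizes) generates the same indices as 1:length(sizes) for a non-empty vector, and it also behaves sensibly when the vector is empty.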
  6. Plot the result with ggplot2.
    1. Install the package if you have not done so yet.
    2. Load ggplot2 with library(ggplot2).
    3. Use the qplot command to make a figure.
    # Create plot and save it as the variable `plot1`
    plot1 <- qplot(x = samplesize_vec, y = yourresult, geom = "line")
    
    # print "plot1"
    print(plot1)
    
    # save the plot as PNG file
    ggsave(filename = "filename.png", plot = plot1)
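
Recent ggplot2 releases (3.4.0 and later) mark qplot as deprecated, so it may print a warning. An equivalent figure can be built with ggplot and geom_line, as in the sketch below. The data frame plot_df, its column names, the placeholder values, and the dashed reference line at the true value are assumptions for illustration; it requires the ggplot2 package to be installed.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed

# Placeholder data (hypothetical): medians of normal draws for ten sample sizes
sizes <- seq(from = 100, to = 1000, by = 100)
values <- sapply(sizes, function(n) median(rnorm(n = n, mean = 2, sd = 5)))

# ggplot works on data frames, so collect the two vectors in one
plot_df <- data.frame(samplesize = sizes, estimate = values)

# Line plot of the estimate against the sample size,
# with a dashed horizontal line at the true value 2
plot2 <- ggplot(plot_df, aes(x = samplesize, y = estimate)) +
  geom_line() +
  geom_hline(yintercept = 2, linetype = "dashed") +
  labs(x = "Sample size", y = "Estimate")
```

The resulting object can be printed and saved in the same way as plot1, using print and ggsave.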

6.2.2 What to submit

Your answer should include

  1. The true values of the mean and the standard deviation you chose in your simulation.
  2. The plots that show the relationship between the sample mean (and the sample standard deviation) and the sample size.
  3. An explanation of what the plots from your simulation indicate.