6 Exercise 1
- Due date: April 22th (Monday) 11pm
Rules for Problem Sets
- If you are enrolled in Japanese class (i.e., Wednesday 2nd), you can use both Japanese and English to write your answer.
- Submit your solution through
CourseN@vi
. - Submit both your answer and R script.
- Using
Rmarkdown
would be appreciated, though not mandatory.Rmarkdown
introduction in Japanese: https://kazutan.github.io/kazutanR/Rmd_intro.htmlRmarkdown
introduction in English: https://rmarkdown.rstudio.com/articles_intro.html
- I might cover
Rmarkdown
in the course later.
6.1 Update (as of 10am, April 18th)
- Please calculate the standard deviation, not the variance in your simulation. The parameter you set when drawing the random number is the mean \(\mu\) and the standard deviation \(\sigma\) in normal distribution.
- Use
seq
function to create the sequence of the sample sizes that you use in the simulation.
6.2 Question: Examine the law of large numbers through numerical simulations
Consider the random sample of \(\{x_{i}\}_{i=1}^{N}\) drawn from the random variable \(X\). The law of large numbers implies that
\[ \frac{1}{N}\sum_{i=1}^{N}x_{i}\overset{p}{\longrightarrow}E[X] \]
In other words, the sample mean converges to the population mean in probability as the sample size goes to infinity (i.e., \(N \rightarrow \infty\)).
Similary, the sample variance also converges to the population variance in probability
\[ \frac{1}{N}\sum_{i=1}^{N}(x_{i}-\bar{x})^{2}\overset{p}{\longrightarrow}V[X] \]
(Note on 4/18) This implies that the sample standard deviation also converges to the population standard deviation
(This is an application of the law of large numbers, though it is a bit involved to prove this.)
The goal of this problem set is to demonstrate these two properties through numerical simulations. Here is what we are going to do:
- For a certain sample size \(N\), draw \(N\) random numbers from the normal distribution with known mean and standard deviation.
- Calculate the sample mean and the sample variance for the “data” you draw.
- Repeat this for many different sample sizes.
- Examine to see whether the sample mean and standard deviation are getting closer to the true value, which you set when you draw the random numbers, as the sample size gets larger.
6.2.1 How to implement
I explain how to implement this in R
step by step below.
- Prepare a function like this
- There are two inputs: (1) a vector that contains the data \(\{x_{i}\}_{i=1}^{N}\) and (2) the indicator of whether you calculate the mean or the standard deviation.
fun_something = function(firstinput, secondinput){ # Two inputs: firstinput, secondinput # One output: output # Do something. return(output) }
- Use if/else sentence. Example:
# "secondinput" is the name of the input variable in your function if ( secondinput == "mean"){ # calculate mean of the data (firstinput) } else if ( secondinput == "sd"){ # Calculate standard deviation of the data (firstinput) }
- Use
return
function to define the output of the function.
- Construct a vector that contains the sample size you want to use in your simulation. For example:
samplesize_vec = seq(from = 100, to = 100000, by = 100)
Here, let’s try 100 different sample sizes that ranges from 100 to 100000.
- Prepare two vectors that contain the result in the
forloop
below. Since we are trying 100 different sample sizes, let’s create a vector with the length of 100.
# Hint:
# numeric(k) returns a zero vector with the length of k
# length( vector) returns the length of `vector`
# result_mean = ....
# result_sd = ....
- To create the random draw from the normal distribution, use below
# You can choose the mean and the standard deviation as you like.
rnorm(n = 100, mean = 2, sd = 5)
- Use
forloop
to calculate both mean and the standard deviation for each sample size. For example:
for (i in 1:length(samplesize_vec)){
# Draw the random number
# Calculate the mean using the function you construct.
# Calculate the standard deviation using the function you construct.
}
- Plot the result with
ggplot2
.- Install the package if you have not done it yet.
- Load
ggplot2
bylibrary(ggplot2)
- Use
qplot
command to make a figure
# Create plot and save it as the variable `plot1` plot1 <- qplot(x = samplesize_vec, y = yourresult, geom = "line") # print "plot1" print(plot1) # save the plot as PNG file ggsave(file = "filename.png", plot = plot1)
6.2.2 What to submit
Your answer should include
- The true value of mean and variance you choose in your simulation.
- The plot that describes the relationship between the sample mean (variance) and the sample size.
- Explain what the plots from your simulation indicate.