第一小题:
要求:在R环境中完成下述操作,并写出具体R代码。
- 查看R当前工作目录,设置R工作目录为数据所在目录并查看该目录下的文件;
- 将数据homework3_data. csv导入到R中;
- 查看行数、列数、前3行数据以及数据类型;
- 对数据中的测量值进行描述统计并绘制箱线图;
- 下载并安装R包pwr,查看帮助文档了解用法。
解答:
#设置当前的工作目录
> setwd("/home/kevin/Desktop/biostatistics/2018 homework/homework3")
#查看该目录的文件
> dir()
[1] "2018hw3.docx"
[2] "2018hw3.pdf"
[3] "homework3_data.csv"
# 通过read.table()将数据导入R中
> mydataframe <- read.table("homework3_data.csv",header = T, sep = ",")
(P.S. header = T 表示默认文件的第一行为标题, sep = "," 表示CSV文件中数据是通过逗号分割,可以通过?read.table()查看帮助文档)
#查看homework3_data.csv文件内容
> mydataframe
Individual_ID Birthweight
1 1 128
2 2 103
3 3 120
4 4 125
5 5 110
6 6 140
7 7 131
8 8 124
9 9 146
10 10 121
11 11 137
12 12 145
13 13 111
14 14 133
15 15 117
16 16 114
17 17 136
18 18 126
19 19 113
20 20 120
# 查看前三行的内容
> head(mydataframe,n = 3)
Individual_ID Birthweight
1 1 128
2 2 103
3 3 120
# 查看数据的维度
> dim(mydataframe)
[1] 20 2
# 查看数据的类别
> class(mydataframe)
[1] "data.frame"
# 对数据中的测量值进行描述
> summary(mydataframe$Birthweight)
Min. 1st Qu. Median Mean 3rd Qu. Max.
103.0 116.2 124.5 125.0 133.8 146.0

> install.packages("pwr")
Installing package into ‘/home/kevin/R/x86_64-pc-linux-gnu-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/pwr_1.2-2.tar.gz'
Content type 'application/x-gzip' length 93810 bytes (91 KB)
==================================================
downloaded 91 KB
* installing *source* package ‘pwr’ ...
** package ‘pwr’ successfully unpacked and MD5 sums checked
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (pwr)
The downloaded source packages are in
‘/tmp/RtmpFIzqTj/downloaded_packages’
# 加载pwr包
> library(pwr)
# 查看pwr包的帮助文档
> ?pwr()
第二小题:
Suppose we draw a sample of size 20 of birthweights from a hospital, the mean of national wide birthweights is 122.
Please list the <u>formulas</u> to calculate this and also the R code for each questions.
- If the standard deviation of the national wide birthweights is 12, what is the probability that the mean birthweight of the sample falls between 100.0 and 126.0?
Now, give the details of the 20 samples which can be found in the homework3_data.csv.
-
What is the 95% confidence interval of the sample mean?
-
Can we say the underlying mean birthweight from this hospital is higher than the national average?
-
Test the hypothesis that the mean birthweight of sample size 20 is different from the national average (Significance level 0.05).
-
Compute the power of the test performed in (4) with significance level 0.05.
-
To see the significance difference between the sample mean and the national mean and ensure the type II error to be β=0.05, what is the appropriate sample size with significance level is 0.01?
解答:

> n <- 20
> u <- 122
> sd <- 12
# 按照公式进行计算
> p_lower <- pnorm((100-u)/(sd/sqrt(n)))
> p_upper - p_lower
[1] 0.9319814
> p_upper <- pnorm((126-u)/(sd/sqrt(n)))
> p_lower <- pnorm((100-u)/(sd/sqrt(n)))
> p_upper - p_lower
[1] 0.9319814
# 所以样本值落在100到126之间的概率为 0.9319814

# 读取数据并按照公式输入
> m <- mean(mydataframe$Birthweight)
> left<-m-qt(0.975,n-1)*sd/sqrt(n)
> right<-m+qt(0.975,n-1)*sd/sqrt(n)
> left; right
[1] 119.3838
[1] 130.6162
# So the 95% confidence interval of the sample mean is [119.3838, 130.6162]

# 使用单侧 t 检验
> t.test(mydataframe$Birthweight,alternative = "greater", mu = 122)
One Sample t-test
data: mydataframe$Birthweight
t = 1.1168, df = 19, p-value = 0.139
alternative hypothesis: true mean is greater than 122
95 percent confidence interval:
120.3552 Inf
sample estimates:
mean of x
125
#P > 0.05
#So the hypothesis that the underlying mean birthweight from this hospital is higher thanthe national average is FALSE

> power<-pt(qt(0.025, n-1)+abs(m-122)/sd*sqrt(n),n-1)
> power
[1] 0.170908
#The power of the test performed in (4) with significance level 0.05 is 0.170908

#根据公式进行计算
> n<-(qnorm(1-0.05)+qnorm(1-0.01/2))^2*sd^2/(m-122)^2
> n
[1] 285.0266
#使用pwr包进行计算
> pwr.t.test(d=abs(m-122)/sd,sig.level = 0.01, power = 0.95, type = "one.sample", alternative = "two.sided")
One-sample t test power calculation
n = 288.3538
d = 0.25
sig.level = 0.01
power = 0.95
alternative = two.sided
#so n=288
网友评论