【R<-练习】生物统计学练习题1

作者: 莫讠 | 来源:发表于2019-05-13 20:19 被阅读0次

【R<-练习】生物统计学练习题1
【R<-练习】生物统计学练习题3
2019-04-19
【R<-练习】生物统计练习题2
2018-12-01
2019-04-13【R语言练习题-初级】作业(ing）
【R->练习】生物统计学练习4
python练习题
《Advanced R 》R练习题
2019-03-22 属性权限和定时任务课后练习题

第一小题：
要求：在R环境中完成下述操作，并写出具体R代码。

查看R当前工作目录，设置R工作目录为数据所在目录并查看该目录下的文件；
将数据homework3_data. csv导入到R中；
查看行数、列数、前3行数据以及数据类型；
对数据中的测量值进行描述统计并绘制箱线图;
下载并安装R包pwr，查看帮助文档了解用法。

解答：

#设置当前的工作目录
> setwd("/home/kevin/Desktop/biostatistics/2018 homework/homework3")
#查看该目录的文件
> dir()
[1] "2018hw3.docx"   
[2] "2018hw3.pdf"
[3] "homework3_data.csv"

# 通过read.table()将数据导入R中
> mydataframe <- read.table("homework3_data.csv",header = T, sep = ",")

（P.S. header = T 表示默认文件的第一行为标题， sep = "," 表示CSV文件中数据是通过逗号分割，可以通过？read.table()查看帮助文档）

#查看homework3_data.csv文件内容
> mydataframe
   Individual_ID Birthweight
1              1         128
2              2         103
3              3         120
4              4         125
5              5         110
6              6         140
7              7         131
8              8         124
9              9         146
10            10         121
11            11         137
12            12         145
13            13         111
14            14         133
15            15         117
16            16         114
17            17         136
18            18         126
19            19         113
20            20         120

# 查看前三行的内容
> head(mydataframe,n = 3)
  Individual_ID Birthweight
1             1         128
2             2         103
3             3         120
# 查看数据的维度
> dim(mydataframe)
[1] 20  2
# 查看数据的类别
> class(mydataframe)
[1] "data.frame"

# 对数据中的测量值进行描述
> summary(mydataframe$Birthweight)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  103.0   116.2   124.5   125.0   133.8   146.0

Boxplot图.

> install.packages("pwr")
Installing package into ‘/home/kevin/R/x86_64-pc-linux-gnu-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/pwr_1.2-2.tar.gz'
Content type 'application/x-gzip' length 93810 bytes (91 KB)
==================================================
downloaded 91 KB

* installing *source* package ‘pwr’ ...
** package ‘pwr’ successfully unpacked and MD5 sums checked
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (pwr)

The downloaded source packages are in
    ‘/tmp/RtmpFIzqTj/downloaded_packages’
# 加载pwr包
> library(pwr)
# 查看pwr包的帮助文档
> ?pwr()

第二小题：
Suppose we draw a sample of size 20 of birthweights from a hospital, the mean of national wide birthweights is 122.

Please list the <u>formulas</u> to calculate this and also the R code for each questions.

If the standard deviation of the national wide birthweights is 12, what is the probability that the mean birthweight of the sample falls between 100.0 and 126.0?

Now, give the details of the 20 samples which can be found in the homework3_data.csv.

What is the 95% confidence interval of the sample mean?
Can we say the underlying mean birthweight from this hospital is higher than the national average?
Test the hypothesis that the mean birthweight of sample size 20 is different from the national average (Significance level 0.05).
Compute the power of the test performed in (4) with significance level 0.05.
To see the significance difference between the sample mean and the national mean and ensure the type II error to be β=0.05, what is the appropriate sample size with significance level is 0.01?

解答：

公式

> n <- 20
> u <- 122
> sd <- 12
# 按照公式进行计算
> p_lower <- pnorm((100-u)/(sd/sqrt(n)))
> p_upper - p_lower
[1] 0.9319814
> p_upper <- pnorm((126-u)/(sd/sqrt(n)))
> p_lower <- pnorm((100-u)/(sd/sqrt(n)))
> p_upper - p_lower
[1] 0.9319814 

# 所以样本值落在100到126之间的概率为 0.9319814

公式

# 读取数据并按照公式输入
> m <- mean(mydataframe$Birthweight)
> left<-m-qt(0.975,n-1)*sd/sqrt(n)
> right<-m+qt(0.975,n-1)*sd/sqrt(n)
> left; right
[1] 119.3838
[1] 130.6162

# So the 95% confidence interval of the sample mean is [119.3838, 130.6162]

公式

# 使用单侧 t 检验
> t.test(mydataframe$Birthweight,alternative = "greater", mu = 122)

    One Sample t-test

data:  mydataframe$Birthweight
t = 1.1168, df = 19, p-value = 0.139
alternative hypothesis: true mean is greater than 122
95 percent confidence interval:
 120.3552      Inf
sample estimates:
mean of x 
      125 

#P > 0.05
#So the hypothesis that the underlying mean birthweight from this hospital is higher thanthe national average is FALSE

Screenshot from 2019-05-13 21-28-04.png

> power<-pt(qt(0.025, n-1)+abs(m-122)/sd*sqrt(n),n-1)
> power
[1] 0.170908

#The power of the test performed in (4) with significance level 0.05 is 0.170908

Screenshot from 2019-05-13 21-55-44.png

#根据公式进行计算
> n<-(qnorm(1-0.05)+qnorm(1-0.01/2))^2*sd^2/(m-122)^2
> n
[1] 285.0266


#使用pwr包进行计算
> pwr.t.test(d=abs(m-122)/sd,sig.level = 0.01, power = 0.95, type = "one.sample", alternative = "two.sided")

     One-sample t test power calculation 

              n = 288.3538
              d = 0.25
      sig.level = 0.01
          power = 0.95
    alternative = two.sided

#so n=288