STAT 503 – Statistical Methods for Biology
Homework 5 Key
30 Points. Due at 11:59 PM on Friday, October 28, 2022
Please use complete sentences unless the question is marked with an asterisk (*).
Round answers to 3 decimal places.
Please show the key steps of your calculations.
[I have included more than 3 decimal places in some answers.
Give full credit if students' answers are off due to rounding error.]
1. [Total: 9 points] One of the great discoveries of biology is that organisms have a class of
genes called “regulatory genes,” whose only job is to regulate the activity of other genes.
How many genes does the typical regulatory gene regulate? A study of interaction networks
in yeast (S. cerevisiae) produced the following data for 109 regulatory genes (Guelzim et
al. 2002).
Use R for this problem: first import data set #17, “Regulator_gene_data”, from
Brightspace (under the “Data for R” module), then answer the following questions:
Number of genes regulated Frequency
1 20
2 10
3 7
4 7
5 8
6 8
7 5
8 2
9 4
10 4
11 3
12 4
13 5
14 1
15 2
16 1
17 3
18 2
19 2
20 3
22 3
STAT 503 – Statistical Methods for Biologists Modified: 2012-10-30
25 1
26 1
28 1
29 1
37 1
a. What type of graph should be used to display these data? [1 point]
A histogram or cumulative frequency distribution.
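As an illustration of the histogram suggested above, here is a minimal base-R sketch. It types in the frequency table from the problem statement rather than reading the Brightspace file, and the variable names (`genes_regulated`, `frequency`, `observations`) are chosen here for illustration only.

```r
# Frequency table transcribed from the problem statement.
genes_regulated <- c(1:20, 22, 25, 26, 28, 29, 37)
frequency <- c(20, 10, 7, 7, 8, 8, 5, 2, 4, 4, 3, 4, 5, 1, 2, 1, 3, 2, 2, 3,
               3, 1, 1, 1, 1, 1)

# Expand the table into one value per regulatory gene (109 values in total).
observations <- rep(genes_regulated, times = frequency)

# Histogram with unit-width bins centered on the integers 1, 2, ..., 37.
hist(observations,
     breaks = seq(0.5, max(genes_regulated) + 0.5, by = 1),
     xlab = "Number of genes regulated",
     ylab = "Frequency",
     main = "Genes regulated per regulatory gene")
```

The unit-width breaks keep each integer count in its own bar, which makes the strong right skew of the data easy to see.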
b. What is the estimated mean number of genes regulated by a regulatory gene in the yeast
genome? (Hint: Use the mutate function from Tutorial 5.) [2 points: 1 point for the calculation,
i.e., students show which formula they used, and 1 point for R coding]
$E(X) = \sum_i x_i \, P(x_i) \approx 8.3$ genes.
Sample R code: [Students’ code might differ from mine, but correct code should give the same
result.]
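rm(list = ls())              # start from a clean workspace
library(dplyr)               # provides mutate()
genepath <- file.choose()    # choose the downloaded CSV file interactively
gene <- read.csv(genepath)
# p_x: relative frequency of each value (109 regulatory genes in total)
reg_gene <- mutate(gene, p_x = count / 109)
reg_gene
# E(X) = sum over all values of x * P(x)
Ex <- sum(reg_gene$Number.of.genes.regulated * reg_gene$p_x)
Ex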
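For graders who want to check the mean without the Brightspace file, the following sketch types in the frequency table from the problem statement and computes the same weighted sum. The vector names are illustrative and are not the column names of the course data set.

```r
# Frequency table transcribed from the problem statement.
genes_regulated <- c(1:20, 22, 25, 26, 28, 29, 37)
frequency <- c(20, 10, 7, 7, 8, 8, 5, 2, 4, 4, 3, 4, 5, 1, 2, 1, 3, 2, 2, 3,
               3, 1, 1, 1, 1, 1)

# Relative frequency of each value: P(x) = frequency / 109.
p_x <- frequency / sum(frequency)

# E(X) = sum of x * P(x).
mean_genes <- sum(genes_regulated * p_x)
mean_genes   # about 8.312, which the key reports as 8.3

# Cross-check with the built-in weighted mean.
weighted.mean(genes_regulated, w = frequency)
```

Both lines give the same value, since a weighted mean with the frequencies as weights is exactly the sum of x times its relative frequency.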
c. What is the standard error of the mean if the variance of the number of genes regulated by a
regulatory gene in the yeast genome is 56.508? [1 point]
$SE(\bar{x}) = \frac{\sqrt{Var(X)}}{\sqrt{n}}$ [0.5 pts] $= \frac{\sqrt{56.508}}{\sqrt{109}} \approx 0.720$ genes [0.25 for unit, 0.25 for correct value]
Or in R: sqrt(56.508)/sqrt(109)
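The given variance of 56.508 can be checked against the frequency table. The sketch below types in the table from the problem statement and suggests that 56.508 is the variance computed with the relative frequencies as probabilities (dividing by n = 109), not the n − 1 sample variance that R's `var()` returns. The variable names are illustrative.

```r
# Frequency table transcribed from the problem statement.
genes_regulated <- c(1:20, 22, 25, 26, 28, 29, 37)
frequency <- c(20, 10, 7, 7, 8, 8, 5, 2, 4, 4, 3, 4, 5, 1, 2, 1, 3, 2, 2, 3,
               3, 1, 1, 1, 1, 1)
n <- sum(frequency)          # 109 regulatory genes
p_x <- frequency / n         # relative frequencies

# Var(X) = sum of (x - E(X))^2 * P(x)
mean_genes <- sum(genes_regulated * p_x)
var_genes <- sum((genes_regulated - mean_genes)^2 * p_x)
var_genes                    # about 56.508

# Standard error of the mean: sqrt(Var(X)) / sqrt(n)
se_mean <- sqrt(var_genes) / sqrt(n)
se_mean                      # about 0.720

# For comparison: var() divides by n - 1 and gives a slightly larger value.
var(rep(genes_regulated, times = frequency))   # about 57.031
```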
d. Explain what this standard error measures. [1 point]
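The standard error is 0.720 genes. It measures the spread (standard deviation) of the sampling
distribution of the sample mean number of genes regulated, that is, how far a sample mean
typically falls from the population mean.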
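A small simulation can make the idea of a sampling distribution concrete. The sketch below is not part of the graded answer: it assumes the 109 observed values can stand in for the population, resamples them with replacement (a bootstrap), and looks at how much the resulting sample means vary. The seed, the number of replicates, and the variable names are arbitrary choices made here.

```r
# Frequency table transcribed from the problem statement.
genes_regulated <- c(1:20, 22, 25, 26, 28, 29, 37)
frequency <- c(20, 10, 7, 7, 8, 8, 5, 2, 4, 4, 3, 4, 5, 1, 2, 1, 3, 2, 2, 3,
               3, 1, 1, 1, 1, 1)
observations <- rep(genes_regulated, times = frequency)
n <- length(observations)    # 109

set.seed(503)                # arbitrary seed, for reproducibility only

# Draw 5000 resamples of size n and record the mean of each one.
sample_means <- replicate(5000,
                          mean(sample(observations, size = n, replace = TRUE)))

# The standard deviation of these means approximates the standard error.
sd(sample_means)
```

The standard deviation of the simulated means should land close to the 0.720 genes obtained from the formula in part (c).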
e. What assumption are you making in part (c)? [1 point]
That we have a simple random sample from the population of regulatory genes. We do not
require the underlying population distribution to be normal: the sample size is sufficiently
large (n = 109), so the central limit theorem (CLT) applies.
f. Find a 95% confidence interval for the population mean. [2 points]
$C = 0.95$, $\alpha = 1 - C = 0.05$, $\alpha/2 = 0.025$ [0.25pts]
Therefore, by Z table or R, $z_{\alpha/2} = 1.96$ [0.5pts]
$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ [0.5pts] $= 8.3 \pm 1.96 \times 0.72 = 8.3 \pm 1.411 \approx (6.889, 9.711)$ genes [0.5pts for two
bounds, 0.25 pts for unit]
Or in R:
> C <- 0.95
> sigma<-sqrt(56.508)
> sigma
[1] 7.51718
> xbar <- 8.3
> n <- 109
> z<- qnorm((1-C)/2,lower.tail = FALSE)
> z
[1] 1.959964
> c(xbar-z*sigma/sqrt(n),xbar+z*sigma/sqrt(n))
[1] 6.888796 9.711204
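The same steps can be wrapped in a small reusable function. This is only a sketch: `z_interval` is a name made up here (it is not a built-in R function), and it assumes the standard deviation is treated as known, as in the calculation above.

```r
# z-based confidence interval for a mean with known standard deviation.
# z_interval is a hypothetical helper, not part of base R.
z_interval <- function(xbar, sigma, n, conf = 0.95) {
  z <- qnorm((1 - conf) / 2, lower.tail = FALSE)  # critical value z_{alpha/2}
  margin <- z * sigma / sqrt(n)                   # margin of error
  c(lower = xbar - margin, upper = xbar + margin)
}

# Values from this problem: mean 8.3, variance 56.508, n = 109.
ci <- z_interval(xbar = 8.3, sigma = sqrt(56.508), n = 109)
round(ci, 3)   # lower 6.889, upper 9.711
```

Changing `conf` gives other confidence levels; for example, `conf = 0.99` widens the interval because the critical value grows.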