Assignment 4 T Tests Tables

Reading Assignment
An Introduction to Statistical Methods and Data Analysis, (See Course Schedule).
Two Methods for Making a Statistical Decision
There are two approaches for making a statistical decision regarding a null hypothesis. One is the rejection region approach and the second is the p-value (or probability value) approach. Of the two methods, the latter is more commonly used and reported in published literature. However, understanding the rejection region approach can go a long way toward understanding the p-value method. Regardless of the method applied, the conclusions from the two approaches are exactly the same. In explaining these processes in this section of the lesson, we will build upon the prior steps already discussed (i.e., setting up hypotheses, stating the level of significance α, and calculating the appropriate test statistic).
Test statistic: The sample statistic one uses to either reject \(H_0\) (and conclude \(H_a\)) or not to reject \(H_0\).
Critical values: The values of the test statistic that separate the rejection and non-rejection regions.
Rejection region: The set of values for the test statistic that leads to rejection of \(H_0\).
Non-rejection region: The set of values not in the rejection region that leads to non-rejection of \(H_0\).
P-value: The p-value (or probability value) is the probability that the test statistic equals the observed value or a more extreme value under the assumption that the null hypothesis is true.
As mentioned previously in this lesson, the logic of hypothesis testing is to reject the null hypothesis if the sample data are not consistent with the null hypothesis. Thus, one rejects the null hypothesis if the observed test statistic is more extreme in the direction of the alternative hypothesis than one can tolerate. The critical values are the boundary values obtained corresponding to the preset α level.
Steps in Conducting a Hypothesis Test
Although we listed these steps at the beginning of the lesson, we repeat them here for convenience because the rest of this section builds on them.
Step 1. Check the conditions necessary to run the selected test and select the hypotheses for that test:
- If a Z-test for one proportion: \(np_0 \geq 5\) and \(n(1 - p_0) \geq 5\)
- If a t-test for one mean: either the data come from an approximately normal distribution or the sample size is at least 30. If neither holds, the data should at least not be heavily skewed and should have no outliers.
If One Proportion Z-test:
Two-tailed | Right-tailed | Left-tailed
\(H_0 : p = p_0\) | \(H_0 : p = p_0\) | \(H_0 : p = p_0\)
\(H_a : p \ne p_0\) | \(H_a : p > p_0\) | \(H_a : p < p_0\)
If One Mean t-test:
Two-tailed | Right-tailed | Left-tailed
\(H_0 : \mu = \mu_0\) | \(H_0 : \mu = \mu_0\) | \(H_0 : \mu = \mu_0\)
\(H_a : \mu \ne \mu_0\) | \(H_a : \mu > \mu_0\) | \(H_a : \mu < \mu_0\)
Step 2. Decide on the significance level, \(\alpha\).
Step 3. Compute the value of the test statistic:
If One Proportion Z-test: \(Z^{*}=\frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}\)
If One Mean t-test: \(t^{*} = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}\)
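The two test statistics in Step 3 can be sketched in a few lines of Python. This is only an illustration: the function names and the sample inputs at the bottom are made up for the example and do not come from the lesson.

```python
import math


def z_statistic_one_proportion(successes, n, p0):
    """Z* = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def t_statistic_one_mean(sample_mean, mu0, sample_sd, n):
    """t* = (x_bar - mu0) / (S / sqrt(n)); compare against df = n - 1."""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))


# Hypothetical inputs, chosen only to exercise the two functions.
z_star = z_statistic_one_proportion(successes=60, n=100, p0=0.5)
t_star = t_statistic_one_mean(sample_mean=10.5, mu0=10.0, sample_sd=2.0, n=16)
print(z_star, t_star)
```

Note that the Z statistic uses the hypothesized \(p_0\), not \(\hat{p}\), in its standard error, while the t statistic uses the sample standard deviation \(S\).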
Rejection Region Approach to Hypothesis Testing
Step 4. Find the appropriate critical values for the test using the Z-table for a test of one proportion, or the t-table for a test of one mean. REMEMBER: for the one-mean test the degrees of freedom are the sample size minus one (i.e., n - 1). Write down clearly the rejection region for the problem.
Alternative | One Proportion Z-test | One Mean t-test
Two-Tailed | Reject \(H_0\) if \(\lvert Z^* \rvert \geq Z_{\alpha/2}\) | Reject \(H_0\) if \(\lvert t^* \rvert \geq t_{\alpha/2}\)
Left-Tailed | Reject \(H_0\) if \(Z^* \leq -Z_{\alpha}\) | Reject \(H_0\) if \(t^* \leq -t_{\alpha}\)
Right-Tailed | Reject \(H_0\) if \(Z^* \geq Z_{\alpha}\) | Reject \(H_0\) if \(t^* \geq t_{\alpha}\)
Step 5. Check to see if the value of the test statistic falls in the rejection region. If it does, then reject \(H_0\) (and conclude \(H_a\)). If it does not fall in the rejection region, do not reject \(H_0\).
Step 6. State the conclusion in words.
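The decision in Steps 4 and 5 can be written as a small helper. This is a minimal sketch, assuming the critical value is passed in as the positive table value (\(Z_{\alpha}\), \(Z_{\alpha/2}\), \(t_{\alpha}\), or \(t_{\alpha/2}\)); the function name `reject_null` and the tail labels are hypothetical.

```python
def reject_null(test_stat, critical_value, tail):
    """Return True when the test statistic falls in the rejection region.

    critical_value is the positive table value; tail is one of
    "two", "left", or "right" (labels chosen for this sketch).
    """
    if tail == "two":
        return abs(test_stat) >= critical_value
    if tail == "left":
        return test_stat <= -critical_value
    if tail == "right":
        return test_stat >= critical_value
    raise ValueError("tail must be 'two', 'left', or 'right'")


# A right-tailed Z-test at alpha = 0.05 uses Z_0.05 = 1.645.
print(reject_null(2.504, 1.645, "right"))
# A two-tailed t-test at alpha = 0.01 with df = 60 uses t_0.005 = 2.660.
print(reject_null(-1.3, 2.660, "two"))
```

For a two-tailed test the comparison uses the absolute value of the statistic, and for a left-tailed test the critical value enters with a negative sign.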
P-value Approach to Hypothesis Testing
Steps 1 - 3. The first few steps (Step 1 - Step 3) are exactly the same as in the rejection region approach.
Step 4. In Step 4, we need to compute the appropriate p-value based on our alternative hypothesis:
If \(H_a\) is right-tailed, then the p-value is the probability the sample data produces a value equal to or greater than the observed test statistic.
If \(H_a\) is left-tailed, then the p-value is the probability the sample data produces a value equal to or less than the observed test statistic.
If \(H_a\) is two-tailed, then the p-value is two times the probability the sample data produces a value equal to or greater than the absolute value of the observed test statistic.
Test | Right-tailed | Left-tailed | Two-tailed
Z-test | \(P(Z > Z^*)\) | \(P(Z < Z^*)\) | \(2 \times P(Z > \lvert Z^* \rvert)\)
t-test (df = n - 1) | \(P(t > t^*)\) | \(P(t < t^*)\) | \(2 \times P(t > \lvert t^* \rvert)\)
Step 5. Check to see if the p-value is less than the stated alpha value. If it is, then reject \(H_0\) (and conclude \(H_a\)). If it is not less than alpha, do not reject \(H_0\).
Step 6. Conclusion in words.
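For the Z-test, the three p-value cases in Step 4 can be sketched with the standard library's `statistics.NormalDist` (Python 3.8 or later). The function name `z_p_value` and the tail labels are assumptions made for this example.

```python
from statistics import NormalDist


def z_p_value(z_star, tail):
    """P-value of an observed Z statistic for a given alternative."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        return 1 - std_normal.cdf(z_star)
    if tail == "left":
        return std_normal.cdf(z_star)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z_star)))
    raise ValueError("tail must be 'right', 'left', or 'two'")


alpha = 0.05
p = z_p_value(1.96, "two")
print(p, p < alpha)
```

The t-test cases follow the same pattern, with the t distribution at n - 1 degrees of freedom in place of the standard normal.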
Example: Penn State Students from Pennsylvania
Continuing with our one-proportion example at the beginning of this lesson, say we take a random sample of 500 Penn State students and find that 278 are from Pennsylvania. Can we conclude that the proportion is larger than 0.5 at a 5% level of significance?
A: Using the Rejection Region Approach
Step 1. Can we use the one-proportion z-test?
The answer is yes since the hypothesized value \(p_0\) is 0.5 and we can check that:
\(np_0 = 500 \times 0.5 = 250 \geq 5\),
\(n(1 - p_0) = 500 \times (1 - 0.5) = 250 \geq 5\).
Set up the hypotheses. Since the research hypothesis is to check whether the proportion is greater than 0.5, we set it up as a one-tailed (right-tailed) test:
\(H_0: p = 0.5\)
\(H_a: p > 0.5\)
Step 2. Decide on the significance level, \(\alpha\).
According to the question, \(\alpha\) = 0.05.
Step 3. Compute the value of the test statistic, using \(\hat{p} = 278/500 = 0.556\):
\[\begin{align} Z^{*} &= \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 (1 - p_0)}{n}}}\\
&= \frac{0.556 - 0.5}{\sqrt{\frac{0.5 \cdot (1 - 0.5)}{500}}}\\
&= 2.504
\end{align}\]
Step 4. Find the appropriate critical values for the test using the z-table. Write down clearly the rejection region for the problem. We can use the standard normal table or the last row of our t-table to find the value of \(Z_{0.05}\), since that last row, for df = \(\infty\) (infinite), gives the z-values.
From the table, \(Z_{0.05}\) is found to be 1.645, so the critical value is 1.645. The rejection region for the right-tailed test is given by:
\(Z^{*} > 1.645\)
Step 5. Check whether the value of the test statistic falls in the rejection region. If it does, then reject \(H_0\) (and conclude \(H_a\)). If it does not fall in the rejection region, do not reject \(H_0\).
The observed Z-value is 2.504; this is our test statistic. Since \(Z^{*}\) falls within the rejection region, we reject \(H_0\).
Step 6. State the conclusion in words.
With a test statistic of 2.504 and critical value of 1.645 at a 5% level of significance, we have enough statistical evidence to reject the null hypothesis. We conclude that a majority of the students are from Pennsylvania.
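As a check on the arithmetic, the rejection region calculation for this example can be reproduced in Python. This is a sketch using only the numbers given in the example; `NormalDist().inv_cdf` stands in for the table lookup of \(Z_{0.05}\), and the variable names are chosen for illustration.

```python
import math
from statistics import NormalDist

n, successes, p0, alpha = 500, 278, 0.5, 0.05

p_hat = successes / n                                  # 278 / 500 = 0.556
z_star = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha)               # right-tail critical value
reject = z_star > z_crit

print(round(z_star, 3), round(z_crit, 3), reject)
```

The computed statistic and critical value agree with the 2.504 and 1.645 used above, so the test statistic lies in the rejection region.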
B: Using the P-value Approach
Steps 1 - 3. The first few steps (Step 1 - Step 3) are exactly the same as in the rejection region approach.
Step 4. In Step 4, we need to compute the appropriate p-value based on our alternative hypothesis. With our alternative hypothesis being right-tailed:
\[\begin{align}
p\text{-value} &= P(Z > Z^{*})\\
&= P \left( Z > \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 (1 - p_0)}{n}}} \right) \\
&= P \left( Z > \frac{0.556 - 0.5}{\sqrt{\frac{0.5(1 - 0.5)}{500}}} \right) \\
&= P(Z > 2.50)\\
&= 0.0062
\end{align}\]
Step 5. Since p-value = 0.0062 < 0.05 (the α value), we reject the null hypothesis.
Step 6. Conclusion in words:
With a test statistic of 2.504 and p-value of 0.0062, we reject the null hypothesis at a 5% level of significance. We conclude that a majority of the students are from Pennsylvania.
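The same p-value can be approximated in software rather than read from a table. The sketch below uses `statistics.NormalDist` and the unrounded test statistic, so its result differs slightly from the table value of 0.0062, which is based on Z rounded to 2.50.

```python
import math
from statistics import NormalDist

n, successes, p0, alpha = 500, 278, 0.5, 0.05

z_star = (successes / n - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z_star)   # right-tailed: P(Z > Z*)
reject = p_value < alpha

print(round(p_value, 4), reject)
```

Either way the p-value is well below 0.05, so the decision matches the rejection region approach.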
Example: Length of Lumber
Continuing with our one-mean lumber example from the beginning of this lesson, the mean length of the lumber is supposed to be 8.5 feet. A builder wants to check whether the shipment of lumber she receives has a mean length different from 8.5 feet. Suppose the builder observes that the sample mean of 61 pieces of lumber is 8.3 feet with a sample standard deviation of 1.2 feet. What will she conclude? Conduct this test at a 1% level of significance.
A: Using the Rejection Region Approach
Step 1. Can we use the one-mean t-test?
The answer is yes since the sample size of 61 is sufficiently large (greater than 30).
Set up the hypotheses (since the research hypothesis is to check whether the mean is different from 8.5, we set it up as a two-tailed test):
\(H_0: \mu = 8.5\)
\(H_a: \mu \ne 8.5\)
Step 2. Decide on the significance level, \(\alpha\).
According to the question, \(\alpha\) = 0.01.
Step 3. Compute the value of the test statistic:
\[t^{*} = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}=\frac{8.3 - 8.5}{1.2/\sqrt{61}}=-1.3\]
Step 4. Find the appropriate critical values for the test using the t-table. Write down clearly the rejection region for the problem.
From the table, with 61 - 1 = 60 degrees of freedom, the critical value \(t_{\alpha/2} = t_{0.005}\) is found to be 2.660. The rejection region for the two-tailed test is given by:
\(t^{*} < -2.660\) or \(t^{*} > 2.660\)
Step 5. Check whether the value of the test statistic falls in the rejection region. If it does, then reject \(H_0\) (and conclude \(H_a\)). If it does not fall in the rejection region, do not reject \(H_0\).
The observed t-value is -1.3; this is our test statistic. Since \(t^{*}\) does not fall within the rejection region, we fail to reject \(H_0\).
Step 6. State the conclusion in words.
With a test statistic of -1.3 and critical values of ±2.660 at a 1% level of significance, we do not have enough statistical evidence to reject the null hypothesis. We conclude that there is not enough statistical evidence to indicate that the mean length of lumber differs from 8.5 feet.
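The lumber calculation can be checked the same way. Python's standard library has no t distribution, so this sketch takes the critical value 2.660 (df = 60, \(t_{0.005}\)) from the t-table as a given constant rather than computing it; the variable names are illustrative.

```python
import math

x_bar, mu0, s, n = 8.3, 8.5, 1.2, 61

t_star = (x_bar - mu0) / (s / math.sqrt(n))   # test statistic, df = n - 1 = 60
t_crit = 2.660                                # t_{0.005} at df = 60, from the t-table
reject = abs(t_star) >= t_crit                # two-tailed rejection rule

print(round(t_star, 2), reject)
```

The statistic is negative because the sample mean is below the hypothesized mean; the two-tailed rule compares its absolute value with the critical value.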
B: Using the P-value Approach
Steps 1 - 3. The first few steps (Step 1 - Step 3) are exactly the same as in the rejection region approach.
Step 4. In Step 4, we need to compute the appropriate p-value based on our alternative hypothesis:
\[\begin{align} p\text{-value} &= 2\times P(t > \left| t^{*} \right|) \\
&= 2\times P \left(t > \left| \frac{\bar{x} - \mu_0}{S/\sqrt{n}} \right| \right) \\
&= 2\times P \left(t > \left| \frac{8.3 - 8.5}{1.2/\sqrt{61}} \right| \right) \\
&= 2\times P(t > \left| -1.3 \right|) \\
&= 2\times P(t > 1.3)
\end{align}\]
Step 5. From the t-table, going across the row for 60 degrees of freedom, we do not find a value equal to 1.3. Without software to find a more exact probability, the best we can do from the t-table is find a range. We do see that the value falls between 1.296 and 1.671. These two t-values correspond to right-tail probabilities of 0.1 and 0.05, respectively. Since 1.3 is between these two t-values, it stands to reason that the probability to the right of 1.3 falls between 0.05 and 0.1. Therefore, the p-value is between \(2 \times 0.05\) and \(2 \times 0.1\), or from 0.1 to 0.2. With this range of possible p-values exceeding our 1% level of significance for the test, we fail to reject the null hypothesis.
Step 6. Conclusion in words:
With a test statistic of -1.3 and p-value between 0.1 and 0.2, we fail to reject the null hypothesis at a 1% level of significance since the p-value exceeds our significance level. We conclude that there is not enough statistical evidence to indicate that the mean length of lumber differs from 8.5 feet.
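With software, the p-value need not be bracketed from the table. The sketch below is one way to approximate it with only the standard library: it integrates the Student t density numerically (Simpson's rule) instead of calling a statistics package. The helper names `t_pdf` and `t_upper_tail` are hypothetical, and a dedicated statistics library would normally be used in practice.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 with Simpson's rule on [0, t]."""
    if t < 0:
        raise ValueError("t must be non-negative")
    if t == 0:
        return 0.5
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # half the mass lies above 0 by symmetry


t_star = (8.3 - 8.5) / (1.2 / math.sqrt(61))
p_value = 2 * t_upper_tail(abs(t_star), df=60)   # two-tailed p-value
print(round(p_value, 3), p_value < 0.01)
```

The result lands inside the 0.1 to 0.2 range found from the t-table, well above the 1% significance level.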
Comparing the P-Value Approach to the Rejection Region Approach
Both approaches lead to the same conclusion, and either one will work. However, using the p-value approach has the following advantages:
- With the rejection region approach, you need to look up a new critical value in the table every time someone gives you a different α value. A p-value, once computed, can be compared directly with any α.
- Beyond being compared with α to reject or not reject \(H_0\), the p-value also gives us some idea of the strength of the evidence against \(H_0\).

Reading Assignment
An Introduction to Statistical Methods and Data Analysis, (see Course Schedule).
Inferences About the Difference Between Two Population Means for Paired Data
Paired samples: The sample selected from the first population is related to the corresponding sample from the second population.
It is important to distinguish between independent samples and paired samples. Some examples are given as follows.
Compare the time that males and females spend watching TV.
A. We randomly select 20 males and 20 females and compare the average time they spend watching TV. Is this an independent sample or paired sample?
B. We randomly select 20 couples and compare the time the husbands and wives spend watching TV. Is this an independent sample or paired sample?
The paired t-test will be used when handling hypothesis testing for paired data.
The Paired t-Procedure
Assumptions:
- Paired samples
- The differences of the pairs follow a normal distribution or the number of pairs is large (note here that if the number of pairs is < 30, we need to check whether the differences are normal, but we do not need to check for the normality of each population)
Hypotheses:
\(H_0: \mu_d = 0\)
\(H_a: \mu_d \ne 0\)
OR
\(H_0: \mu_d = 0\)
\(H_a: \mu_d < 0\)
OR
\(H_0: \mu_d = 0\)
\(H_a: \mu_d > 0\)
t-statistic:
Let d = differences between the pairs of data, then \(\bar{d}\) = mean of these differences.
The test statistic is: \(t^{*}=\frac{\bar{d} - 0}{s_d/\sqrt{n}}\)
degrees of freedom = n - 1
where n denotes the number of pairs or the number of differences.
Paired t-interval:
\[\bar{d}\pm t_{\alpha/2} \cdot \frac{s_d}{\sqrt{n}}\]
Note: \(s_{\bar{d}} = \frac{s_d}{\sqrt{n}}\), where \(s_d\) is the standard deviation of the sample differences and \(s_{\bar{d}}\) is the standard error of the mean difference.
Example: Drinking Water
Trace metals in drinking water affect the flavor and an unusually high concentration can pose a health hazard. Ten pairs of data were taken measuring zinc concentration in bottom water and surface water (zinc_conc.txt).
Does the data suggest that the true average concentration in the bottom water exceeds that of surface water?
Location | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Zinc concentration in bottom water | .430 | .266 | .567 | .531 | .707 | .716 | .651 | .589 | .469 | .723
Zinc concentration in surface water | .415 | .238 | .390 | .410 | .605 | .609 | .632 | .523 | .411 | .612
To perform a paired ttest for the previous trace metal example:
Assumptions:
1. Is this a paired sample? Yes.
2. Is this a large sample? No.
3. Since the sample size is not large enough (less than 30), we need to check whether the differences follow a normal distribution.
In Minitab, we can use Calc > Calculator to obtain diff = bottom - surface and then perform a probability plot on the differences.
From the probability plot, we conclude that the differences may come from a normal distribution.
Step 1. Set up the hypotheses:
\(H_0: \mu_d = 0\)
\(H_a: \mu_d > 0\)
where 'd' is defined as the difference of bottom - surface.
Step 2. Write down the significance level \(\alpha = 0.05\).
Step 3. What is the critical value and the rejection region?
\(\alpha = 0.05\), df = 9
\(t_{0.05} = 1.833\)
rejection region: \( t > 1.833\)
Step 4. Compute the value of the test statistic:
\[t^{*}=\frac{\bar{d}}{\frac{s_d }{\sqrt{n}}}=\frac{0.0804}{\frac{0.0523}{\sqrt{10}}}=4.86\]
Step 5. Check whether the test statistic falls in the rejection region and determine whether to reject \(H_0\).
\(t^* = 4.86 > 1.833\)
reject \(H_0\)
Step 6. State the conclusion in words.
At \(\alpha = 0.05\), we conclude that, on average, the bottom zinc concentration is higher than the surface zinc concentration.
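The paired t-test above can be reproduced from the ten pairs of zinc measurements. This is a sketch using Python's standard library; the critical value 1.833 (df = 9, \(t_{0.05}\)) is taken from the t-table as in Step 3, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716, 0.651, 0.589, 0.469, 0.723]
surface = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609, 0.632, 0.523, 0.411, 0.612]

# d = bottom - surface for each location
diffs = [b - s for b, s in zip(bottom, surface)]
n = len(diffs)

d_bar = mean(diffs)                     # mean of the differences
s_d = stdev(diffs)                      # sample standard deviation of the differences
t_star = d_bar / (s_d / math.sqrt(n))   # test statistic, df = n - 1 = 9

t_crit = 1.833                          # t_{0.05} at df = 9, from the t-table
reject = t_star > t_crit                # right-tailed test

print(round(d_bar, 4), round(s_d, 4), round(t_star, 2), reject)
```

The summary values agree with the 0.0804, 0.0523, and 4.86 used in Step 4.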
Using Minitab to Perform a Paired t-Test
You can use a paired t-test in Minitab to perform the test. Alternatively, you can perform a 1-sample t-test on difference = bottom - surface.
1. Stat > Basic Statistics > Paired t
2. Click 'Options' to specify the confidence level for the interval and the alternative hypothesis you want to test. The default null hypothesis is 0.
The Minitab output for paired T for bottom - surface is as follows:
Paired T for bottom - surface
 | N | Mean | StDev | SE Mean
bottom | 10 | 0.5649 | 0.1468 | 0.0464
surface | 10 | 0.4845 | 0.1312 | 0.0415
Difference | 10 | 0.0804 | 0.0523 | 0.0165
95% lower bound for mean difference: 0.0505
T-Test of mean difference = 0 (vs > 0): T-Value = 4.86 P-Value = 0.000
Note: In Minitab, if you choose a lower-tailed or an upper-tailed hypothesis test, an upper or lower confidence bound will be constructed, respectively, rather than a confidence interval.
Using the p-value to draw a conclusion about our example:
p-value = 0.000 < 0.05
Reject \(H_0\) and conclude that bottom zinc concentration is higher than surface zinc concentration.
Note: For the zinc concentration problem, if you do not recognize the paired structure, but mistakenly use the 2-sample t-test treating them as independent samples, you will not be able to reject the null hypothesis. This demonstrates the importance of distinguishing the two types of samples. Also, it is wise to design an experiment efficiently whenever possible.
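To see why ignoring the pairing loses the result, the same data can be run through a 2-sample t statistic. The sketch below assumes the pooled-variance form with df = 10 + 10 - 2 = 18 and takes the critical value \(t_{0.05} = 1.734\) for df = 18 from the t-table; the lesson does not say which 2-sample variant is meant, so that choice is an assumption of this example.

```python
import math
from statistics import mean, variance

bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716, 0.651, 0.589, 0.469, 0.723]
surface = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609, 0.632, 0.523, 0.411, 0.612]

n1, n2 = len(bottom), len(surface)

# Pooled variance, treating the two samples as independent (assumption).
pooled_var = ((n1 - 1) * variance(bottom) + (n2 - 1) * variance(surface)) / (n1 + n2 - 2)
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_two_sample = (mean(bottom) - mean(surface)) / std_err
t_crit = 1.734                          # t_{0.05} at df = 18, from the t-table
reject = t_two_sample > t_crit          # right-tailed test

print(round(t_two_sample, 2), reject)
```

The numerator is the same mean difference as in the paired test, but the standard error is several times larger because the variation between locations is no longer removed, so the statistic falls short of the critical value.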
What if the assumption of normality is not satisfied? In this case we would use a nonparametric 1-sample test on the difference.