ADVANCED STATISTICS DEMYSTIFIED
Now anyone who has mastered basic statistics can easily take the next step up. In Advanced Statistics Demystified, experienced statistics instructor Larry J. Stephens provides an effective, anxiety-soothing, and totally painless way to learn advanced statistics, from inferential statistics, variance analysis, and parametric and nonparametric testing to simple linear regression, correlation, and multiple regression.
With Advanced Statistics Demystified, you master the subject one simple step at a time — at your own speed. This unique self-teaching guide offers exercises at the end of each chapter to pinpoint weaknesses and two question "final exams" to reinforce the entire book.
If you want to build or refresh your understanding of advanced statistics, here's a fast and entertaining self-teaching course that's specially designed to reduce anxiety.
Get ready to:
Draw inferences by comparing means, percents, and variances from two different samples
Compare more than two means with variance analysis
Make accurate interpretations with simple linear regression and correlation
Derive inferences, estimations, and predictions with multiple regression models
Apply nonparametric tests when the assumptions for the parametric tests are not satisfied
Take two "final exams" and grade them yourself!
Simple enough for beginners but challenging enough for advanced students, Advanced Statistics Demystified is your direct route to confident, sophisticated statistical analysis!
Mathematics, Nonfiction. Publication Details: Publisher: McGraw-Hill Education. Imprint: McGraw-Hill Professional. Author: Larry J. Stephens.
It tends to be more variable because of the small sample. The T distribution is very similar to the Z distribution. It is bell-shaped and centers at zero. The high intensity lamp is expensive but has a lifetime that the company claims to be greater than that of the standard lamp used in automobiles.
The standard has an average lifetime equal to hours. The company wishes to use a small sample to test H0: The company uses a small sample because the lamps are expensive and they are destroyed in the testing process.
Table I-2 gives the lifetimes of 15 randomly selected Table I-2 Lifetimes of high intensity auto lamps. The lifetimes are determined by using the lamps until they expire. The pull-down Stat Basic Statistics 1-sample t is used in Minitab to perform a 1-sample t test. Figure I-7 shows the dialog box. The options dialog box is completed as shown in Fig. Introduction 12 The output is: One-Sample T: It is concluded that the mean lifetime of the new lamps exceeds the standard lifetime of hours.
Almost all surveys meet this sample size requirement so that the standard normal approximation described above is valid in most real-world cases. The percentages of to month old children who consumed the following foods at least once a day were: We wish to test H0: Use the confidence interval method, the classical method, and the p-value method to perform the test. This dialog box gives the sample size, , and the number who answered Yes, In the Options portion of Fig.
The output created by Fig. Classical method: The rejection region is Z 1. The computed test statistic is 1. You reach the same conclusion no matter which of the three methods you use, as will always be the case.
I-4 Inferences About a Population Variance or Standard Deviation
Suppose a sample of size n is taken from a normal distribution and the sample variance, S^2, is computed. A chi-square distribution with 9 degrees of freedom is shown in Fig. I to give an idea of the shape of this distribution.
A company claims that its machines fill 1-liter containers of motor oil with a standard deviation of less than 2 milliliters. A sample of 10 containers filled by the machine contained the following amounts: The null hypothesis is H0: The Excel solution is shown in Fig.
This chapter has given a review of the basics of statistical inference concerning a single population parameter. Some of the fundamental ideas involved in statistical thinking have been discussed.
I-5 Using Excel and Minitab to Construct Normal, Student t, Chi-Square, and F Distribution Curves
There are four basic continuous distributions corresponding to test statistics that are used in this book for doing statistical inference.
They are the standard normal, the student t, the chi-square, and the F distributions. This means that we must be able to construct the probability density curve for the test statistic. In order to determine when some occurrence is unusual we need to be able to relate that occurrence to some value of the test statistic and compute the p-value related to it.
Suppose we are doing a large sample test concerning a mean. This represents an area under the standard normal distribution. EXAMPLE I Construct a normal curve, using Excel, describing the heights of adult males with a mean equal to 5 ft 11 inches or 71 inches and a standard deviation equal to 2. Construct the curve from 3 standard deviations below the mean to 3 standard deviations above the mean.
Theoretically the curve extends infinitely in both directions. Numbers from A click-and-drag is performed on both columns. Suppose we wished to know the percent of males who are taller than 6 ft 3 inches. This is 100% minus the percent that are shorter than 75 inches.
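The same normal-curve calculation can be sketched outside Excel. The following is a minimal illustration using Python's standard library, not the book's spreadsheet procedure. The function names are hypothetical, and because the standard deviation is not fully legible in the text above, ASSUMED_SD is only a placeholder value chosen for the demonstration.

```python
from statistics import NormalDist


def percent_taller(mean_in, sd_in, cutoff_in):
    """Percent of a normal population whose height exceeds cutoff_in."""
    heights = NormalDist(mu=mean_in, sigma=sd_in)
    # Area to the right of the cutoff = 1 minus the area to the left.
    return 100.0 * (1.0 - heights.cdf(cutoff_in))


def curve_points(mean_in, sd_in, steps=60):
    """(x, density) pairs from 3 SD below the mean to 3 SD above it."""
    heights = NormalDist(mu=mean_in, sigma=sd_in)
    low = mean_in - 3 * sd_in
    width = 6 * sd_in / steps
    return [(low + i * width, heights.pdf(low + i * width))
            for i in range(steps + 1)]


# ASSUMPTION: placeholder standard deviation, not a value taken from the text.
ASSUMED_SD = 2.5

print(round(percent_taller(71, ASSUMED_SD, 75), 2))  # percent taller than 75 in
print(curve_points(71, ASSUMED_SD)[:3])               # first few curve points
```

The list returned by curve_points plays the role of the two worksheet columns: the first entry of each pair is the x value and the second is the height of the curve at that x.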
The answer is found to be Construct a curve to illustrate what you are doing. The pull-down Calc Probability Distributions Normal gives a dialog box which is filled as shown in Fig.
This dialog box will calculate the y values for the standard normal curve and place them in column C2. The probability density is checked. This causes the heights to be computed for the curve. The worksheet is shown in Fig.
I, with the coordinates of The pull-down Graph Scatterplot produces the graph shown in Fig. This is found as follows. Plots of the student t, Chi-square, and F distributions are all made in a similar manner using Minitab. First of all construct the x, y coordinates on the curves using the pull-down Calc Probability Distributions normal, t, Chi-square, or F.
After the coordinates on the curve are calculated, the pull-down Graph Scatterplot is used to plot the curves. The graphs in Example I were constructed using this technique. Draw a t-curve illustrating the p-value, and find the p-value. I is the p-value. The pull-down Calc Probability Distributions t gives the dialog box shown in Fig.
We double it and get the p-value to be 0. Draw a chi-square curve illustrating the p-value. Calculate the p-value. I, the p-value being the shaded area. First use the chi-square distribution to find the area to the left of Note: It can be shown that the mean of the chi-square distribution is equal to its degrees of freedom.
I-6 Exercises for Introduction
1. Fifty adult-onset diabetics took part in a study. One of the questions asked was the number of hours of planned exercise per week that the diabetic participated in.
The data is shown in Table I After consulting the following Minitab output, answer the questions. Table I-4 Hours of planned exercise per week. A sample of size 10 is used to test a hypothesis about a mean. The Minitab t-test for a single mean is as follows: A Minitab analysis of the survey results is as follows: A sample of monthly returns on a portfolio of stocks, bonds, and other investments is given in Table I-6 After looking over the following Excel output, give your answer.
A large sample test procedure was performed to test H0: The null hypothesis H0: The statistical test is performed by one of three methods: It is assumed that the sample is taken from a normally distributed population. It is bell-shaped and centers at zero. Almost all surveys meet this sample size requirement so that the standard normal approximation described above will be valid.
CHAPTER 1 Inferences Based on Two Samples
Inferential Statistics
Inferential statistics, also called statistical inference, is the process of generalizing from statistics calculated on samples to parameters calculated on populations.
In this chapter, we will be concerned with using two sample means to make inferences about two population means, using two sample proportions to make inferences about two population proportions, and using two sample standard deviations to make inferences about two population standard deviations. In particular, we will calculate the following statistics on samples taken from two populations: The corresponding measures made on the populations are called parameters. Estimation and hypothesis testing are the two types of statistical inference that occur in real-world problems.
The methods for doing this will be illustrated in the next four sections. Independent Samples Purpose of the test: The purpose of the test is to compare the means of two populations when independent samples have been chosen.
The two independent samples are selected from normal populations having equal variances. We suspect that, on the average, males are taller than females. Our research hypothesis is stated as follows: The data are entered for the males in column C1 and for the females in column C2 of the Minitab worksheet Fig.
The variable names are entered at the top of the columns. The pull-down menu Stat Basic Statistics 2-sample t gives Fig. The two sample variances are pooled because the population variances are assumed equal. S2pooled Note: If we obtain an unusually large or small value for the test statistic, then we reject the null hypothesis and accept the research hypothesis.
The computer program calculates the area to the right of 3. This is the p-value for the test. The output shown below is produced by Minitab. Two-sample T for male vs female male female N 10 10 Mean The p-value is 0. Another interpretation of the p-value is in order at this point.
If the null hypothesis is true, that is that males and females are the same height on the average, there is a probability of 0. The pull down Tools Data Analysis produces the data analysis dialog box shown in Fig. The t-Test: Two-Sample Assuming Equal Variances test is chosen. The corresponding dialog box is filled in as shown in Fig. The output shown in Fig. The number on line 11 is the p-value, equal to 0. This small p-value indicates that the null hypothesis of equal means should be rejected and the conclusion reached that on the average men are taller than women.
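The pooled-variance computation behind the two-sample t test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Minitab or Excel routine: the function name pooled_t is hypothetical, and because the height data are not reproduced here, the sketch borrows the exam scores from the teaching-methods exercise later in this chapter. Only the test statistic and its degrees of freedom are computed; the p-value would still be read from a t table or from statistical software.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Equal-variance two-sample t statistic, degrees of freedom, pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    s2_pooled = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    std_error = sqrt(s2_pooled * (1 / n1 + 1 / n2))
    t_stat = (mean(sample1) - mean(sample2)) / std_error
    return t_stat, df, s2_pooled


# Scores from the teaching-methods exercise later in this chapter.
traditional = [82, 85, 82, 73, 72, 82, 73, 79, 71, 86, 90, 98, 86, 77, 81]
experimental = [78, 83, 96, 89, 82, 83, 68, 84, 83, 76, 83, 89, 90, 85, 77]

t_stat, df, s2 = pooled_t(experimental, traditional)
print(f"t = {t_stat:.3f} with {df} degrees of freedom, pooled variance {s2:.2f}")
```

An unusually large positive value of t would support a research hypothesis that the first sample comes from the population with the larger mean.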
It lasts from age 18 to 34 and has several indicators: In a study, the hypothesis H0: The age at first marriage for 50 males and 50 females is given in Table The samples were chosen independently of one another.
The data is entered into columns A and B. The computation of the test statistic is shown, followed by the computation of the p-value. Paired Samples Purpose of the test: The purpose of the test is to compare the means of two samples when dependent samples have been chosen.
In fact, suppose we believe the diet will result in more than a 10-pound weight loss over a six-month period. The 16 overweight individuals are weighed at the beginning of the experiment and again at the end of the six-month period. The one-sample t test is performed on Diff. The area to the right of 0. This does not lead us to reject the null hypothesis. One-Sample t: This large p-value shows no evidence for rejecting the null hypothesis.
The evidence does not cause us to believe that the diet results in more than a 10-pound weight loss. The pull-down Tools Data Analysis gives the dialog box shown in Fig. The Excel dialog box for t-Test: Paired Two Sample for Means is filled as shown in Fig. The output is shown in Fig. The one-tailed p-value is given on row 11 of Fig. Again, it is shown to be equal to 0. The p-value of this test would be reported as 0.
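The paired test reduces to a one-sample t test on the differences, which can be sketched as follows. This is an illustration only: the function name paired_t is hypothetical, and the before and after weights are invented for the demonstration because the diet data are not reproduced in the text.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after, claimed_loss=0.0):
    """One-sample t statistic on the differences (before - after)."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    std_error = stdev(diffs) / sqrt(n)
    # Compare the mean loss with the loss stated in the null hypothesis.
    t_stat = (mean(diffs) - claimed_loss) / std_error
    return t_stat, n - 1


# HYPOTHETICAL weights in pounds, for illustration only.
before = [200, 210, 190, 205]
after = [190, 198, 181, 194]

t_stat, df = paired_t(before, after, claimed_loss=10)
print(f"t = {t_stat:.4f} with {df} degrees of freedom")
```

A value of t close to zero, as here, gives no evidence that the mean loss exceeds the claimed 10 pounds.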
The purpose of the test is to compare two populations with respect to the percent in the populations having a particular characteristic.
The samples are large enough so that the normal approximation to the binomial distribution holds in both populations. That is, if P1 is the percent in population 1 having the characteristic and P2 is the percent in population 2 having the same characteristic, and n1 and n2 are the sample sizes from populations 1 and 2, respectively, then the following are true: EXAMPLE Suppose we are interested in determining whether the percent of female Internet users who have visited a chat room is different from the percent of male Internet users who have visited a chat room.
Two hundred female and two hundred male Internet users are asked if they have visited a chat room. It is found that 67 of the females and 45 of the males have done so. Our research hypothesis is stated as Ha: The test is conducted at a level of significance equal to 0. The test statistic z has a standard normal distribution.
The following pull-down sequence is given. Stat Basic Statistics 2 Proportions. The output is as follows.
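The test statistic for the chat-room example can also be checked by hand. The sketch below is a minimal illustration in Python, with a hypothetical function name. It assumes the pooled estimate of the common proportion is used in the standard error; a given software package may use separate estimates instead, which changes the statistic slightly.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z statistic and two-tailed p-value for H0: P1 = P2 (pooled estimate)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    # Two-tailed test: double the area beyond |z| under the standard normal curve.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# 67 of 200 females and 45 of 200 males have visited a chat room.
z, p_value = two_proportion_z(67, 200, 45, 200)
print(f"z = {z:.2f}, two-tailed p-value = {p_value:.4f}")
```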
The value for test statistic z is found by the software to be 2. Because a two-tailed hypothesis is being tested, the area under the standard normal curve to the right of 2.
This is the p-value. We conclude that a greater percentage of females visit a chat room. The purpose of the test is to determine whether there is equal variability in two populations. It is assumed that independent samples are selected from two populations that are normally distributed. EXAMPLE In order to compare the variability of two kinds of structural steel, an experiment was undertaken in which measurements of the tensile strength of each of twelve pieces of each type of steel were taken.
The units of measurement are pounds per square inch. The research hypothesis Ha: On the basis of this p-value we would reject the null hypothesis and accept the research hypothesis that the variances are unequal. There is additional output that is given as shown in Fig. The upper part of Fig. Clicking OK produces the dialog box shown in Fig. The output that is given is the p-value, which is seen to be 0. Compare this to the value given in the Minitab output in Fig.
Table Average annual consumer spending on auto insurance. The results are shown in Table Test H0: Table Comparison of auto insurance bills for and A research study was conducted to compare two methods of teaching statistics in high school. One method, called the traditional method, presented the course without the use of computer software. The other method, referred to as the experimental method, taught the course and utilized Excel software extensively.
Test the hypothesis that the experimental method produced higher scores on the average. Table Comparison of two methods of teaching statistics using independent samples.
Traditional: 82 85 82 73 72 82 73 79 71 86 90 98 86 77 81
Experimental: 78 83 96 89 82 83 68 84 83 76 83 89 90 85 77
3. In order to compare fertilizer A and fertilizer B, a paired experiment was conducted.
Ten two-acre plots were chosen throughout the region and the two fertilizers were randomly assigned to the plots. That is, it was randomly decided which of the fertilizers was applied to the northern one-acre plot and the other fertilizer was applied to the southern one-acre plot.
The yields of wheat per acre were recorded and are given in Table Answer the following questions in performing your analysis. In a study designed to determine whether taking aspirin reduces the chance of having a heart attack, 11, male physicians took aspirin on a regular basis and 11, male physicians took a placebo on a regular basis.
Analyze the experiment by answering the following questions. Company A 6. One hundred chicks were fed Diet 1 and were fed Diet 2. The summary statistics for a 3-month period were as follows: Diet 1: The null hypothesis is that the population means are equal and the research hypothesis is that the means are not equal. Give the test statistic and the p-value that accompanies it. Everything else remains the same. The blood pressure of 25 patients was determined at the beginning of an experiment.
None of the patients were on blood pressure medicine.
To answer the question, calculate the test statistic and give the p-value. A survey of teenagers was taken and it was determined how many spent 20 or more hours in front of a TV. The results from a survey of males and females were as follows: Give the test statistic for testing Ha: Paired Two Sample for Means. You would need to compute the test statistic and then use Excel to compute the p-value.
CHAPTER 2 Analysis of Variance
Comparing More Than Two Means
Designed Experiments
We are interested in the relationship between the amount of fertilizer applied and the yield of wheat. We are interested in three levels of fertilizer: There are eighteen similar plots, numbered 1 through 18, available and the plots are located on an experimental farm at Midwestern University. The fertilizer is called a factor and there are three levels of interest for this factor. The plots are numbered as shown in Table The fertilizer levels are applied randomly as follows.
Six random numbers are chosen between 1 and 18. From the remaining twelve numbers, six more are randomly chosen. They are 2, 14, 9, 4, 11, and 6.
A medium amount of fertilizer is applied to plots with these numbers. A high amount is applied to the remaining plots.
L represents a plot with low application of fertilizer, M a plot with medium application, and H a plot with high application. L, M, and H are levels of fertilizer. They are also called treatments. Table shows the random assignment of treatments to experimental units or plots.
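The random assignment of fertilizer levels to plots can be sketched in Python. This is only an illustration of the idea of a completely randomized assignment; the function name and the seed argument are hypothetical, and the particular plots drawn will differ from those listed in the text.

```python
import random


def assign_levels(n_plots=18, levels=("L", "M", "H"), seed=None):
    """Randomly assign each level to an equal number of plots."""
    rng = random.Random(seed)
    plots = list(range(1, n_plots + 1))
    rng.shuffle(plots)  # a random ordering of plot numbers 1..n_plots
    per_level = n_plots // len(levels)
    layout = {}
    for i, level in enumerate(levels):
        # The first six shuffled plots get L, the next six M, the rest H.
        for plot in plots[i * per_level:(i + 1) * per_level]:
            layout[plot] = level
    return layout


layout = assign_levels(seed=2024)
for plot in sorted(layout):
    print(plot, layout[plot])
```

Shuffling all eighteen plot numbers once and cutting the result into three groups of six is equivalent to drawing six numbers, then six more from the remaining twelve.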
The wheat yield is measured for each plot. The wheat yield is called the response variable.
Suppose the mean yield for plots with a low application of fertilizer is The experimental design used here is called a one-way design or a completely randomized design.
It is also called a single factor design with k levels. The randomization tends to protect us against extraneous factors that may have been overlooked. Suppose a fertility gradient runs from left to right because the experimental farm is composed of rolling hills.
Table Randomized complete block design. In this design, the treatments are randomly assigned within the blocks. Within a block, the three treatments will be exposed to the same fertility level. The three treatment means will still be made up of six observations each. The block means will be made up of three observations each see Table Suppose the six block means are equal to The randomized complete block design allows a test to be performed on block means as well as treatment means.
The technique used to analyze the means from a designed experiment is called an analysis of variance table or ANOVA table. Consider next a multi-factor experiment. These three levels of fertilizer, called factor A, and two levels of moisture, referred to as factor B, give six factor-level combinations or treatments, as shown in Table Each of the six treatments can be randomly applied to three plots or experimental units. The experiment is called a 3 by 2 factorial that has been replicated three times.
The six treatments have been applied in a completely randomized fashion see Table A factorial design in three blocks is shown in Table We have introduced only a few of the many experimental designs that are possible and that are studied in a design course. The remaining sections of this chapter will discuss a completely randomized design, a block design, and a factorial design in detail.
Block 1        Block 2        Block 3
trt 2  trt 4   trt 5  trt 2   trt 1  trt 5
trt 5  trt 1   trt 1  trt 6   trt 6  trt 2
trt 6  trt 3   trt 4  trt 3   trt 3  trt 4
The Completely Randomized Design
Purpose of the test: The purpose of the test is to compare the means of several populations when independent samples have been chosen.
This test can be thought of as an extension of the two independent samples t-test of Section Five balls of brand A, five of brand B, and five of brand C are driven by the mechanical device in a random order.
The distance that each travels is measured and the data are shown in Table Can the experimental results allow us to conclude that the mean distances traveled are different for the three brands?
Table Distances traveled by three brands of golf balls. The following output is produced. The total sum of squares is the total variation in the whole data set. This variation is broken into two parts. The factor sum of squares, also known as the between-treatments sum of squares, measures the proximity of the sample means to each other. The error or within-treatments sum of squares is a pooling of the variation within the treatments.
It is a pooled measure of the variation within the samples. The following relationship can be shown algebraically: The mean square is the SS value divided by the degrees of freedom. The p-value is the area in the upper tail of this distribution beyond The test H0: The p-value corresponding to this computed value of F is 0. The null hypothesis would be rejected at any alpha value greater than 0.
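The breakdown of the total sum of squares into a factor part and an error part, and the F ratio built from the two mean squares, can be sketched as follows. This is a minimal illustration with a hypothetical function name. The distances are invented for the demonstration because the golf ball data are not reproduced in the text, and the p-value, which requires the F distribution, is not computed here.

```python
from statistics import mean


def one_way_anova(groups):
    """Sums of squares, degrees of freedom, mean squares, and F for a one-way design."""
    all_obs = [y for g in groups for y in g]
    grand = mean(all_obs)
    ss_total = sum((y - grand) ** 2 for y in all_obs)
    # Between treatments: how far each sample mean lies from the grand mean.
    ss_factor = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within treatments: pooled variation around each sample's own mean.
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_factor = len(groups) - 1
    df_error = len(all_obs) - len(groups)
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    f_stat = ms_factor / ms_error if ms_error > 0 else float("inf")
    return {"SS total": ss_total, "SS factor": ss_factor, "SS error": ss_error,
            "df factor": df_factor, "df error": df_error,
            "MS factor": ms_factor, "MS error": ms_error, "F": f_stat}


# HYPOTHETICAL distances for three brands, five balls each.
brand_a = [251, 245, 248, 254, 250]
brand_b = [252, 249, 246, 255, 251]
brand_c = [278, 274, 281, 276, 279]

table = one_way_anova([brand_a, brand_b, brand_c])
for name, value in table.items():
    print(f"{name}: {value:.2f}")
```

In the printed table, SS total equals SS factor plus SS error, and each mean square is its sum of squares divided by its degrees of freedom.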
The above discussion is generalized in Table Another data structure that is often encountered in statistical packages is shown in Fig. This form is called stacked. When this form of the completely randomized design analysis is used, multiple comparisons of means may be requested.
This topic will be discussed in Section Some additional graphical output for this pull-down is shown in Figs. Both of these graphics show that the average distance for brands A and B are close. Brand C produces distances 25 to 30 feet longer than brands A and B. Use the pull-down sequence Tools Data Analysis. Single Factor. Fill out the Anova: Single Factor dialog box as shown in Fig. The 0. The calculated F-value exceeds this by a considerable amount. This would allow the classical method as well as the p-value method to be used to perform the test.
The output produced by Excel is the same as that produced by Minitab. Four age groups of Internet users have been polled, with the results shown in Table The response variable is the number of e-mails sent per week. The output for this example is as follows. The mean for each sample is shown by a line within the dot plot. The data in this table is more variable than the data in Table Example The output for this example with the new data is as follows.
Figure is a dot plot of the data in Table Note that the four sample standard deviations are 2. They are small when compared with the sample standard deviations in the second example: This causes the error sums of squares to increase from The patients are randomly divided into three groups.
One group is treated with a diet that is very restrictive, another group is treated with a strict exercise program, and the third serves as a control group. The response variable is the change in diastolic blood pressure after six months of treatment. In this highly unlikely example, there is no variation within the three groups Table
Table An experiment with no variation within treatments.
Diet  Exercise  Control
10    13        2
10    13        2
10    13        2
10    13        2
10    13        2
The error term in the analysis of variance has zero sums of squares.
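The zero error sum of squares for this table can be confirmed directly. The short Python sketch below uses the values from the table; the variable names are hypothetical.

```python
from statistics import mean

# Change in diastolic blood pressure, five patients per group (from the table).
changes = {
    "Diet": [10, 10, 10, 10, 10],
    "Exercise": [13, 13, 13, 13, 13],
    "Control": [2, 2, 2, 2, 2],
}

grand = mean(y for ys in changes.values() for y in ys)
# Variation of the three group means around the grand mean.
ss_between = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in changes.values())
# Variation of each observation around its own group mean.
ss_within = sum((y - mean(ys)) ** 2 for ys in changes.values() for y in ys)

print(f"between treatments: {ss_between:.3f}")
print(f"within treatments:  {ss_within:.3f}")
```

Because the within-treatments sum of squares is zero, the mean square for error is zero and the usual F ratio cannot be formed by division.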
All the variation is between the three treatments. There is no variation within treatments. The error term is zero. In other words, there is no variation within the treatment groups. The purpose of the test is to compare the means of several treatments when they have been administered in blocks.
The treatments have been randomly assigned to experimental units within the blocks. Five golfers of varying ability each hit the three brands in random order. The letters C, B, and A are randomly pulled out of a hat in that order.
Jones will hit brand C, followed by brand B, followed by brand A. Continuing in this manner will ensure the random assignment of treatments within blocks.
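Drawing letters from a hat for each golfer can be imitated with a random permutation per block. The sketch below is illustrative only: the function name is hypothetical, and apart from Jones the golfer labels are placeholders, since the other names are not given in the text.

```python
import random


def randomize_within_blocks(blocks, treatments, seed=None):
    """Give every block its own random ordering of all the treatments."""
    rng = random.Random(seed)
    # sample() without replacement of the full list is a random permutation.
    return {block: rng.sample(treatments, k=len(treatments)) for block in blocks}


# Placeholder labels for the four golfers not named in the text.
golfers = ["Jones", "Golfer 2", "Golfer 3", "Golfer 4", "Golfer 5"]
brands = ["A", "B", "C"]

orders = randomize_within_blocks(golfers, brands, seed=7)
for golfer, order in orders.items():
    print(golfer, "hits", ", then ".join(order))
```

Each golfer hits every brand exactly once, so differences in ability between golfers affect all three brands equally.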
The three brands of balls are the treatments and the five golfers are the blocks. The distance that each ball travels is the response variable. Table Statistical layout showing treatments and blocks. The output shown below is given by Minitab. Block 1 2 3 4 5 Mean The p-value for blocks allows us to test the hypothesis H0: Enter the data into the worksheet and execute the pull-down sequence Tools Data Analysis.
Two-Factor Without Replication as given in Fig. Two-Factor Without Replication dialog box as shown in Fig. The output is given in Fig. The Excel output is the same as the Minitab output. The only thing that Excel gives that Minitab does not is the critical F-value for blocks, 3. Factor A is diabetes. None of the twenty were on medication for high blood pressure. The diastolic blood pressure of the twenty participants is measured and the results are given in Table We are interested in the interaction of weight and diabetes on the blood pressure.
Table
              Normal weight         Overweight
Non-diabetic  75, 80, 83, 85, 65    85, 80, 90, 95, 88
Diabetic      85, 90, 95, 90, 86    90, 95, , ,
If there is no significant interaction, then we are interested in the effect of diabetes on blood pressure and in the effect of weight on blood pressure. We refer to this as a 2 by 2-factorial experiment with 5 replicates whose response variable is diastolic blood pressure. The total sums of squares are expressed as a sum of Factor A sums of squares, Factor B sums of squares, interaction sums of squares, and error sums of squares by the following expression: The dialog box is shown in Fig.
The interaction plot is shown in Fig. In Fig. The response for both diabetics and non-diabetics shows an increase in diastolic blood pressure when the weight level changes from normal weight to overweight. The fact that the lines are nearly parallel indicates there is no interaction. The dialog box and interaction plot are shown in Figs.
Means FactorA 1 2 N 10 10 Diastoli FactorB 1 2 N 10 10 Diastoli Similarly, the low level of weight had a mean of The company is faced with the problem of advertising the new camera. One factor deals with what advertising approach to emphasize. The price and the quality of pictures are the two levels of advertising approach the company decides to use.
The other factor of interest is the advertising medium to use. The levels of advertising medium that the company will use are radio, newspaper, and Internet. The response variable is the number of weekly sales. The data are shown in Table We notice first that the interaction is significant.
Thus our objective is to explain the nature of the interaction. In doing this we will discover what the experiment has really found about what, and how sales are affected. Look at the results from all angles.
The interaction plots are given in Figs. Radio sales are relatively low and are the same for both the price and the quality approach. For newspaper advertising sales are higher for the price approach than for the quality approach.
The sales are greater for quality approach than for price approach for Internet advertising. The greatest sales are for Internet advertising where the quality approach is used.
When the quality approach is used, the Internet approach to advertising is the best. First, the data is entered into the worksheet as shown in Fig. Choose Anova: Two-Factor With Replication. Look at Fig. The following Excel output, shown in Figs. The output in Fig. For example, suppose we wanted to investigate the effect of three factors on the amount of dirt removed from a standard load of clothes. The three factors are brand of laundry detergent, A, water temperature, B, and type of detergent, C.
The two levels of brand of detergent are brand X and brand Y. The two levels of water temperature are warm and hot. The factorial design that applies to this experiment is called a 2^3 factorial design. There are eight treatments possible in a 2^3 design. They are shown in Table Treatment Detergent Water temp. This would require 16 standard loads of clothes. The expressions for the sums of squares are omitted.
Eight standard loads were randomly assigned to the eight treatments.
This experiment was then replicated so that two observations for each treatment were obtained. The steps to follow when using Minitab are shown in Figs. Table Data for 2^3 experiment. Detergent Water temp. Figure indicates that no interaction is present, since the lines are nearly parallel in all three graphs.
The means at the low and high levels of the factors are as follows. Means Brand 1 2 N 8 8 Response Temp 1 2 N 8 8 Response That is, Brand Y removes 0. That is, 5. That is, liquid detergent on average removes 4. The brand of detergent X or Y is not as important as the temperature and the type of detergent.
Using a hot temperature and a liquid detergent would be recommended. There are no Excel routines for three or more factors, but there are Minitab routines for any number of factors. The number of experimental units required for experiments with a large number of factors becomes very large. For example, a 2^4 factorial experiment with two replications requires 32 experimental units. Topics involving large numbers of factors are beyond the scope of this book.
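The growth in the number of treatments and experimental units can be made concrete with a short Python sketch. The function names are hypothetical. The level labels for detergent type are placeholders, since the text names only the liquid level.

```python
from itertools import product


def factorial_treatments(factors):
    """All factor-level combinations for a full factorial design."""
    names = list(factors)
    return [dict(zip(names, combo))
            for combo in product(*(factors[name] for name in names))]


def units_required(n_factors, replications, levels=2):
    """Experimental units needed: levels^n_factors treatments times replications."""
    return levels ** n_factors * replications


detergent_factors = {
    "A brand": ("X", "Y"),
    "B water temp": ("warm", "hot"),
    # ASSUMPTION: placeholder labels; the text names only the liquid level.
    "C type": ("type 1", "type 2"),
}

for number, treatment in enumerate(factorial_treatments(detergent_factors), start=1):
    print(number, treatment)

print(units_required(3, 2))  # the replicated 2^3 detergent experiment
print(units_required(4, 2))  # a 2^4 experiment with two replications
```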
To compare various combinations of means with combinations of other means. Vary, depending on the method or procedure used. In the case of testing several means you are testing H0: To illustrate, suppose an analysis of variance has led to the conclusion that, of four means, not all are equal. Or, we might be interested in the following, for example: Some of these procedures are as follows: Some require equal sample sizes, while some do not. The choice of a multiple comparison procedure used with an ANOVA will depend on the type of experimental design used and the comparisons of interest to the analyst.
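The number of pairwise comparisons grows quickly with the number of means, which is one reason a multiple comparison procedure is needed. The sketch below simply lists the pairs; the function name is hypothetical, and no particular comparison procedure is implemented.

```python
from itertools import combinations
from math import comb


def pairwise_comparisons(labels):
    """Every unordered pair of treatment means that could be compared."""
    return list(combinations(labels, 2))


four_means = ["mean 1", "mean 2", "mean 3", "mean 4"]
for first, second in pairwise_comparisons(four_means):
    print(f"compare {first} with {second}")

# Number of pairs for k means is k(k - 1)/2.
for k in (4, 5, 6):
    print(k, "means give", comb(k, 2), "pairs")
```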
One method is the traditional chalk-and-blackboard method, referred to as treatment 1. A second method utilizes Excel weekly in the teaching of algebra and is called treatment 2.
A third method utilizes the software package Maple weekly and is called treatment 3. A fourth method utilizes both Maple and Excel weekly and is called treatment 4. Sixty students are randomly divided into four groups and the experiment is carried out over a one semester time period. The response variable is the score made on a common comprehensive final in the course.
The scores made on the final are shown in Table Fill in the dialog box as shown. Click comparisons. This brings up a new dialog box, shown in Fig. Fill in the One-way Multiple Comparisons dialog box as shown. Method Lower Center Upper 4 0. Treatment 3 5 2 4 1 ——————————————————————— —————————————————— ————————————— There are 10 pairs that are compared.
The results are as follows. Treatment mean 3 is less than treatment mean 4, treatment mean 3 is less than treatment mean 1, treatment mean 5 is less than treatment mean 1, treatment mean 2 is less than treatment mean 1. Forty individuals are selected and paid to participate in the experiment. Ten are randomly assigned to each of four groups. The time required by each person in each group is recorded.
The recorded data is the time in hours required to complete the form. Form 1 Form 2 Form 3 Current form 3. Give the dot plot and box plot comparisons of the four means. Refer to Exercise 1 of this chapter. Suppose in exercise 1 of this chapter that a block design was used.
The data are tabulated by group under the columns Form 1, Form 2, Form 3, and Current. Which form would you recommend that the state choose?

Their cumulative GPA was the response recorded. The results of the study are shown in the accompanying table. Interpret the results of the experiment.

A study was undertaken to determine what combination of products maximized the score that a pizza received.
Factor A was cheese and the levels were small and large, factor B was meat and the levels were small and large, and factor C was crust and the levels were thin and thick. The data are given in the accompanying table, where 0 is low and 1 is high. The table columns are Cheese, Meat, Crust, Rep 1, and Rep 2.
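With three factors at two levels each, the design has 2 × 2 × 2 = 8 treatment combinations, and two replicates give 16 observations. The sketch below enumerates the combinations using the 0/1 coding from the table; the names `factors`, `design`, and `total_runs` are illustrative, and no scores are included because the table values are not reproduced here.

```python
from itertools import product

factors = ("cheese", "meat", "crust")
levels = (0, 1)  # 0 is low, 1 is high

# One dictionary per treatment combination of the three factors.
design = [dict(zip(factors, combo)) for combo in product(levels, repeat=len(factors))]

replicates = 2  # the table has columns Rep 1 and Rep 2
total_runs = len(design) * replicates

for run in design:
    print(run)
print(len(design), "treatment combinations,", total_runs, "observations")
```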
What is your general recommendation?

An exercise presents a partially completed ANOVA table for a block design. Its rows are Treatments, Blocks, Error, and Total, and its columns are Source of variation, Degrees of freedom, Sum of squares, Mean squares, F-statistic, and p-value.

A 2² factorial has been replicated 5 times in a completely randomized design. A 2³ factorial has been replicated 3 times in a completely randomized design. Compare all 15 pairs of means a pair at a time. There are n1 elements from population 1, n2 elements from population 2, and so on.

Deterministic models are so named because they allow us to determine the value of the dependent variable from the values of the independent variables.
These deterministic models usually come from the natural sciences. Probabilistic models are more realistic for most real-world situations. We know, for example, that the costs of twenty homes, each with the same number of square feet, would likely vary.
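The actual costs might be given by the twenty costs in the accompanying table. A probabilistic model accounts for this variation by writing y as a deterministic component plus a random error component whose mean is assumed to be zero. This is equivalent to assuming that the mean value of y, E(y), equals the deterministic component. Fitting this model to a data set is an example of regression modeling or regression analysis.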
The height x is in centimeters and the weight y is in kilograms. The error component is normally distributed with mean equal to 0 and standard deviation 0. Note that the population relationship is usually not known, but we are assuming it is known here to develop the concepts. In fact, we are usually trying to establish the relationship between y and x.
We capture ten of these rodents and determine their heights and weights. The data are given in the accompanying table, with columns Height, x and Weight, y, and a plot is shown in the accompanying figure. The actual captured rodents have weights that vary about the line of means. Also note that the taller the rodent, the heavier it is. As mentioned earlier, we do not usually know the equation of the deterministic line that connects y with x.
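The idea of observations varying about a line of means can be sketched by simulation. Everything numeric here is an assumption made for illustration: the intercept `BETA0`, slope `BETA1`, error standard deviation `SIGMA`, and the ten heights are hypothetical and are not the values used in the book's example.

```python
import random

# Hypothetical population parameters (assumed, not from the book).
BETA0 = 0.10   # intercept of the line of means, in kilograms
BETA1 = 0.05   # slope: kilograms per centimeter of height
SIGMA = 0.02   # standard deviation of the normal error component

def line_of_means(height):
    """Deterministic component: the mean weight for a given height."""
    return BETA0 + BETA1 * height

def simulate_weights(heights, rng):
    """Probabilistic model: line of means plus a normal error with mean 0."""
    return [line_of_means(x) + rng.gauss(0.0, SIGMA) for x in heights]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
heights = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5, 14.0, 14.5]
weights = simulate_weights(heights, rng)

for x, y in zip(heights, weights):
    print(f"height {x:5.1f} cm  mean weight {line_of_means(x):.3f} kg  observed {y:.3f} kg")
```

Each simulated weight lands near, but not exactly on, the line of means, which is the behavior described for the captured rodents.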
What we shall see in the next section is that we can sample the population, gather a set of data such as that shown in the height-weight table above, and estimate the deterministic equation. The assumptions of regression are as follows. The error terms are assumed to be normally distributed with a mean of zero for each value of x. The variance of the error terms is assumed to be the same for every value of x; this means that the errors vary by the same amount for small x as for large x.
The relationship between the number of hours studied, x, and the score, y, made on a mathematics test is postulated to be linear. Ten students are sampled and the scores and hours studied are recorded in the accompanying table. The data for x and y are entered into columns C1 and C2 of the Minitab worksheet. A scatter plot is produced with the pull-down Graph > Scatterplot. The scatter plot shows a clear linear trend. Since 0 is outside the range of hours studied, the intercept does not have an interpretation in the context of the scores.
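The fitted line that Minitab reports can be reproduced with the usual least-squares formulas: the slope is Sxy / Sxx and the intercept is the mean of y minus the slope times the mean of x. The sketch below assumes hypothetical hours and scores for ten students (the book's table values are not reproduced here), and `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data for ten students, NOT the values from the book.
hours = [5, 7, 8, 10, 11, 12, 14, 15, 17, 20]
score = [55, 62, 60, 70, 73, 75, 80, 84, 88, 95]

b0, b1 = fit_line(hours, score)
print(f"predicted score = {b0:.2f} + {b1:.2f} * hours")

# Observed versus predicted values, as in a comparison table.
for x, y in zip(hours, score):
    print(x, y, round(b0 + b1 * x, 1))
```

Predictions are only trustworthy for hours within the observed range, which is why the intercept at 0 hours has no practical interpretation here.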
A table compares the observed and predicted values.

Assuming the trend continues beyond the last observed year, predict the US sales for a later year. Suppose we code the years as 1, 2, 3, 4, 5, and 6.
Note that the slope of this line is negative since, as the years increase, the sales are decreasing. It is estimated that as the year increases by one, the sales decrease by 11 million. The coded years and sales are entered into columns A and B, and a figure gives the Excel output.

The purpose of the test is to determine whether a given value is reasonable for the slope of the population regression line. The most common hypothesized value is zero: if the null hypothesis of a zero slope is not rejected, then the data give no evidence that a straight line models the relationship between x and y.
The slope of the model tells you how y changes with a unit change in x. A t statistic, formed from the estimated slope and its standard error, is used to test the null hypothesis about the slope. A plot of systolic blood pressure versus weight is shown as a Minitab output in the accompanying figure. The test statistic is computed from the fitted line. The data would refute the null hypothesis. Each additional pound would increase the systolic pressure by less than 1.
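A standard form of the test statistic is t = (b1 − hypothesized slope) / (s / √Sxx), where b1 is the estimated slope, s² = SSE / (n − 2) is the estimated error variance, and the statistic has n − 2 degrees of freedom. The sketch below is an illustration under assumptions: the function name `slope_t_statistic` and the weight and pressure values are hypothetical and are not the book's data.

```python
from math import sqrt

def slope_t_statistic(xs, ys, slope_0=0.0):
    """t statistic for H0: population slope equals slope_0 (n - 2 degrees of freedom)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    # Sum of squared residuals about the fitted line
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))      # estimated standard deviation of the errors
    se_b1 = s / sqrt(s_xx)       # standard error of the slope
    return (b1 - slope_0) / se_b1

# Hypothetical weights (pounds) and systolic pressures, NOT the book's data.
weights = [130, 145, 150, 165, 170, 180, 195, 200, 215, 230]
pressures = [112, 118, 121, 125, 130, 128, 136, 140, 143, 150]

t_value = slope_t_statistic(weights, pressures)
print(f"t = {t_value:.2f} with {len(weights) - 2} degrees of freedom")
```

The null hypothesis is rejected when the absolute value of t exceeds the critical value from a t table with n − 2 degrees of freedom.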
The T value 3. The random variables X and Y have a bivariate distribution. The plot of the data is shown in the accompanying figure. The correlation dialog box is filled in as shown in a second figure, and the Minitab output is obtained.

The following example illustrates two variables that are negatively correlated. The sample consists of fifteen high school freshmen. The plot shows the negative linear relationship between the two variables. The value of r indicates the strength of the relationship.
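The sample correlation coefficient can be computed directly as r = Sxy / √(Sxx · Syy). The sketch below uses hypothetical data for fifteen subjects with a downward trend; the values and the name `pearson_r` are assumptions for illustration, not the book's example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data for fifteen subjects: y tends to fall as x rises.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
y = [95, 92, 93, 88, 85, 86, 80, 78, 79, 72, 70, 69, 64, 60, 61]

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

A value of r near −1 indicates a strong negative linear relationship, a value near +1 a strong positive one, and a value near 0 little or no linear relationship.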
In the ANOVA table for the regression, explained variation is also called regression variation, and unexplained variation is also called residual variation. The data are shown in the accompanying table. The coefficient of determination appears in the Excel worksheet output as R Square.

For example, consider a regression study where y represents systolic blood pressure and x represents weight.
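Now suppose we wish to use the regression equation to estimate the expected value of the systolic blood pressure of all individuals who weigh a given number of pounds. We would predict that an individual of that weight would have a systolic blood pressure equal to the value given by the regression equation. Likewise, we would estimate the expected systolic blood pressure of all individuals of that weight to be the same value. The two interval estimates differ, however: the standard error used for predicting an individual value contains an additional 1 under the square root. This additional 1 makes the prediction interval wider.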
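The quantities discussed in this part of the chapter (explained and unexplained variation, the coefficient of determination, and the two kinds of interval) can be sketched together. This is an illustration under stated assumptions: the weights and pressures are hypothetical, the function names are invented for the sketch, and the critical value 2.306 is the two-sided 95% t value for 8 degrees of freedom, which applies only because the hypothetical sample has n = 10.

```python
from math import sqrt

def fit_summary(xs, ys):
    """Least-squares fit plus the decomposition of total variation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sst = sum((y - y_bar) ** 2 for y in ys)                       # total variation
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))   # unexplained (residual)
    return {"n": n, "x_bar": x_bar, "s_xx": s_xx, "b0": b0, "b1": b1,
            "sst": sst, "sse": sse, "ssr": sst - sse,             # ssr: explained (regression)
            "r_squared": 1 - sse / sst}

def intervals(fit, x0, t_crit):
    """Fitted value, confidence interval for the mean, and prediction interval at x0."""
    s = sqrt(fit["sse"] / (fit["n"] - 2))
    y_hat = fit["b0"] + fit["b1"] * x0
    core = 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["s_xx"]
    half_mean = t_crit * s * sqrt(core)        # for the mean of y at x0
    half_pred = t_crit * s * sqrt(1 + core)    # the additional 1 widens this interval
    return y_hat, (y_hat - half_mean, y_hat + half_mean), (y_hat - half_pred, y_hat + half_pred)

# Hypothetical weights (pounds) and systolic pressures, NOT the book's data.
weights = [130, 145, 150, 165, 170, 180, 195, 200, 215, 230]
pressures = [112, 118, 121, 125, 130, 128, 136, 140, 143, 150]

fit = fit_summary(weights, pressures)
y_hat, mean_ci, pred_int = intervals(fit, x0=175, t_crit=2.306)  # df = 10 - 2 = 8

print(f"R Square = {fit['r_squared']:.3f}")
print(f"fitted value at 175 lb: {y_hat:.1f}")
print(f"95% interval for the mean:        {mean_ci[0]:.1f} to {mean_ci[1]:.1f}")
print(f"95% interval for one individual:  {pred_int[0]:.1f} to {pred_int[1]:.1f}")
```

Both intervals are centered on the same fitted value; only their widths differ.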
The independent variable x was the hemoglobin A1C value, taken after three months of taking the fasting blood glucose value each morning of the three-month period and averaging the values.