Tutorial 01: Getting Started With Statistics

Solution 1:

  1.1. There are many dogs and many cats, and they all run at different speeds. Some dogs run faster than some cats, and some cats run faster than some dogs. So there is variability in the data we have. Hence, it is a statistical question.

  1.2. We are comparing a particular dog to a particular wolf. We could put each of them on a weighing scale and get a definite answer. Hence, a non-statistical question.

  1.3. There is variation here. In some months it might rain more in Seattle, and in other months it might rain more in Singapore. So there is variability in the data. Hence, a statistical question.

  1.4. It depends on the circumstances: how recently the oil was changed, what the wind and road conditions are like, and exactly how you are driving the car (turning or going in a straight line). So there is variation in the gas mileage at 40 kmph and at 50 kmph. Hence, a statistical question.

  1.5. Not all English professors get paid the same amount, and not all Mathematics professors get paid the same amount. Some English professors earn quite a lot and some very little, and the same is true for Mathematics professors. There is variability in the data. Hence, a statistical question.

  1.6. Here we are talking about two particular individuals. We can find out how much each of them was paid in 2020. Since we then have exact figures, we can reach a conclusion directly, and no statistics is required. Hence, a non-statistical question.

Solution 2: "How tall are the students in the mathematics class?" or "What is the average height of the students in the mathematics class?"

Solution 3: Think about this on your own!

Solution 4: It depends on the sport you pick. Different sports tend to produce different data. In football, you might have data on how many goals your team has scored or conceded, or on the assists of different players in your team. If you pick cricket, you will have data about your batsmen, bowlers or fielders. Think about how you can use the data. It's a fun exercise.

Tutorial 02: Basic Terminologies In Statistics

Solution 01: (any other example works as well)

  Data: The number 10000 is a piece of data, as is the name John Paul.

  Information: If we now say that John Paul is a teacher and $10000 is a teacher’s salary, the data is given meaning or context and makes more sense to us.

  Knowledge: Building on the information, knowledge could be that John Paul is a teacher who earns $10000 per year.

Solution 02:

  Information: 5, 10, 15 and 20 are the first four answers in the 5 x table.

  Knowledge: 6, 12, 18 and 24 are the first four answers in the 6 x table (because the 5 x table starts at five and goes up in fives, the 6 x table must start at six and go up in sixes).

Solution 03:

  3.1. Breakfast cereals.

  3.2. Manufacturer, Calories, Sodium, and Fat.

Solution 04: The 85 people.

Solution 05: Sample statistic.

Solution 06:

  Population: All seniors at Riverview High.

  Sample: The 100 seniors surveyed.

Solution 07:

  Population: All classrooms in the elementary school.

  Sample: The 7 classrooms selected.

Solution 08:

  Population: All of the vehicles that pass through the lane with the camera.

  Sample: The group of every tenth vehicle that passes through the lane.
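Selecting every tenth vehicle is an example of systematic sampling. The Python sketch below assumes the vehicles are available as a list in the order they pass the camera. The function name `systematic_sample` and the vehicle labels are hypothetical and are used only for illustration.

```python
import random


def systematic_sample(population, k, start=None):
    """Return every k-th item of `population`, beginning at index `start`.

    If `start` is None, a random start in 0..k-1 is chosen, which gives every
    item the same chance of being selected.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    if start is None:
        start = random.randrange(k)
    if not 0 <= start < k:
        raise ValueError("start must satisfy 0 <= start < k")
    return population[start::k]


# Hypothetical stream of 35 vehicles, labelled in the order they pass the camera.
vehicles = [f"vehicle-{i}" for i in range(1, 36)]

# Fixed start so the result is reproducible: the 10th, 20th and 30th vehicles.
print(systematic_sample(vehicles, k=10, start=9))
# ['vehicle-10', 'vehicle-20', 'vehicle-30']
```

The same idea covers sampling "one out of every 100 sheets" of plywood, with `k=100`.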

Solution 09:

  Population: All the parents of the paediatrician's patients.

  Sample: The 10 parents of patients selected.

Solution 10:

  Population: The threaded rods produced at the factory that week.

  Sample: The 40 threaded rods selected.

Solution 11: Sampling is done because you usually cannot gather data from the entire population. Even in relatively small populations, the data may be needed urgently, and including everyone in the population in your data collection may take too long.

Tutorial 03: Descriptive and Inferential Statistics

Solution 1:

  1.1. Here, we want to know how strong the plywood we are producing is, and we judge whether it is strong enough by testing only one out of every 100 sheets. So we are sampling, and we want to draw a conclusion about the population (all the plywood we are producing). Hence, this is an example of inferential statistics.

  1.2. Here, we are not sampling but looking at the entire group and summarising the data using a bar graph. Hence, an example of descriptive statistics.

  1.3. The professor is looking at all the students in the course and summarising the data by finding the average. Hence, descriptive statistics.

  1.4. Here we are using a sample to draw a conclusion about the population. Hence, inferential statistics.

Solution 02:

| S.No | Descriptive Statistics | Inferential Statistics |
|---|---|---|
| 1 | Concerned with describing the target population. | Makes inferences from the sample and generalises them to the population. |
| 2 | Organises, analyses and presents the data in a meaningful manner. | Tests hypotheses and predicts future outcomes. |
| 3 | Describes data that is already known. | Tries to draw conclusions about the population that go beyond the available data. |
| 4 | Tools: measures of central tendency and variability. | Tools: hypothesis tests, confidence intervals, regression analysis. |
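The contrast in this table can be illustrated with a short Python sketch that uses the standard `statistics` module. Both data sets are invented for illustration, and the helper names `describe` and `mean_confidence_interval` are hypothetical. The interval uses a normal (z) approximation as a simplification. With a sample this small, a t-based interval would usually be preferred.

```python
import math
import statistics


def describe(values):
    """Descriptive statistics: summarise data that is already fully known."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def mean_confidence_interval(sample, confidence=0.95):
    """Inferential statistics: estimate a population mean from a sample.

    Uses a normal (z) approximation; a t-based interval would usually be
    preferred for small samples.
    """
    centre = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(len(sample))
    z = statistics.NormalDist().inv_cdf((1 + confidence) / 2)
    return centre - z * std_err, centre + z * std_err


# Hypothetical scores for every student in one course: describe them directly.
class_scores = [72, 85, 90, 66, 78, 88, 95, 70, 81, 77]
print(describe(class_scores))

# Hypothetical strengths of a few sampled plywood sheets: infer about all sheets.
sample_strengths = [41.2, 39.8, 40.5, 42.1, 38.9, 40.0, 41.7, 39.5]
low, high = mean_confidence_interval(sample_strengths)
print(f"Approximate 95% interval for the mean strength: {low:.2f} to {high:.2f}")
```

The first call only summarises the numbers it is given. The second call makes a statement about sheets that were never tested, which is what makes it inferential.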

Tutorial 04: Data-Measurement

Solution 1: First, it helps us decide how to interpret the data from the given variable. Second, it helps us decide what statistical analysis is appropriate on the values that were assigned.

Solution 2:

  2.1. This question is about time measurement with an absolute zero and is therefore ratio-level measurement. For example, a person who has been out of the hospital for two weeks has been out twice as long as someone who has been out of the hospital for one week.

  2.2. It yields nominal-level data because the patient is asked only to categorise the type of unit he/she was in. This question does not require a hierarchy or ranking of the type of unit.

  2.3. The question results in ordinal-level data. For example, very important might be assigned a 4, somewhat important a 3, not very important a 2, and not at all important a 1. The higher the number, the more important the hospital’s location. Thus, the responses can be ranked according to the option selected.

  2.4. Ordinal-level; same reasoning as 2.3.

  2.5. Ordinal-level; same reasoning as 2.3.
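The coding described in 2.3 can be sketched in Python as follows. The responses are invented. Using `statistics.median_low` is an assumption, chosen so that the result is always one of the actual codes. Ordinal codes only record order, so a ranking-based summary such as the median is generally more appropriate than the arithmetic mean.

```python
import statistics

# Coding from 2.3: a higher code means "more important", but the gaps
# between codes are not guaranteed to be equal (ordinal, not interval).
SCALE = {
    "not at all important": 1,
    "not very important": 2,
    "somewhat important": 3,
    "very important": 4,
}

# Hypothetical survey responses.
responses = [
    "very important", "somewhat important", "very important",
    "not very important", "somewhat important",
]

codes = [SCALE[r] for r in responses]

# median_low always returns an actual data value, so it maps back to a label.
median_code = statistics.median_low(codes)
median_label = {code: label for label, code in SCALE.items()}[median_code]
print(median_code, median_label)  # 3 somewhat important
```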

Solution 3:

  3.1. Ratio.

  3.2. Ratio.

  3.3. Ordinal.

  3.4. Nominal.

  3.5. Ratio.

  3.6. Ratio.

  3.7. Nominal.

  3.8. Ratio.

  3.9. Ordinal.

Tutorial 05: Frequency Distribution

Solution 1:

Start value: 6.3
End value: 7.7
Range: 1.4
Total Number of Classes: 7
Class Width: 0.2

| Class Interval | Frequency | Class Midpoint | Relative Frequency | Cumulative Frequency |
|---|---|---|---|---|
| 6.3 - under 6.5 | 1 | 6.4 | 0.025 | 1 |
| 6.5 - under 6.7 | 2 | 6.6 | 0.050 | 3 |
| 6.7 - under 6.9 | 7 | 6.8 | 0.175 | 10 |
| 6.9 - under 7.1 | 10 | 7.0 | 0.250 | 20 |
| 7.1 - under 7.3 | 13 | 7.2 | 0.325 | 33 |
| 7.3 - under 7.5 | 6 | 7.4 | 0.150 | 39 |
| 7.5 - under 7.7 | 1 | 7.6 | 0.025 | 40 |
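The derived columns of this table (midpoint, relative frequency and cumulative frequency) can be computed from the class limits and frequencies alone. Below is a minimal Python sketch that reuses the numbers from the Solution 1 table. The raw data are not given here, so the sketch takes the frequencies as input instead of counting them.

```python
# Class lower limits, width and frequencies from the Solution 1 table.
lower_limits = [6.3, 6.5, 6.7, 6.9, 7.1, 7.3, 7.5]
class_width = 0.2
frequencies = [1, 2, 7, 10, 13, 6, 1]

n = sum(frequencies)  # total number of observations
rows = []
running_total = 0
for lower, f in zip(lower_limits, frequencies):
    upper = round(lower + class_width, 1)    # round away floating-point noise
    midpoint = round((lower + upper) / 2, 2)  # class midpoint
    running_total += f                        # cumulative frequency
    rows.append((lower, upper, f, midpoint, f / n, running_total))

for lower, upper, f, midpoint, rel, cum in rows:
    print(f"{lower} - under {upper}: f={f}, mid={midpoint}, rel={rel:.3f}, cum={cum}")
```

Relative frequency is each class frequency divided by the total. Cumulative frequency is the running sum, so the last row always equals the total (40 here).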

Solution 02:

  2.1.

Start value: 10
End value: 85
Range: 75
Total Number of Classes: 5
Class Width: 15.0

| Class Interval | Frequency |
|---|---|
| 10.0 - under 25.0 | 9 |
| 25.0 - under 40.0 | 13 |
| 40.0 - under 55.0 | 11 |
| 55.0 - under 70.0 | 9 |
| 70.0 - under 85.0 | 8 |
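The class boundaries in 2.1 and 2.2 follow from class width = range / number of classes. The helper below sketches that calculation in Python. Its name, `class_intervals`, is hypothetical. Rounding to 10 decimal places is an added assumption that hides floating-point noise, since subtracting 6.3 from 7.7 in binary floating point does not give exactly 1.4.

```python
def class_intervals(start, end, num_classes, ndigits=10):
    """Split [start, end] into `num_classes` equal-width class intervals.

    Returns (range, class width, list of (lower, upper) pairs). Rounding to
    `ndigits` removes floating-point noise from the boundaries.
    """
    data_range = round(end - start, ndigits)
    width = round(data_range / num_classes, ndigits)
    intervals = [
        (round(start + i * width, ndigits), round(start + (i + 1) * width, ndigits))
        for i in range(num_classes)
    ]
    return data_range, width, intervals


print(class_intervals(10, 85, 5))    # Solution 2.1: width 15.0
print(class_intervals(10, 90, 10))   # Solution 2.2: width 8.0
print(class_intervals(6.3, 7.7, 7))  # Solution 1: width 0.2
```

Each pair is read as "lower - under upper". A value exactly equal to `end` has to be placed in the last class explicitly, as the Solution 3 table does with its final class `59.0 - 61.0`.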

  2.2.

Start value: 10
End value: 90
Range: 80
Total Number of Classes: 10
Class Width: 8.0

| Class Interval | Frequency |
|---|---|
| 10.0 - under 18.0 | 7 |
| 18.0 - under 26.0 | 3 |
| 26.0 - under 34.0 | 5 |
| 34.0 - under 42.0 | 9 |
| 42.0 - under 50.0 | 7 |
| 50.0 - under 58.0 | 3 |
| 58.0 - under 66.0 | 6 |
| 66.0 - under 74.0 | 4 |
| 74.0 - under 82.0 | 4 |
| 82.0 - under 90.0 | 2 |

  2.3. The ten-class frequency distribution gives a more detailed breakdown of temperatures and shows the smaller frequencies in the higher temperature intervals. The five-class distribution collapses the intervals into broader classes, making it appear that the frequencies in each class are nearly equal.

Solution 3:

Start value: 39
End value: 61
Range: 22
Total Number of Classes: 11
Class Width: 2.0

| Class Interval | Frequency |
|---|---|
| 39.0 - under 41.0 | 2 |
| 41.0 - under 43.0 | 1 |
| 43.0 - under 45.0 | 5 |
| 45.0 - under 47.0 | 10 |
| 47.0 - under 49.0 | 18 |
| 49.0 - under 51.0 | 13 |
| 51.0 - under 53.0 | 15 |
| 53.0 - under 55.0 | 15 |
| 55.0 - under 57.0 | 7 |
| 57.0 - under 59.0 | 9 |
| 59.0 - 61.0 | 5 |

  The distribution reveals that only 13 of the 100 boxes of raisins contain approximately 50 raisins (49 - under 51). However, 71 of the 100 boxes contain between 45 and under 55 raisins. It also shows that a few boxes (5) contain 9 or more extra raisins (59 - 61), and two boxes contain 9 to 11 fewer raisins (39 - under 41) than the boxes are supposed to contain.
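The counts quoted in this answer can be checked by summing the frequencies of the relevant classes. Below is a small Python sketch that uses the frequencies from the Solution 3 table. The helper name `count_between` is hypothetical.

```python
# Class lower limits (width 2) and frequencies from the Solution 3 table.
lower_limits = list(range(39, 61, 2))  # 39, 41, ..., 59
frequencies = [2, 1, 5, 10, 18, 13, 15, 15, 7, 9, 5]


def count_between(low, high, width=2):
    """Total frequency of the classes that lie entirely within [low, high)."""
    return sum(
        f
        for lower, f in zip(lower_limits, frequencies)
        if low <= lower and lower + width <= high
    )


print(sum(frequencies))       # 100 boxes in total
print(count_between(45, 55))  # 71 boxes with 45 to under 55 raisins
print(count_between(49, 51))  # 13 boxes in the 49 - under 51 class
```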

Solution 4:

| Class Interval (age in years) | Frequency | Relative Frequency |
|---|---|---|
| 0 - under 5 | 6 | 0.069767 |
| 5 - under 10 | 8 | 0.093023 |
| 10 - under 15 | 17 | 0.197674 |
| 15 - under 20 | 23 | 0.267442 |
| 20 - under 25 | 18 | 0.209302 |
| 25 - under 30 | 10 | 0.116279 |
| 30 - under 35 | 4 | 0.046512 |

  The relative frequencies show that a customer is most likely to be in the 15 - under 20 class (0.267). Moreover, just over two thirds (0.674) of the customers are between 10 and under 25 years of age.
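The relative frequencies and the two figures quoted in this answer can be reproduced in a few lines of Python. The frequencies are copied from the Solution 4 table. The variable names are only illustrative.

```python
# Age classes (years) and frequencies from the Solution 4 table.
age_classes = ["0 - under 5", "5 - under 10", "10 - under 15", "15 - under 20",
               "20 - under 25", "25 - under 30", "30 - under 35"]
frequencies = [6, 8, 17, 23, 18, 10, 4]

n = sum(frequencies)  # 86 customers
relative = [f / n for f in frequencies]

# The modal class is the class with the largest relative frequency.
modal_index = max(range(len(relative)), key=relative.__getitem__)
print(age_classes[modal_index], round(relative[modal_index], 3))  # 15 - under 20 0.267

# Share of customers aged 10 to under 25 (classes at indices 2, 3 and 4).
share_10_to_25 = sum(relative[2:5])
print(round(share_10_to_25, 3))  # 0.674
```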