## required packages/modules

import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.path import Path
from matplotlib import rcParams
from IPython.display import display, HTML

## default fontstyle
rcParams["font.family"] = "Ubuntu"

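def get_patch(verts):
    """Return a PathPatch for a cubic Bezier curve defined by four points.

    verts: [start, control point 1, control point 2, end] as (x, y) tuples.
    """
    ## one MOVETO for the start point, then CURVE4 for the remaining three points
    codes = [Path.MOVETO] + [Path.CURVE4] * (len(verts) - 1)
    path = Path(verts, codes)
    patch = patches.PathPatch(path, facecolor='none', lw=1.5, edgecolor="#F2F2F2", alpha=0.7)

    return patch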

## create subplot
fig, ax = plt.subplots(facecolor="#121212", figsize=(12,8))
ax.set_facecolor("#121212")


## props for text
props = dict(facecolor="none", edgecolor="#D3D3D3", boxstyle="round,pad=0.6", zorder=3)

## coordinates to plot line
line_coords = [
    [(5, 9.5), (5, 8.5), (4.5, 8.5), (1, 8)],
    [(5, 9.5), (5, 8.5), (5.5, 8.5), (9, 8)],
    [(1, 6.5), (1, 5.9), (0.5, 5.9), (-2, 5.4)],
    [(1, 6.5), (1, 5.9), (1.5, 5.9), (4, 5.4)],
    [(-2, 4), (-2, 3.5), (-2.5, 3.5), (-4, 3)],
    [(-2, 4), (-2, 3.5), (-2, 3.4), (-2, 3)],
    [(-2, 4), (-2, 3.5), (-1.5, 3.5), (0, 3)],
    [(4, 4), (4, 3.5), (3.5, 3.5), (2, 3)],
    [(4, 4), (4, 3.5), (4, 3.4), (4, 3)],
    [(4, 4), (4, 3.5), (4.5, 3.5), (6, 3)],
    
]

## add lines
for verts in line_coords:
    ax.add_patch(get_patch(verts))

## text coordinates
text_coord = [
    (5, 10.1), (1, 7.23), (9, 7.23),
    (-2, 4.7), (4, 4.7),
    (-4, 2.55), (-2, 2.55), (0, 2.55),
    (2, 2.55), (4, 2.55), (6, 2.55)
]

## text label and size
text_label = [
    ("Types of Statistics", 18), ("Descriptive\nStatistics", 16.5), ("Inferential\nStatistics", 16.5),
    ("Measure of\nCentral Tendency", 14.5), ("Measure of\nVariability", 14.5),
    ("Mean", 13.5), ("Median", 13.5), ("Mode", 13.5),
    ("Variance", 13.5), ("Range", 13.5), ("Dispersion", 13.5)
    
]

## add text
for (x, y), (label, size) in zip(text_coord, text_label):
    ax.text(
        x, y, label, color="#F2F2F2", size=size,
        bbox=dict(facecolor="none", edgecolor="#D3D3D3", boxstyle="round,pad=1"), zorder=2,
        ha="center", va="center"
    )

## credit
ax.text(
    10.3, 1.7, "graphic: @slothfulwave612", fontstyle="italic", 
    color="#F2F2F2", size=10, ha="right", va="center", alpha=0.8
)
    
## set axis
ax.set(xlim=(-4.75,10.35), ylim=(1.5,11))

## tidy axis
ax.axis("off")

plt.show()

Descriptive Statistics

Defining The Term

  • In layman's terms, descriptive means seeking to describe.

  • Descriptive Statistics is a term given to the analysis of data that helps describe, show or summarise data in a meaningful way.

  • It helps us to simplify and describe large amounts of data sensibly.

  • We take a group that we're interested in, record data about the group members, and then use summary statistics and graphs to present the group properties.

  • With descriptive statistics, there is no uncertainty because we are describing only the people or items that we measure. We are not trying to infer properties about a larger population.

  • The process involves taking a potentially large number of data points and reducing them down to a few meaningful summary values and graphs. This procedure allows us to gain more insights and visualize the data.

Example of Descriptive Statistics

  • Example 01:-

    • If we have the results of all the students in a mathematics class, we may be interested in the overall performance of these students.

    • We would also be interested in the distribution or spread of the marks. Descriptive statistics allow us to do this.

  • Example 02:-

    • Suppose you want to know how your college football team's striker is performing in the current tournament.

    • Your coach provides you with data on the number of goals the player has scored in each match.

    • You can summarize the whole dataset as an average number of goals per match, which is another example of using descriptive statistics.

  • Example 03:-

    • Much of the statistical data generated by businesses is descriptive.

    • It might include the number of employees on vacation during June, the average salary at the London office, corporate sales for 2009, the average managerial satisfaction score on a company-wide census of employee attitudes, and the average return on investment for the Lofton Company from 1990 to 2008.
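
Example 02 above can be sketched in a few lines of Python. The goal counts below are made up purely for illustration (they are not real match data), and the variable names are our own.

```python
from statistics import mean

## hypothetical data: goals scored by the striker in each match
goals_per_match = [2, 0, 1, 3, 1, 0, 2, 1]

## summarise the whole dataset with a few numbers
matches_played = len(goals_per_match)
total_goals = sum(goals_per_match)
goal_average = mean(goals_per_match)

print(f"Matches played: {matches_played}")
print(f"Total goals: {total_goals}")
print(f"Goals per match: {goal_average:.2f}")
```

Because every match the striker played is included, the average describes this tournament exactly. No inference about other matches is being made.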

Common Tools of Descriptive Statistics

  • Descriptive statistics frequently use the following statistical measures to describe groups:

    • Measure of Central Tendency: Using central tendency techniques (like the mean and median), one can locate the centre of the dataset. This measure describes where most values fall.

    • Measure of Variability: Also known as the measure of dispersion, it describes how the data is spread out in a distribution. Here we use concepts like variance, standard deviation and range.

      Note: Do not worry about the jargon here, we will cover it in depth in future blog posts.
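
As a rough preview, here is a minimal sketch of these measures using Python's built-in `statistics` module. The marks are hypothetical, and we assume the ten students are the whole group being described, which is why the population versions `pvariance` and `pstdev` are used.

```python
import statistics

## hypothetical marks of ten students in a mathematics class
marks = [55, 62, 62, 70, 71, 74, 78, 85, 90, 93]

## measures of central tendency
mean_mark = statistics.mean(marks)
median_mark = statistics.median(marks)
mode_mark = statistics.mode(marks)

## measures of variability (the data is treated as the entire group)
mark_range = max(marks) - min(marks)
variance = statistics.pvariance(marks)
std_dev = statistics.pstdev(marks)

print(f"Mean: {mean_mark}, Median: {median_mark}, Mode: {mode_mark}")
print(f"Range: {mark_range}, Variance: {variance:.2f}, Std. deviation: {std_dev:.2f}")
```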

Inferential Statistics

Defining The Term

  • In layman's terms, inferential means involving conclusions reached based on evidence and reasoning.

  • Often, we do not have access to the whole population we are interested in, but only to a limited subset of it (known as a sample).

  • Inferential statistics are the techniques that allow us to use these samples to make generalizations (forming general concepts or claims) about the population from which the samples were drawn.

Examples

  • Example 01:-

    • Suppose we conducted our study on test scores for a specific class (as in the first example in the descriptive statistics section). Now we want to perform an inferential statistics study on that same test. Let’s assume it is a standardized statewide test. We use the same test, but now to draw inferences about a population.

    • In descriptive statistics, we picked the specific class that we wanted to describe and recorded all of the test scores for that class (say, 8th-grade students). For inferential statistics, we need to define the population and then draw a random sample from that population.

    • We need to devise a random sampling plan to help ensure a representative sample. Assume that we draw a random sample of 100 students from the defined population and obtain their test scores.

    • Note that these students will not be in one class, but from many different classes in different schools across the state.

    • Once our sample is ready, we apply inferential statistics to it and, based on the results, draw conclusions about the population.

  • Example 02:-

    • One crucial use of inferential statistics is in pharmaceutical research.

    • Some new drugs are expensive to produce, and therefore tests must be limited to small samples of patients. Using inferential statistics, researchers can design experiments with small, randomly selected samples of patients and attempt to reach conclusions and make inferences about the population.

  • Example 03:-

    • Market researchers use inferential statistics to study the impact of advertising on various market segments.

    • Suppose a soft drink company creates a new advertisement and market researchers want to measure its impact on various age groups.

    • The researcher could stratify the population into age categories ranging from young to old, randomly sample each group, and use inferential statistics to determine the effectiveness of the advertisement for the various age groups in the population.

  • The advantage of using inferential statistics is that they enable the researcher to study effectively a wide range of phenomena without having to conduct a census.
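
The sampling step from Example 01 might look like the sketch below. Since we have no real statewide data, the population is simulated. The population size, the mean of 70 and the standard deviation of 12 are arbitrary assumptions chosen only for illustration.

```python
import random
import statistics

random.seed(42)  ## fixed seed so the sketch is reproducible

## hypothetical population: scores of 50,000 students on a statewide test
population = [random.gauss(70, 12) for _ in range(50_000)]

## draw a simple random sample of 100 students (without replacement)
sample = random.sample(population, k=100)

## the sample mean serves as our estimate of the population mean
sample_mean = statistics.mean(sample)

print(f"Sample size: {len(sample)}")
print(f"Estimated population mean: {sample_mean:.2f}")
```

In a real study we would never see `population` in full. Only the 100 sampled scores would be available, and everything we say about the state would be inferred from them.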

Pros And Cons of Working With Samples

  • We gain tremendous benefits by working with a random sample drawn from a population.

  • In most cases, it is simply impossible to measure the entire population to understand its properties. The alternative is to gather a random sample and then use inferential statistics to analyze the sample data.

  • While samples are much more practical and less expensive to work with, there are tradeoffs.

  • When we estimate the properties of a population from a sample, the sample statistics are unlikely to equal the actual population value exactly.

    • For instance, our sample mean is unlikely to equal the population mean exactly.

  • The difference between the sample statistic and the population value is known as sampling error.

  • Inferential statistics incorporate estimates of this error into the statistical results.
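
To make sampling error concrete, here is a small simulation. It is only a sketch: the population is generated artificially, with an assumed mean of 70 and standard deviation of 12, so that the true population mean is known and can be compared with the sample mean. The standard error of the mean is one common estimate of this error that can be computed from the sample alone.

```python
import math
import random
import statistics

random.seed(0)  ## fixed seed so the sketch is reproducible

## hypothetical population of 50,000 test scores
population = [random.gauss(70, 12) for _ in range(50_000)]
population_mean = statistics.mean(population)

## one random sample of 100 scores
sample = random.sample(population, k=100)
sample_mean = statistics.mean(sample)

## sampling error: computable here only because the population is simulated
sampling_error = sample_mean - population_mean

## standard error of the mean: estimated from the sample alone
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean: {sample_mean:.2f}")
print(f"Sampling error: {sampling_error:.2f}")
print(f"Estimated standard error: {standard_error:.2f}")
```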

Common Tools of Inferential Statistics

  • Inferential statistics frequently use the following tools:-

    • Hypothesis Tests

    • Confidence Intervals

    • Regression Analysis

      Note: These concepts are not explained here, as introducing them now would likely confuse readers. We will follow a step-wise approach, first understanding the basics of descriptive statistics and then moving on to inferential statistics, which should make the material more intuitive and clear.

Conclusion

  • For descriptive statistics, we choose a group that we want to describe and then measure all subjects in that group. The statistical summary describes this group with complete certainty.

  • For inferential statistics, we need to define the population and then devise a sampling plan that produces a representative sample. The statistical results incorporate the uncertainty that is inherent in using a sample to understand an entire population.

Questionnaire

Ques 01: Determine whether each of the following examples represents a case of descriptive statistics or inferential statistics.

  1.1. One out of every 100 sheets of plywood manufactured is tested to see if the plywood is as strong as rated.

  1.2. A bar graph of the areas of study of all currently enrolled MIT students is produced.

  1.3. A professor determines the average course grade for the students in a particular course.

  1.4. The recycling habits of 100 Klamath Falls households are observed to determine the recycling capacity needed for the city.

Ques 02: Summarize the concepts of descriptive and inferential statistics.

3. If you face any problems or have any feedback or suggestions, feel free to comment.