CHAPTER-21 MEASURES OF CENTRAL TENDENCY-2
INTRODUCTION
Measures of central tendency
are statistical measures that provide information about the center or average
of a data set. They are used to summarize and describe the typical or central
values of a dataset. There are three commonly used measures of central
tendency:
Mean: The mean, often referred to as the average, is calculated
by summing up all the values in a dataset and dividing by the total number of
values. It represents the balance point or the center of the data.
Median:
The median is the middle value in a
dataset when it is arranged in ascending or descending order. If there is an
odd number of values, the median is the middle value. If there is an even
number of values, the median is the average of the two middle values.
Mode: The mode represents the value or values that appear most
frequently in a dataset. In other words, it is the value(s) that has the
highest frequency.
These measures help us
understand the typical value or center around which the data tends to cluster.
They are widely used in various fields, including statistics, data analysis,
and research, to gain insights and make informed decisions based on the
characteristics of the data.
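As a minimal sketch, the three measures described above can be computed with
Python's standard statistics module. The sample values are hypothetical and
chosen only for illustration.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(values)        # sum of the values / number of values
median_value = statistics.median(values)    # average of the two middle values (3 and 5)
mode_values = statistics.multimode(values)  # list of the most frequent value(s)

print(mean_value, median_value, mode_values)
```

Here multimode is used rather than mode so that every value tied for the
highest frequency is reported.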
MEDIAN: MEANING AND DEFINITION
The median is a statistical
measure that represents the middle value of a dataset when it is arranged in
ascending or descending order. It is a measure of central tendency, which means
it provides information about the typical or central value within a set of
data.
To calculate the median, you
arrange the data points in order and identify the middle value. If the dataset
has an odd number of observations, the median is the middle value itself. For
example, in the dataset {1, 3, 5, 7, 9}, the median is 5.
If the dataset has an even
number of observations, the median is calculated as the average of the two
middle values. For example, in the dataset {1, 3, 5, 7, 9, 11}, the two middle
values are 5 and 7. The median would be (5 + 7) / 2 = 6.
The median is often used as
an alternative measure of central tendency when the data contains outliers or
is not normally distributed. It is less sensitive to extreme values than the
mean (average) because it is based solely on the position of the data points
rather than their actual values.
For example, consider a
dataset of incomes where most people earn moderate amounts but a few
individuals earn extremely high salaries. In this case, the median income would
provide a better representation of the typical income than the mean, which
would be heavily influenced by the outliers.
In summary, the median is
the middle value of a dataset when it is sorted in order. It is a measure of
central tendency that is often used as an alternative to the mean, particularly
when dealing with skewed or non-normal distributions.
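The income example can be sketched in a few lines of Python. The income
figures below are hypothetical; they only illustrate how one very large value
pulls the mean upward while leaving the median near the typical income.

```python
import statistics

# Hypothetical annual incomes: mostly moderate, with one very high earner.
incomes = [30_000, 32_000, 35_000, 38_000, 40_000, 1_000_000]

mean_income = statistics.mean(incomes)      # pulled upward by the outlier
median_income = statistics.median(incomes)  # average of the two middle incomes

print(f"mean:   {mean_income:,.2f}")
print(f"median: {median_income:,.2f}")
```

In this sketch the mean exceeds every income except the outlier itself, while
the median stays among the moderate incomes.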
DETERMINATION OF MEDIAN
To determine the
median of a dataset, you can follow these steps:
Sort the dataset in
ascending or descending order, depending on your preference. This step is
essential because the median relies on the arrangement of the data.
If the dataset has an odd
number of observations, the median is the middle value. You can locate the
middle value by finding the observation at the center of the sorted dataset.
For example, in the dataset {4, 2, 7, 1, 5}, after sorting it in ascending
order, you get {1, 2, 4, 5, 7}. The middle value is 4, so the median is 4.
If the dataset has an even
number of observations, the median is the average of the two middle values. To
find these values, locate the two observations at the center of the sorted
dataset. For example, in the dataset {3, 6, 2, 1, 5, 4}, after sorting it in
ascending order, you get {1, 2, 3, 4, 5, 6}. The two middle values are 3 and 4.
The median is then (3 + 4) / 2 = 3.5.
In some cases, you may
encounter grouped or continuous data. In such situations, you need to convert
the data into individual values or use interpolation techniques to estimate the
median.
It's important to note that
the median is largely unaffected by extreme values or outliers, since it depends
on the position of the data points rather than their magnitudes. This property
makes it a useful measure when dealing with skewed distributions or data with
significant outliers.
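For grouped data, one commonly used interpolation estimate is
Median = L + ((N/2 - cf) / f) x h, where L is the lower boundary of the median
class, N the total frequency, cf the cumulative frequency before that class, f
its frequency, and h its width. This formula is not stated in the text above;
the sketch below assumes it, and the function name and frequency table are
hypothetical.

```python
def grouped_median(classes):
    """Estimate the median of grouped data by linear interpolation.

    `classes` is a list of (lower_bound, upper_bound, frequency) tuples,
    sorted by lower_bound, with contiguous class intervals (an assumption).
    """
    total = sum(freq for _, _, freq in classes)
    half = total / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= half:
            # The median falls in this class; interpolate within it.
            return lower + (half - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("classes must contain at least one observation")


# Hypothetical frequency table: (lower, upper, frequency)
table = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5)]
estimate = grouped_median(table)
print(round(estimate, 2))
```

The estimate assumes that observations are spread evenly within the median
class, which is why it is only an approximation.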
MEDIAN CASE OF INDIVIDUAL SERIES
In the case of an individual
series, determining the median involves arranging the individual values in
ascending or descending order and finding the middle value.
Here's a step-by-step guide
to finding the median of an individual series:
Collect the individual
values of the series.
Sort the values in ascending
or descending order, whichever is more convenient for you.
If the series has an odd
number of values, the median is the middle value. You can identify the middle
value by locating the observation that falls exactly in the middle of the
sorted series. For example, let's consider the series: 7, 3, 2, 9, 5. After
sorting it in ascending order, we get: 2, 3, 5, 7, 9. The middle value is 5, so
the median is 5.
If the series has an even
number of values, the median is the average of the two middle values. In this
case, identify the two observations at the center of the sorted series and
calculate their average. For example, consider the series: 4, 1, 6, 2, 5, 3.
After sorting it in ascending order, we get: 1, 2, 3, 4, 5, 6. The two middle
values are 3 and 4. The median is then (3 + 4) / 2 = 3.5.
Remember that the median is
a measure of central tendency and is largely unaffected by extreme values or
outliers, making it suitable for representing the middle value of a dataset
even in the presence of skewed distributions or extreme observations.
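The steps above can be sketched as a small Python function. The function name
is hypothetical; the two series are the ones used in the examples above.

```python
def median_of_series(values):
    """Return the median of an individual series by sorting and picking the middle."""
    ordered = sorted(values)          # step 2: sort in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty series is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]        # odd count: the single middle value
    # even count: average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_of_series([7, 3, 2, 9, 5]))     # odd number of values
print(median_of_series([4, 1, 6, 2, 5, 3]))  # even number of values
```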
GRAPHICAL LOCATION OF MEDIAN
The graphical location of
the median in a dataset can be represented on various types of graphs, such as
histograms, box plots, or line plots. The specific method depends on the type
of data and the graphical representation being used.
Here are a few
examples of how the median can be represented graphically:
Histogram: A histogram is a bar graph that displays the frequency or
count of data points within specific intervals or bins. The median can be shown
as a vertical line or a dashed line within the histogram, indicating the
position of the median value. It helps visually identify the central tendency
of the data.
Box Plot: A box plot, also
known as a box-and-whisker plot, provides a visual summary of the distribution
of a dataset. The median is represented by a line or a symbol within the box.
The box itself spans the interquartile range, from Q1 to Q3. The median line
sits at the center of the box only when the middle half of the data is roughly
symmetric; otherwise it lies closer to one edge.
Line Plot: In a line plot or
line graph, which is commonly used to display trends or changes over time, the
median can be drawn as a horizontal reference line at the median value on the
vertical axis. It helps show which observations lie above or below the central
value over time.
These are just a few
examples, and the graphical representation of the median can vary depending on
the specific visualization technique used. The key idea is to visually indicate
the position of the median within the graph to provide insight into the central
tendency of the dataset.
MERITS AND DEMERITS OF MEDIAN
The median has several
merits and demerits that make it a useful but also limited measure of central
tendency. Here are some of the merits and demerits of the median:
Merits of the Median:
Resistant
to outliers: The median is less
affected by extreme values or outliers in the dataset compared to the mean.
Outliers have minimal influence on the median because it is based on the
position of the data rather than their actual values. This makes the median a
robust measure in the presence of skewed distributions or extreme observations.
Suitable
for non-normal distributions: The median is particularly useful when dealing with
non-normal distributions, as it provides a measure of central tendency that is
less influenced by the shape of the distribution. It can accurately represent
the "typical" value in skewed or asymmetric datasets.
Easy
to interpret: The median has a
straightforward interpretation. It represents the middle value or the value
that divides the dataset into two equal halves. This simplicity makes it easy
to understand and communicate to others.
Demerits of the
Median:
Ignores magnitude: The median only considers
the position of the values in the dataset, ignoring their actual magnitude or
relative differences. This can be a limitation when precise information about
the values is needed, as the median does not provide information about the
distances or relationships between data points.
Limited
mathematical properties: The
median has limited mathematical properties compared to other measures of
central tendency, such as the mean. It cannot be algebraically manipulated or
used in certain statistical calculations as easily as the mean can.
May not be unique: In some cases, the
median may not be a unique value. If the dataset has an even number of values
and the two middle values differ, any value between them divides the data into
two equal halves. The convention of averaging the two middle values picks one
of these, and the result may not be a value that occurs in the dataset.
It's important to consider
the specific characteristics of the dataset and the goals of the analysis when
deciding whether the median is an appropriate measure of central tendency. In
some situations, other measures like the mean or mode may be more suitable.
OTHER POSITIONAL MEASURES (QUARTILES, DECILES,
PERCENTILES)
In addition to the median,
there are other positional measures that divide a dataset into different parts
based on their position. These measures include quartiles, deciles, and
percentiles. Let's explore each of them:
Quartiles:
Quartiles divide a dataset into four
equal parts. The three quartiles are commonly referred to as Q1, Q2 (which is
the median), and Q3. Q1 represents the value below which 25% of the data falls,
Q2 represents the median (50th percentile), and Q3 represents the value below
which 75% of the data falls. Quartiles are useful for understanding the spread
and distribution of the data.
Deciles: Deciles divide a dataset into ten equal parts. The nine
deciles, labeled D1 to D9, represent the points below which 10%, 20%, ..., 90%
of the data falls, respectively. Deciles provide more granularity in dividing the
data compared to quartiles.
Percentiles: Percentiles divide a dataset into 100 equal parts. The
nth percentile represents the point below which n% of the data falls. For
example, the 75th percentile is the value below which 75% of the data falls.
Percentiles are commonly used to analyze and compare data across different
distributions or populations.
These positional
measures—quartiles, deciles, and percentiles—help understand the distribution
of the data and provide insights into its spread, skewness, and the relative
position of individual values within the dataset. They are particularly useful
when analyzing large datasets or when comparing values across different
datasets or populations.
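As a rough sketch, Python's statistics.quantiles function (Python 3.8 or
later) returns the cut points that divide a dataset into n equal parts. The
data below are hypothetical. Different interpolation conventions exist, so
other tools may report slightly different values for the same data.

```python
import statistics

data = list(range(1, 12))  # hypothetical data: 1, 2, ..., 11

quartiles = statistics.quantiles(data, n=4)      # Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)       # D1 ... D9
percentiles = statistics.quantiles(data, n=100)  # P1 ... P99

print("Quartiles:", quartiles)
print("Number of deciles:", len(deciles))
print("50th percentile:", percentiles[49])
```

The second quartile, the fifth decile, and the 50th percentile all coincide
with the median of the data.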
PARTITION VALUES IN CASE OF DISCRETE
SERIES
When dealing with a discrete
series, partition values usually refer to the quartiles, deciles, and
percentiles, which are located from the cumulative frequencies of the series.
A closely related task, described here, is dividing the data into different
intervals or categories based on their values. This process helps in
summarizing and analyzing the data more effectively. The specific method can
vary depending on the nature of the data and the specific objectives of the
analysis. Here are a few common approaches:
Equal-width
intervals: In this method, you
divide the range of values into equal-width intervals. For example, if you have
data ranging from 1 to 100 and want to create 5 intervals, each interval would
span a width of 20 units (e.g., 1-20, 21-40, 41-60, 61-80, 81-100). This method
is useful when you want to create evenly distributed intervals, but it may not
account for the density of data within each interval.
Equal-frequency
intervals: In this method, you
aim to divide the data into intervals with an equal number of observations. To
achieve this, you sort the data in ascending or descending order and divide it
into equal groups. For example, if you have 100 data points and want to create
5 intervals, each interval would contain 20 data points. The values within each
interval may not have the same width, but they will have a similar number of
observations.
Custom
intervals: In certain cases, you
may want to create intervals based on specific criteria or requirements. For
example, you might want to create intervals that correspond to specific
categories or ranges of interest. This approach allows for more flexibility in
partitioning the data based on your analysis goals.
When partitioning values in
a discrete series, it's important to consider the nature of the data, the distribution
of values, and the specific objectives of the analysis. The chosen partitioning
method should facilitate a meaningful representation of the data and help
uncover patterns or insights effectively.
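The equal-width and equal-frequency approaches can be sketched as follows. The
function names are hypothetical, and the equal-width helper assumes integer
data whose range divides evenly by the number of intervals, as in the 1 to 100
example above.

```python
def equal_width_intervals(low, high, k):
    """Return k (start, end) integer intervals of equal width covering low..high.

    Assumes (high - low + 1) is divisible by k.
    """
    width = (high - low + 1) // k
    return [(low + i * width, low + (i + 1) * width - 1) for i in range(k)]


def equal_frequency_groups(values, k):
    """Split the sorted values into k groups of (nearly) equal size."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), k)
    groups, start = [], 0
    for i in range(k):
        # The first `extra` groups take one additional value each.
        end = start + size + (1 if i < extra else 0)
        groups.append(ordered[start:end])
        start = end
    return groups


print(equal_width_intervals(1, 100, 5))
print([len(g) for g in equal_frequency_groups(range(1, 101), 5)])
```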
TYPICAL PROBLEMS OF PARTITION VALUES
Partitioning values in a
dataset can sometimes present challenges or problems that need to be
considered. Here are a few typical issues that can arise when dealing with
partition values:
Determining
the optimal number of intervals: Choosing the appropriate number of intervals for
partitioning values can be subjective and dependent on the specific dataset and
analysis objectives. Selecting too few intervals may oversimplify the data,
while selecting too many intervals can lead to excessive detail and
difficulties in interpretation. Finding the right balance is essential.
Handling
unevenly distributed data: If
the dataset has unevenly distributed values, such as a skewed or heavily
concentrated distribution, equal-width or equal-frequency partitioning methods
may not capture the data's characteristics effectively. In such cases,
alternative techniques, such as logarithmic scales or non-uniform intervals,
may be more appropriate.
Addressing
outliers: Outliers can have a
significant impact on partitioning values, particularly when using equal-width
or equal-frequency methods. Outliers can cause the intervals to be excessively
wide or narrow, potentially distorting the overall representation of the data.
Robust techniques, such as using percentiles or trimming outliers, may be employed
to mitigate this issue.
Determining
meaningful intervals: Creating
intervals that are meaningful and provide useful insights can be challenging.
It requires consideration of the data's context and subject matter expertise.
Choosing intervals that align with relevant categories or thresholds specific
to the data domain can enhance the interpretability and practicality of the partitioning.
Maintaining
consistency and comparability: When working with multiple datasets or conducting
comparative analysis, it is important to ensure consistency in partitioning
values. If different datasets are partitioned using different methods or
intervals, it can hinder meaningful comparisons and compromise the validity of
the analysis.
To overcome these problems,
it is crucial to carefully consider the nature of the data, explore alternative
partitioning approaches, and tailor the partitioning method to the specific
dataset and analysis goals. Flexibility, adaptability, and domain knowledge are
key to addressing the challenges that arise when partitioning values.
GRAPHICAL LOCATION OF QUARTILES,
DECILES AND PERCENTILES
Graphically representing
quartiles, deciles, and percentiles can provide visual insights into the
distribution of data and the relative position of specific values within a
dataset. Here are some common graphical methods for indicating these positional
measures:
Box
Plot: Box plots, also known
as box-and-whisker plots, are widely used to display quartiles, as well as
other statistical properties of a dataset. In a box plot, the box represents
the interquartile range (IQR), which spans from the first quartile (Q1) to the
third quartile (Q3). The median (Q2) is typically represented as a line within
the box. The whiskers extend from the box to the minimum and maximum values, or
they can be defined based on certain criteria. The box plot provides a visual
summary of the quartiles and helps identify the spread and skewness of the
data.
Percentile
Plot: A percentile plot is a graph
that displays the cumulative distribution of the data. The x-axis represents
the percentile values, ranging from 0 to 100, while the y-axis represents the
corresponding values from the dataset. By plotting the data points against
their percentiles, you can observe the distribution and identify specific
percentiles of interest. This type of plot helps assess the relative position
of values within the dataset.
Cumulative
Frequency Curve: A
cumulative frequency curve, also known as an ogive, displays the cumulative
frequency or proportion of values up to a certain point. It allows you to
visualize the distribution of data and locate specific percentiles or
positional measures. By plotting the cumulative frequency on the y-axis and the
corresponding values or percentiles on the x-axis, you can assess the position
of quartiles, deciles, or other percentiles within the dataset.
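Reading a positional measure off a cumulative frequency curve can be imitated
numerically. The sketch below builds the cumulative frequencies of a discrete
series and returns the smallest value whose cumulative frequency reaches the
requested percentage of the total. This is one convention among several; the
function names and the frequency table are hypothetical.

```python
from itertools import accumulate


def cumulative_table(value_counts):
    """Return (value, cumulative_frequency) pairs for sorted (value, frequency) input."""
    values = [v for v, _ in value_counts]
    running = accumulate(f for _, f in value_counts)
    return list(zip(values, running))


def value_at_percentile(value_counts, percent):
    """Smallest value whose cumulative frequency reaches percent% of the total."""
    table = cumulative_table(sorted(value_counts))
    total = table[-1][1]
    target = percent / 100 * total
    for value, cumulative in table:
        if cumulative >= target:
            return value
    return table[-1][0]


# Hypothetical discrete series: (age, number of students)
ages = [(15, 1), (16, 2), (17, 1), (18, 3), (19, 1), (20, 1), (21, 1)]
print(cumulative_table(ages))
print(value_at_percentile(ages, 25),
      value_at_percentile(ages, 50),
      value_at_percentile(ages, 75))
```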
MEANING AND DEFINITION
The terms
"meaning" and "definition" are closely related and are used
to describe the understanding and explanation of a word, concept, or idea. Here's
a brief explanation of each term:
Meaning: The meaning of a word, concept, or idea refers to the
understanding or interpretation associated with it. It encompasses the sense or
significance conveyed by the word or the concept it represents. The meaning can
be derived from various sources, such as language, context, culture, and
personal experiences. It represents the essence or understanding of what something
represents or signifies.
Definition: A definition provides a formal explanation or description
of a word, concept, or idea. It aims to clarify and establish the meaning of
the term in a specific context. Definitions often consist of a statement or set
of statements that specify the essential characteristics, properties, or
criteria that define and distinguish the term from other related terms.
Definitions can be found in dictionaries, textbooks, academic literature, or
other authoritative sources.
In summary, the meaning
refers to the understanding or interpretation associated with a word or concept,
while the definition provides a formal explanation or description of that word
or concept. The meaning represents the broader understanding, while the
definition offers a more precise and specific explanation within a given
context.
CALCULATION OF MODE IN CASE OF
INDIVIDUAL SERIES
To calculate the mode in the
case of an individual series, you need to determine the value or values that
occur most frequently in the dataset. The mode represents the observation(s)
with the highest frequency.
Here's a step-by-step guide
to calculating the mode in an individual series:
Collect the individual
values of the series.
Count the frequency of each
value in the dataset. A frequency refers to the number of times a specific
value occurs in the series.
Identify the value(s) with
the highest frequency. These value(s) will be the mode(s) of the individual
series. If there is a single value that occurs most frequently, it is called a
unimodal distribution. If there are multiple values with the same highest
frequency, it is called a multimodal distribution. In some cases, a dataset may
have no mode if all values occur with equal frequency.
It's worth noting that an
individual series can have no mode (no value occurring more frequently than
others), one mode, or multiple modes. The mode is useful for identifying the
most common or typical value(s) in a dataset and can be particularly helpful
when dealing with categorical or discrete data.
If you need to handle
continuous or grouped data, where individual values rarely repeat, additional
techniques such as finding the modal class in a frequency distribution or
using statistical software may be necessary to determine the mode.
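The counting procedure above can be sketched with collections.Counter from
the Python standard library. The function name is hypothetical, and returning
an empty list for "no mode" is a design choice made for this sketch.

```python
from collections import Counter


def modes(values):
    """Return the list of modes, or an empty list when no value stands out."""
    counts = Counter(values)          # frequency of each value
    if not counts:
        return []
    highest = max(counts.values())    # the highest frequency
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []                     # every value occurs equally often: no mode
    return [v for v, c in counts.items() if c == highest]


print(modes([4, 4, 4, 2]))     # one mode
print(modes([1, 1, 2, 2, 3]))  # two modes
print(modes([1, 2, 3]))        # no mode
```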
GRAPHICAL LOCATION OF MODE
The graphical representation
of the mode in a dataset depends on the type of data and the chosen
visualization method. Here are a few common graphical approaches to
representing the mode:
Bar
Chart: A bar chart, also
known as a bar graph, is a common visualization method for categorical or
discrete data. In a bar chart, each category or value is represented by a
separate bar, with the height of the bar corresponding to the frequency or
count of that value. The mode(s) can be identified by observing the highest
bar(s) on the chart, as they represent the values with the highest frequency.
Histogram: A histogram is a graphical representation of the
frequency distribution of continuous or discrete data. It consists of a series
of adjacent rectangles or bins, with the width of each bin representing a range
of values and the height representing the frequency or count of values falling
within that range. The mode(s) can be identified as the bin(s) with the highest
bar(s) on the histogram.
Line Plot: When the
values are plotted on the x-axis and the corresponding frequencies on the
y-axis, and the points are joined by a line (a frequency polygon), the
mode(s) appear as the highest peak(s) of the line.
It's important to note that
the mode is most commonly used for categorical or discrete data. For continuous
data, identifying the mode from a graph can be less straightforward, as there
may not be a single distinct peak. In such cases, additional techniques like
kernel density estimation or using statistical software may be employed to
estimate the mode.
The choice of graphical
representation depends on the nature of the data and the visual presentation
that effectively conveys the mode(s). The objective is to identify the value(s)
with the highest frequency in the dataset, which represents the mode(s).
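Finding the tallest bar of a histogram can be done without drawing it. The
sketch below counts the values falling into half-open bins [lower, upper) of a
fixed width and returns the bin with the highest count. The function name, bin
settings, and data are hypothetical.

```python
from collections import Counter


def modal_bin(values, bin_width, start=0):
    """Return the (lower, upper) bounds of the bin with the highest count.

    Bins are half-open: lower <= value < upper. If several bins tie,
    the one with the smallest lower bound is returned.
    """
    counts = Counter((v - start) // bin_width for v in values)
    best_index = min(counts, key=lambda i: (-counts[i], i))
    lower = start + best_index * bin_width
    return lower, lower + bin_width


# Hypothetical heights in centimetres.
heights = [151, 153, 158, 160, 161, 162, 164, 165, 167, 172, 178]
print(modal_bin(heights, bin_width=5, start=150))
```

The result depends on the chosen bin width and starting point, which is one
reason the mode of continuous data is only an estimate.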
EMPIRICAL RELATION BETWEEN MEAN MEDIAN
AND MODE
The mean, median, and mode
are three measures of central tendency used to describe the distribution of a
dataset. While they all provide information about the center of the data, they
can have different relationships depending on the shape and characteristics of
the distribution. Here are some common empirical relationships between the
mean, median, and mode:
Symmetric
Distribution: In a symmetric
distribution, where the data is evenly distributed around the center, the mean,
median, and mode tend to be approximately equal. For example, in a perfectly
symmetrical normal distribution, the mean, median, and mode will all be equal.
Skewed
Distribution: In a skewed
distribution, where the data is concentrated more towards one tail of the
distribution, the mean, median, and mode can differ. In a positively skewed
distribution (tail to the right), the mode is usually the smallest value,
followed by the median, and then the mean, which tends to be the largest value.
In a negatively skewed distribution (tail to the left), the mode is the largest
value, followed by the median, and then the mean, which tends to be the
smallest value.
Bimodal
or Multimodal Distribution: In
a distribution with multiple modes (bimodal or multimodal), there can be more
than one mode. The mean and median may not accurately represent the center of
the data in such cases, as they may fall between the modes or in areas with low
frequencies.
It's important to note that
these empirical relationships are not absolute and can vary based on the
specific dataset and distribution. The mean, median, and mode each provide
different insights into the central tendency of the data and should be
considered together to gain a more complete understanding of the distribution.
Additionally, there are various types of distributions and scenarios where the
relationships between these measures may differ from the typical patterns
described above.
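The typical ordering in skewed data can be checked on a small example. The two
datasets below are hypothetical; the second is a mirror image of the first, so
its tail points to the left instead of the right.

```python
import statistics

# Hypothetical positively skewed data (long tail to the right).
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image of the same data (long tail to the left).
left_skewed = [16 - x for x in right_skewed]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    mode = statistics.mode(data)
    median = statistics.median(data)
    mean = statistics.mean(data)
    print(f"{name}: mode={mode}, median={median}, mean={mean}")
```

In this sketch the right-skewed data give mode < median < mean, and the
mirrored data give mean < median < mode. Real datasets do not always follow
this pattern.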
MERITS AND DEMERITS OF MODE
The mode, as a measure of
central tendency, has its merits and demerits. Let's explore them:
Merits of the Mode:
Simple
Interpretation: The
mode is easy to understand and interpret. It represents the value or values
that occur most frequently in a dataset, making it straightforward to communicate
and explain to others.
Suitable
for Categorical Data: The
mode is particularly useful for categorical or qualitative data, where values
are grouped into distinct categories or classes. It provides a clear
representation of the most common category or class in the dataset.
Resistant
to Outliers: The mode is not
affected by outliers, as it only considers the value(s) with the highest
frequency. This can be an advantage when dealing with skewed or asymmetrical
distributions that may have extreme values.
Applicable to Non-Numeric
Data: Unlike the mean and median, which require numeric values, the mode can be
calculated for non-numeric data, such as categorical variables or qualitative
responses.
Demerits
of the Mode:
May Not Be Unique or May Not Exist:
Unlike the mean and median, which each yield a single value, the mode can have
multiple values (multimodal) or not exist at all if all values occur with equal
frequency. This lack of uniqueness can limit its interpretability and hinder
the characterization of the central tendency of the data.
Ignores
Numerical Relationships: The
mode disregards the numerical relationships between values within a dataset. It
only considers the frequency of occurrence, which means it may not capture
important information about the magnitude or order of the values.
Limited
Use with Continuous Data: The
mode is less suitable for continuous or interval-level data, where values can
have infinite decimal places or vary along a continuous scale. In such cases,
the mode may not accurately represent the central tendency, and other measures
like the median or mean are often preferred.
Insensitive to Small Frequency Variations: The
mode changes only when the most frequent value(s) change, so it does not
reflect small variations in the frequencies of the other values. This can limit
its ability to provide a comprehensive understanding of the dataset.
It's important to consider
the specific characteristics of the data, the research question, and the goals
of analysis when deciding whether to use the mode as a measure of central
tendency. In many cases, it is used in conjunction with other measures, such as
the mean or median, to provide a more comprehensive description of the data.
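Because the mode only requires counting, it also works on non-numeric data. A
minimal sketch with Python's statistics module follows; the survey responses
and colours are hypothetical.

```python
import statistics

# Hypothetical survey responses (categorical data).
responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]
most_common = statistics.mode(responses)

# multimode reports every value tied for the highest frequency.
colours = ["red", "blue", "red", "blue", "green"]
all_modes = statistics.multimode(colours)

print(most_common)
print(all_modes)
```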
COMBINED ILLUSTRATION ON MEAN, MEDIAN
AND MODE
Suppose we have a dataset
representing the ages of students in a class:
15, 16, 16, 17, 18, 18, 18,
19, 20, 21
Mean:
To calculate the mean, we
sum up all the values in the dataset and divide by the total number of values:
(15 + 16 + 16 + 17 + 18 + 18
+ 18 + 19 + 20 + 21) / 10 = 178 / 10 = 17.8
So, the mean age of the students
in this class is 17.8.
Median:
To find the median, we
arrange the values in ascending order and select the middle value. If the
dataset has an even number of values, we take the average of the two middle
values.
Arranging the values in
ascending order:
15, 16, 16, 17, 18, 18, 18,
19, 20, 21
Since we have an even number
of values (10), the median is the average of the two middle values, the 5th
and 6th, which are both 18: (18 + 18) / 2 = 18.
So, the median age of the
students in this class is 18.
Mode:
The mode represents the
value(s) that occur most frequently in the dataset.
In this example, the value
18 occurs three times, which is more frequently than any other value. Hence,
the mode of the dataset is 18.
So, the mode age of the
students in this class is 18.
To summarize:
Mean: 17.8
Median: 18
Mode: 18
In this case, the mean and
median are close to each other, indicating a relatively symmetrical
distribution. The mode represents the most frequently occurring value, which in
this case is also 18.
Please note that this is a
simplified example, and in real-world scenarios, the relationship between mean,
median, and mode can vary depending on the dataset and its distribution.
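The figures in this illustration can be checked with a short Python sketch
using the standard statistics module and the same ages.

```python
import statistics

ages = [15, 16, 16, 17, 18, 18, 18, 19, 20, 21]

mean_age = statistics.mean(ages)      # 178 / 10
median_age = statistics.median(ages)  # 10 values: average of the 5th and 6th
mode_age = statistics.mode(ages)      # the most frequent age

print(mean_age, median_age, mode_age)
```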
SELECTION OF SUITABLE AVERAGE OR WHICH
IS THE BEST AVERAGE
The selection of a suitable
average, or the "best" average, depends on the specific context,
characteristics of the data, and the objective of analysis. There are different
measures of central tendency available, including the mean, median, and mode,
each with its own strengths and weaknesses. Here are some considerations to
help you choose the most appropriate average:
Mean: The mean is commonly used and suitable for symmetric
distributions without extreme outliers. It considers all the values in the
dataset and provides a balance between high and low values. The mean can be
influenced by extreme values and may not be representative if the data is
skewed or contains outliers.
Median:
The median is appropriate when dealing
with skewed distributions or data containing outliers. It is less sensitive to
extreme values and provides a better representation of the central value in
such cases. The median is useful when the order or rank of values is important.
Mode: The mode is beneficial for categorical or discrete data
and can be used alongside other averages. It represents the most frequently
occurring value(s) in the dataset and is easy to interpret. The mode is less
suitable for continuous data or when a unique central value is desired.
The choice of average
depends on the nature of the data and the research question. In some cases,
using multiple measures of central tendency can provide a more comprehensive
understanding of the dataset. For example, when examining income distribution, the
mean can provide an average income, while the median can give insight into the
typical income of the population.
Additionally, it's important
to consider other factors such as the distribution shape, data quality,
outliers, and the level of measurement (nominal, ordinal, interval, or ratio)
when selecting the appropriate average.
Ultimately, there is no
universally "best" average. The selection should be based on careful
consideration of the data characteristics, the research objective, and the
insights you want to derive from the analysis.
USES OF DIFFERENT AVERAGES OR
COMPARATIVE ANALYSIS OF VARIOUS AVERAGES
Different averages, such as
the mean, median, and mode, have distinct uses and applications. Here's a
comparative analysis of the various averages and their respective uses:
Mean:
Used for symmetric
distributions without extreme outliers.
Provides a balance between
high and low values.
Frequently used in
statistical analysis, such as calculating the average of a continuous variable
or determining the center of a distribution.
Suitable for situations
where the goal is to understand the overall average value or to calculate
weighted averages.
Median:
Used for skewed
distributions or data containing outliers.
Less sensitive to extreme
values and provides a better representation of the central value.
Useful when the order or
rank of values is important, such as in income distribution analysis, where the
median income represents the middle value of the population.
Preferred when dealing with
ordinal data or when outliers can significantly affect the mean.
Mode:
Used for categorical or
discrete data.
Represents the most
frequently occurring value(s) in the dataset.
Provides insights into the
dominant category or class within a dataset.
Helpful for identifying the
most common response or the most prevalent category in a survey or
questionnaire.
Comparative Analysis:
Mean is influenced by
extreme values, while median and mode are resistant to outliers.
Median is well-suited for
skewed distributions, while mean can be biased by extreme values.
Mode is suitable for
categorical or discrete data, while mean and median are applicable to
continuous data.
The choice between mean,
median, and mode depends on the specific data characteristics, the research
question, and the goal of analysis.
Using multiple averages can
provide a more comprehensive understanding of the dataset. For example, mean
and median can be compared to assess the skewness of a distribution, while mode
can highlight the most common category.
In summary, the selection of
the appropriate average depends on the type of data, the distribution shape,
the presence of outliers, and the research objective. Comparative analysis
helps identify the strengths and limitations of each average and aids in
choosing the most suitable measure of central tendency for a particular
analysis.
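As a small illustration of comparing averages, the Python sketch below computes the mean, median, and mode of one dataset and uses the gap between mean and median as a rough hint of skewness. The income figures are hypothetical and chosen only to show the effect of a single extreme value.

```python
from statistics import mean, median, multimode

# Hypothetical monthly incomes (in thousands); one extreme value on the right.
incomes = [20, 22, 22, 25, 27, 30, 200]

avg = mean(incomes)          # pulled upward by the value 200
mid = median(incomes)        # middle value of the ordered data
modes = multimode(incomes)   # most frequent value(s), returned as a list

# A mean noticeably above the median suggests a right (positive) skew.
skew_hint = "right-skewed" if avg > mid else "left-skewed or symmetric"
print(avg, mid, modes, skew_hint)
```

Here the median and mode stay close to the bulk of the data, while the mean is drawn towards the extreme value. This comparison is only a rough indicator, not a formal measure of skewness.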
LIMITATIONS OF AVERAGES
Averages, such as the mean,
median, and mode, have certain limitations that should be considered when
interpreting and using them. Here are some common limitations of averages:
Sensitivity
to Outliers: Averages,
particularly the mean, are sensitive to extreme values or outliers in the
dataset. Outliers can significantly impact the calculated average, pulling it
towards their extreme value and potentially distorting the overall
representation of the data.
Lack
of Representation: Averages
may not always accurately represent the entire dataset or provide a complete
picture of the distribution. For example, the mean may not reflect the typical
value if the data is skewed or has a non-normal distribution. In such cases,
the median may be a better measure of central tendency.
Inability
to Capture Variability: Averages
do not provide information about the variability or spread of data points. They
summarize the central tendency but may not reveal important details about the
distribution, such as the range, variance, or standard deviation.
Distortion
by Skewed Distributions: Skewed
distributions, where the data is asymmetrically distributed towards one end,
can affect the interpretation of averages. The mean can be heavily influenced
by the skewed tail, while the median may provide a more representative measure
in such cases.
Unsuitability
for Categorical Data: Averages
are primarily designed for numeric data and may not be applicable to
categorical or qualitative variables. The mode is often used for categorical
data, but it may not provide a meaningful measure of central tendency for
continuous or interval-level variables.
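The inability of averages to capture variability can be shown with a short Python sketch. The two datasets below are hypothetical; they are built to share the same mean and median while differing greatly in spread.

```python
from statistics import mean, median, pstdev

# Two hypothetical series with identical centre but different spread.
series_a = [48, 49, 50, 51, 52]
series_b = [10, 30, 50, 70, 90]

for name, series in (("A", series_a), ("B", series_b)):
    # pstdev is the population standard deviation, a measure of spread.
    print(name, mean(series), median(series), round(pstdev(series), 2))
```

Both series report 50 as the mean and the median, so the averages alone cannot distinguish them. Only a measure of dispersion, such as the standard deviation, reveals the difference.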
VERY SHORT QUESTIONS
ANSWER
Q.1. What is median or define median?
Ans. The median is the central (middle) value of a series arranged in
ascending or descending order.
Q.2. Which are the different positional
averages?
Ans. Median, Quartiles, Deciles, Percentiles.
Q.3. What is a quartile or in how many
parts do quartiles divide a series into?
Ans. Quartiles divide a series into four equal parts.
Q.4. Write formula for the calculation
of median in individual series?
Ans. In an individual series of n observations arranged in ascending or
descending order, the median is located as follows:
If n is odd: Median = value of the ((n + 1) / 2)th term
If n is even: Median = [(n / 2)th term + ((n / 2) + 1)th term] / 2
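The odd and even cases can be sketched in Python as follows. This is a minimal illustration; the function name median_individual is hypothetical, and the term positions are counted from 1 as in the formula.

```python
def median_individual(values):
    """Median of an individual (ungrouped) series using 1-based term positions."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty series is undefined")
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th term.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the (n / 2)th and ((n / 2) + 1)th terms.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median_individual([9, 1, 5, 3, 7]))       # odd number of terms
print(median_individual([1, 3, 5, 7, 9, 11]))   # even number of terms
```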
Q.5. Write formula for the calculation
of median in continuous series?
Ans. Median = L + ((n/2 - F) / f) * h, where L is the lower
boundary of the median class, n is the total number of observations, F is the
cumulative frequency of the class preceding the median class, f is the
frequency of the median class, and h is the class width.
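A minimal Python sketch of this formula is given below. The function name grouped_median and the frequency table are hypothetical; the classes are assumed to be continuous (exclusive) and listed in ascending order.

```python
def grouped_median(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency) in ascending order."""
    n = sum(freq for _, _, freq in classes)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cumulative = 0  # F: cumulative frequency of the classes before the current one
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= half:
            h = upper - lower  # class width
            return lower + ((half - cumulative) / freq) * h
        cumulative += freq
    raise ValueError("median class not found")


# Hypothetical frequency table: n = 40, so n/2 = 20 falls in the class 20-30.
example = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 10), (40, 50, 5)]
print(round(grouped_median(example), 2))
```

For this table L = 20, F = 13, f = 12 and h = 10, so the median is 20 + (7 / 12) * 10, or about 25.83.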
Q.6. Define mode?
Ans. Most frequent value.
Q.7. Write formula for the calculation
of mode in continuous series?
Ans. Mode = L + ((f1 - f0) / (2f1 - f0 - f2)) * h, where L is the lower
boundary of the modal class (the class with the highest frequency), f1 is the
frequency of the modal class, f0 is the frequency of the class preceding it, f2
is the frequency of the class succeeding it, and h is the class width.
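One widely taught way to estimate the mode inside the modal class is the interpolation Mode = L + ((f1 - f0) / (2f1 - f0 - f2)) * h, where f1, f0 and f2 are the frequencies of the modal class and of the classes just before and after it. The Python sketch below applies it; the function name grouped_mode and the frequency table are hypothetical, and equal class widths are assumed.

```python
def grouped_mode(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency), equal widths assumed."""
    freqs = [freq for _, _, freq in classes]
    i = freqs.index(max(freqs))                    # first class with the highest frequency
    lower, upper, f1 = classes[i]
    f0 = freqs[i - 1] if i > 0 else 0              # frequency of the preceding class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0 # frequency of the succeeding class
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("mode is ill-defined for this table")
    return lower + ((f1 - f0) / denominator) * (upper - lower)


# Hypothetical frequency table; the modal class is 20-30.
table = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 10), (40, 50, 5)]
print(round(grouped_mode(table), 2))
```

With f1 = 12, f0 = 8 and f2 = 10 the estimate is 20 + (4 / 6) * 10, or about 26.67.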
Q.8. What is the formula for calculating
mode in a bi-modal series?
Ans. A bi-modal series has two values (or classes) with the highest
frequency, and each of them is a mode. When a single representative value is
required, the mode is commonly estimated from the empirical relation
Mode = 3 Median - 2 Mean.
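When a series has two modes and a single figure is still wanted, a commonly taught fallback is the empirical relation Mode = 3 Median - 2 Mean, which holds only approximately and only for moderately skewed data. The sketch below, on hypothetical data, lists both modes and then computes this estimate.

```python
from statistics import mean, median, multimode

# Hypothetical bi-modal series: the values 3 and 6 each occur three times.
data = [2, 3, 3, 3, 4, 5, 6, 6, 6, 8]

modes = multimode(data)                              # all values tied for highest frequency
empirical_mode = 3 * median(data) - 2 * mean(data)   # approximate single-value estimate

print(modes, round(empirical_mode, 2))
```

The estimate falls between the two peaks, which shows why it should be read as a convenient summary rather than as a value that actually occurs most often.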
Q.9. Which measure of central tendency
is considered to be the most suitable representative of the series?
Ans. It depends on the distribution and characteristics of the
series; the most suitable measure of central tendency can vary.
Q.10. Write any two limitations of
measures of central tendency or averages?
Ans. Averages
can be influenced by outliers, leading to distortion in their representation of
the central tendency.
Averages may not provide a
complete picture of the data as they do not capture the variability or spread
of the values.
Q.11. Name any one positional measure?
Ans. Quartiles.
Q.12. Median divides the series
into……parts?
Ans. Median divides the
series into two equal parts.
Q.13. Quartile divides the series
into……parts?
Ans. Quartiles divide the series into four equal parts.
Q.14. Deciles divide the series
into….parts?
Ans. Deciles divide the series into ten equal parts.
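The partition values can be computed with Python's standard statistics module, as sketched below on hypothetical data. The default "exclusive" method of statistics.quantiles places the i-th cut point at position i * (n + 1) / k in the ordered series, which is assumed here to match the ((n + 1) / 4)th-term rule used for quartiles in an individual series.

```python
from statistics import median, quantiles

# Hypothetical individual series with n = 11 observations, already ordered.
data = [12, 15, 18, 20, 22, 25, 28, 30, 35, 40, 45]

quartiles = quantiles(data, n=4)      # 3 cut points -> 4 equal parts
deciles = quantiles(data, n=10)       # 9 cut points -> 10 equal parts
percentiles = quantiles(data, n=100)  # 99 cut points -> 100 equal parts

print(quartiles)
print(len(deciles), len(percentiles))
print(quartiles[1] == median(data))   # the second quartile is the median
```

Dividing a series into k equal parts always needs k - 1 cut points, which is why there are three quartiles, nine deciles and ninety-nine percentiles.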
SHORT QUESTIONS ANSWER
Q.1. Write any two demerits of median?
Ans. Insensitivity to extreme values: The median is less affected by extreme values or
outliers, but this can also be a disadvantage. It may not accurately reflect
the impact of extreme values on the overall data, as it only considers the
middle value(s) and ignores the specific values themselves.
Limited
information about the distribution: The median provides information about the central
position of the data but does not convey details about the shape, spread, or
variability of the distribution. It does not capture information about the data
points above and below the median, potentially resulting in a loss of
information about the overall pattern of the dataset.
Q.2. Describe the merits of median?
Ans. The
median has several merits that make it a valuable measure of central tendency
in certain situations:
Robustness
to outliers: The median is less
affected by extreme values or outliers compared to the mean. It provides a more
robust representation of the central value when dealing with skewed
distributions or datasets with significant outliers.
Suitable
for ordinal or ranked data: The
median is particularly useful when working with ordinal data, where the order
or ranking of values is important. It accurately represents the middle value,
making it appropriate for datasets where the relative position of values
matters more than their numerical differences.
Applicable
to skewed distributions: The
median can effectively capture the central tendency in distributions that are
heavily skewed or have long tails. It is not influenced by extreme values in
the same way as the mean, making it a better choice when data does not follow a
symmetrical pattern.
Ease
of interpretation: The
median is straightforward to interpret. It represents the value that separates
the lower half of the dataset from the upper half, providing a clear indication
of the central position. This makes it accessible and easily understandable for
non-technical audiences.
Useful
for non-normal distributions: The median is a reliable measure when working with
non-normal distributions or when the underlying distribution is unknown. It can
still provide meaningful insights into the central value, even when assumptions
about the data's distribution cannot be made.
Overall, the merits of the
median make it a valuable measure in situations where the data has outliers, is
skewed, or when ordinal data is involved. Its robustness and interpretability
contribute to its widespread application in various fields, including finance,
economics, and social sciences.
Q.3. Describe demerits of median?
Ans. While the median has its merits, it also has some limitations
or demerits to consider:
Insensitivity
to individual values: The
median depends only on the middle value(s) of the ordered data. The magnitudes
of the other observations do not affect it, which may result in a
loss of information. In cases where specific data points are important or
require consideration, the median may not provide detailed insights.
Potential
loss of precision: The
median only considers the middle value(s) and ignores the other data points.
This can lead to a loss of precision or reduced variability in the data
representation. The median may not adequately capture the full range or spread
of the dataset.
Limited
applicability to interval or ratio data: While the median is suitable for ordinal data, it may not
be the best measure of central tendency for interval or ratio data. In such
cases, the mean may provide a more accurate representation of the average or
central value.
Difficulty
with averaging or aggregating: Since the median only represents the middle value(s), it
may pose challenges when trying to calculate an average or aggregate multiple
medians. Combining medians of different groups or datasets may not accurately
reflect the overall central tendency.
Inefficiency
for large datasets: Calculating
the median can be computationally intensive for large datasets, especially
compared to simpler measures like the mean. This can make the calculation
time-consuming and impractical in certain scenarios.
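The difficulty with aggregating medians can be seen in a short Python sketch. The two groups are hypothetical; the point is that group means can be combined exactly using group sizes as weights, while group medians cannot.

```python
from statistics import mean, median

# Two hypothetical groups of different sizes.
group_a = [1, 2, 9]
group_b = [3, 4, 5, 6, 7]
combined = group_a + group_b

# Averaging the two medians does not give the median of the combined data.
average_of_medians = mean([median(group_a), median(group_b)])
overall_median = median(combined)

# The combined mean, by contrast, is recovered from the group means and sizes.
weighted_mean = (len(group_a) * mean(group_a)
                 + len(group_b) * mean(group_b)) / len(combined)

print(average_of_medians, overall_median)
print(weighted_mean, mean(combined))
```

To obtain the median of the pooled data, the individual observations (or a combined frequency table) are needed; the group medians alone are not enough.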
Q.4. How is the median calculated
graphically?
Ans. The median can be calculated graphically
by constructing a cumulative frequency curve or an ogive. The median is the
value that corresponds to the point on the ogive where the cumulative frequency
is equal to half of the total frequency.
To calculate the
median graphically:
Plot the cumulative
frequency distribution on a graph, with the cumulative frequency on the y-axis
and the values on the x-axis.
Connect the plotted points
to form a cumulative frequency curve (ogive).
Locate the point on the
ogive where the cumulative frequency is equal to half of the total frequency.
The corresponding value on the x-axis is the median.
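The graphical steps above can be imitated numerically by treating the ogive as straight-line segments between the plotted points and reading off the x-value at half the total frequency. The sketch below is a minimal illustration; the function name median_from_ogive and the plotted points are hypothetical.

```python
def median_from_ogive(points):
    """points: (upper_boundary, cumulative_frequency) pairs in ascending order,
    starting with (lowest_boundary, 0)."""
    total = points[-1][1]
    target = total / 2  # half of the total frequency on the y-axis
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= target <= c1:
            # Linear interpolation along the segment that crosses the target height.
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("ogive never reaches half of the total frequency")


# Hypothetical less-than ogive for classes 0-10, 10-20, ..., 40-50.
ogive = [(0, 0), (10, 5), (20, 13), (30, 25), (40, 35), (50, 40)]
print(round(median_from_ogive(ogive), 2))
```

For these points half of the total frequency is 20, which is reached between x = 20 and x = 30, giving a median of about 25.83. A hand-drawn smooth curve may give a slightly different reading.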
Q.5. What is meant by median? Explain its
merits and demerits.
Ans. Median is a measure of central tendency that represents
the middle value in a dataset when arranged in ascending or descending order.
Merits of Median:
Robustness: The median is less influenced by extreme values or
outliers, making it a robust measure of central tendency in skewed
distributions or when dealing with outliers.
Suitable
for skewed data: The median accurately
represents the center of a skewed distribution, providing a more representative
measure than the mean.
Applicable
to ordinal data: The
median is suitable for ranked or ordinal data, where the order of values is
important but their precise numerical differences may not be.
Demerits of Median:
Loss
of precision: The median ignores
the actual values of individual observations, resulting in a loss of
information about the dataset's variability and spread.
Limited
applicability to interval/ratio data: The median may not be the best choice for interval or
ratio data, as it does not consider the numerical differences between values.
Difficulty
in averaging: Combining medians of
different groups or datasets may not accurately reflect the overall central tendency,
making it challenging to calculate an aggregate median.
Overall, the median's merits
lie in its robustness, suitability for skewed data, and ordinal variables.
However, its demerits include potential loss of precision, limitations with
interval/ratio data, and difficulties in averaging or aggregating.
Q.6. Define mode? How would you justify
that it is a positional average?
Ans. Mode
refers to the value or values that occur most frequently in a dataset. It
represents the peak or highest point(s) of the distribution, indicating the most
commonly occurring value(s).
The mode can be justified as
a positional average because it represents the position or location in the
dataset where the highest concentration of values is observed. It identifies
the most frequent value(s) and provides insight into the central tendency based
on the frequency of occurrence. By identifying the mode, we can understand the
position(s) in the data where the observations cluster the most, making it a
positional average.
Unlike the mean or median,
which rely on mathematical calculations or positional order, the mode focuses
on the frequency or count of values in a dataset. It indicates the position(s)
that have the highest density or concentration of observations, reflecting a positional
characteristic of the data. Therefore, the mode can be considered a positional
average that highlights the most commonly occurring value(s) and their relative
position within the dataset.
LONG QUESTIONS ANSWER
Q.1. Define an average and its various types.
Explain concrete cases in which the mode is specifically used.
Ans. An average is a measure used to represent the central
tendency of a dataset. It provides a typical value that summarizes the data.
There are different types of averages, including the mean, median, and mode.
Mean: The mean is calculated by summing up all the values in a
dataset and dividing by the total number of values. It is commonly used when
the data is numerical and has a symmetrical distribution. For example, the mean
is often used to calculate the average score of students in a class or to
determine the average temperature for a given period.
Median: The median is the middle value in a dataset when arranged
in ascending or descending order. It is suitable for skewed distributions or
datasets with outliers. For instance, the median income is often used to
understand the typical earning level in a population, as it is less affected by
extreme values.
Mode: The mode represents the value or values that occur most
frequently in a dataset. It is useful when identifying the most common or
popular category or value. The mode can be specifically used in various cases,
such as:
Categorical
data: When dealing with
categorical variables, such as favorite colors or preferred brands, the mode
helps identify the most frequently chosen category.
Survey
responses: In surveys or questionnaires, the mode can
reveal the most common response or option selected by respondents.
Business
decisions: The mode can be used
in market research to determine the most popular product or service among
consumers.
Quality
control: In manufacturing, the
mode can be used to identify the most frequent defect or issue occurring in a
production process.
Overall, the mode is
particularly useful in situations where identifying the most common or
frequently occurring value is of interest, such as in categorical data
analysis, market research, and quality control.
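For categorical data the mode is found simply by counting. The sketch below uses Python's collections.Counter on hypothetical survey responses and keeps every category tied for the highest count.

```python
from collections import Counter

# Hypothetical survey responses to "What is your favorite color?"
responses = ["blue", "red", "blue", "green", "blue", "red"]

counts = Counter(responses)   # frequency of each category
highest = max(counts.values())
modes = [category for category, count in counts.items() if count == highest]

print(counts.most_common())   # categories ordered from most to least frequent
print(modes)
```

No arithmetic on the categories is involved, which is why the mode remains meaningful where a mean or median cannot be computed.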
Q.2. What is central tendency? State,
giving illustrations, the circumstances where the mode may be a more suitable
measure of central tendency than the arithmetic mean.
Ans. Central tendency refers to the measure that represents
the central or typical value of a dataset. While the arithmetic mean is
commonly used as a measure of central tendency, there are situations where the
mode may be more suitable. Here are some illustrations of circumstances where
the mode is preferred over the arithmetic mean:
Categorical
data: When dealing with
categorical variables, such as favorite colors or preferred modes of
transportation, the mode is more appropriate. For example, in a survey asking
people to select their favorite color, the mode would provide the most commonly
chosen color, which is a meaningful representation of central tendency.
Skewed
distributions: In distributions that
are highly skewed or have outliers, the mode can be a better measure of central
tendency than the arithmetic mean. Consider a dataset representing household
income in a country where there is a significant income disparity. The
arithmetic mean can be heavily influenced by a few extremely high-income
households, whereas the mode would indicate the most frequently occurring
income category, which could better represent the central tendency for the
majority of households.
Nominal
or ordinal data: In
some cases, the data may not have a numerical scale, but instead have
categories or ranks. For example, if you have data on the ranks achieved by a
sports team over several seasons, the mode would indicate the rank the team
achieved most often, which can be a meaningful representation of central
tendency within the context of team performance.
Bi-modal
distributions: In distributions that
have two distinct peaks or modes, the mode can provide valuable insights. For
instance, in a dataset representing the age distribution of students in a
university, there may be two prominent peaks corresponding to undergraduate and
graduate students. The mode(s) in this case would capture the presence of these
distinct groups, offering a suitable measure of central tendency.
These illustrations
highlight situations where the mode can be more appropriate than the arithmetic
mean for capturing the central tendency, especially when dealing with
categorical or ordinal data, skewed distributions, or bi-modal distributions.
It is important to choose the measure of central tendency that best aligns with
the characteristics and nature of the data being analyzed.
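The bi-modal case can be illustrated with a short Python sketch. The ages below are hypothetical and are arranged to have one peak for younger students and one for older students.

```python
from statistics import mean, multimode

# Hypothetical student ages with two peaks, at 19 and at 25.
ages = [18, 19, 19, 19, 20, 21, 24, 25, 25, 25, 26, 30]

modes = multimode(ages)   # every value tied for the highest frequency
average_age = mean(ages)  # a single figure lying between the two peaks

print(modes, round(average_age, 2))
```

The mean lands between the two groups, at an age that few students actually have, while the modes point to the two ages around which the observations cluster.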
Q.3. Define mode? Compare its merits
and demerits with those of arithmetic mean?
Ans. Mode is a measure of central tendency that represents the
value or values that occur most frequently in a dataset.
Merits of Mode:
Suitable
for categorical data: The
mode is particularly useful when dealing with categorical variables or data
with distinct categories. It provides insight into the most commonly occurring
category, making it an appropriate measure for such data.
Simple
interpretation: The
mode is easy to understand and interpret, as it directly identifies the most
frequent value(s) in the dataset. It can provide a clear representation of the
central tendency in situations where the most common occurrence is of interest.
Robustness
to outliers: The mode is not
influenced by extreme values or outliers, making it a robust measure of central
tendency in the presence of skewed distributions or unusual observations.
Applicable
to all types of data: The
mode can be calculated for any type of data, including numerical, categorical,
or ordinal data.
Demerits of Mode:
Lack
of precision: The mode only
identifies the most frequent value(s) and does not consider the other data
points. It may not provide a precise measure of the central tendency or reflect
the full range or spread of the dataset.
Limited
use for numerical data: The
mode is less informative for numerical data, especially continuous or interval
data, where exact values rarely repeat and the result depends on how the data
are grouped into classes. It also does not take into account the
specific numerical differences between values.
Potential
ambiguity in multimodal distributions: If a dataset has multiple modes or peaks, the mode may
not be well-defined or may not accurately represent the central tendency. It
may not capture the overall pattern of the data, especially in cases where
there are multiple modes with similar frequencies.
Arithmetic Mean
(Merits and Demerits):
The arithmetic mean is
another measure of central tendency that calculates the average of all the
values in a dataset.
Merits of Arithmetic
Mean:
Utilizes
all data points: The
arithmetic mean considers every value in the dataset, taking into account the
magnitude and numerical differences between them.
Provides
a precise measure: The
mean provides a more precise measure of the central tendency, considering the
sum of all values and their relative weights.
Suitable
for interval/ratio data: The
arithmetic mean is well-suited for datasets with interval or ratio data, as it
incorporates the numerical values and their relationships.
Demerits of Arithmetic
Mean:
Sensitivity
to outliers: The arithmetic mean
can be greatly influenced by extreme values or outliers, distorting its
representation of the central tendency.
Limited
use for skewed distributions: Skewed distributions can significantly affect the
arithmetic mean, as it tends to be pulled towards the skewed tail, resulting in
a less accurate representation of the central value.
Inappropriate
for categorical data: The
arithmetic mean is not suitable for categorical or ordinal data, as it requires
numerical calculations and does not consider the categorical nature of the
data.
In summary, the mode has
merits such as suitability for categorical data, simplicity of interpretation,
and robustness to outliers. However, it lacks precision and may not be
applicable to numerical data or multimodal distributions. The arithmetic mean,
on the other hand, utilizes all data points, provides precision, and is
suitable for interval/ratio data but is sensitive to outliers and skewed
distributions. The choice between mode and arithmetic mean depends on the
nature of the data and the specific context of the analysis.