Moving Average Method

While watching the news, you might have noticed a reporter saying that the temperature of a particular city or country has broken a record, or that the rainfall of some state or country has set a new bar. How do they know this? What measures have they taken and studied to say so? The answer lies in time-series data. You are already familiar with time-series data and the various components of a time series.

A Trend in a Time Series

A time series is broadly classified into three categories: long-term fluctuations, short-term or periodic fluctuations, and random variations. A long-term variation, or trend, shows the general tendency of the data to increase or decrease over a long period of time. The variation may be gradual, but it is inevitably present.

Analysis of Time Series

Suppose you have time-series data. What will you do with it? How can you calculate the effect of each component on the resulting variations? The main problems in the analysis of time series are:

  • To identify the components whose interaction produces the movement of a time series, and
  • To isolate, study, analyze and measure each component independently by making others constant.

Measurement of Trend by the Method of Moving Average

This method irons out the fluctuations of the data by taking means. It measures the trend by eliminating the changes or variations by means of a moving average. The simplest mean used for the measurement of a trend is the arithmetic mean (average).

Moving Average

The moving average of period (extent) m is a series of successive averages of m terms at a time. Starting with the first term, we take m data values at a time, then move one term forward and repeat.

In other words, the first average is the mean of the first m terms. The second average is the mean of the m terms starting from the second data up to (m + 1)th term. Similarly, the third average is the mean of the m terms from the third to (m + 2) th term and so on.

If the extent or the period, m is odd i.e., m is of the form (2k + 1), the moving average is placed against the mid-value of the time interval it covers, i.e., t = k + 1. On the other hand, if m is even i.e., m = 2k, it is placed between the two middle values of the time interval it covers, i.e., t = k and t = k + 1.

When the period of the moving average is even, then we need to synchronize the moving average with the original time period. It is done by centering the moving averages i.e., by taking the average of the two successive moving averages.
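The calculation described above can be sketched in Python. This is a minimal sketch using hypothetical sales figures; it shows a moving average of odd period (m = 3) and a centered moving average of even period (m = 4):

```python
def moving_average(values, m):
    """Return the list of successive m-term averages."""
    return [sum(values[i:i + m]) / m for i in range(len(values) - m + 1)]

def centered_moving_average(values, m):
    """For even m, average successive pairs of m-term moving averages
    to synchronize them with the original time periods (centering)."""
    ma = moving_average(values, m)
    return [(a + b) / 2 for a, b in zip(ma, ma[1:])]

sales = [2, 4, 6, 8, 10, 12]   # hypothetical yearly figures

# Odd period (m = 3): each average is placed against the middle year.
print(moving_average(sales, 3))           # [4.0, 6.0, 8.0, 10.0]

# Even period (m = 4): centering re-aligns the averages with the years.
print(centered_moving_average(sales, 4))  # [6.0, 8.0]
```

Notice that the trend values are not available for the first and last few periods, which is one of the drawbacks discussed below.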

Drawbacks of Moving Average

  • The main problem is to determine the extent of the moving average which completely eliminates the oscillatory fluctuations.
  • This method assumes that the trend is linear but it is not always the case.
  • It does not provide the trend values for all the terms.
  • This method cannot be used for forecasting the future trend, which is the main objective of time series analysis.

Base Shifting, Splicing and Deflating

Base Shifting

For a variety of reasons, it frequently becomes necessary to change the reference base of an index number series from one time to another without returning to the original raw data and recomputing the entire series. This change of reference base period is usually referred to as “shifting the base”. There are two important reasons for shifting the base:

  1. The previous base has become too old and is almost useless for purposes of comparison. By shifting the base, it is possible to state the series in terms of a more recent time period.
  2. It may be desired to compare several index number series which have been computed on different base periods; particularly if the several series are to be shown on the same graph, it may be desirable for them to have the same base period. This may necessitate a shift in the base period.

When the base period is to be changed, one possibility is to recompute all index numbers using the new base period. A simpler approximate method is to divide all index numbers for the various years computed on the old base period by the index number corresponding to the new base period, expressing the results as percentages. These results represent the new index numbers, the index number for the new base period being 100%.

Mathematically speaking, this method is strictly applicable only if the index numbers satisfy the circular test. However, for many types of index numbers the method, fortunately, yields results which in practice are close enough to those which would be obtained theoretically.
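The approximate method above can be sketched in Python. The index series is hypothetical; each old index number is divided by the index for the new base year and expressed as a percentage:

```python
def shift_base(index_series, new_base_year):
    """Re-express an index series so that new_base_year = 100."""
    new_base = index_series[new_base_year]
    return {year: round(idx / new_base * 100, 1)
            for year, idx in index_series.items()}

# Hypothetical index numbers with 2018 as the original base (2018 = 100)
old = {2018: 100, 2019: 110, 2020: 125, 2021: 150}

# Shift the base to 2020: the 2020 entry becomes 100
new = shift_base(old, 2020)
print(new)  # {2018: 80.0, 2019: 88.0, 2020: 100.0, 2021: 120.0}
```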

Splicing

Splicing is a technique for linking two or more index number series that contain the same items and a common overlapping year, but have different base years, to form a continuous series. It may be forward splicing or backward splicing. We can further understand this with the help of the table given below:

Splicing    Index number of old series                                                          Index number of new series
Forward     (100 / overlapping index number of old series) × given index number of old series   No change
Backward    No change                                                                           (overlapping index number of old series / 100) × given index number of new series
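Forward splicing can be sketched in Python as follows. The two series are hypothetical; the old series (base 2015) is converted onto the new base using the overlapping year 2017, while the new series is left unchanged:

```python
def splice_forward(old_series, new_series, overlap_year):
    """Express the old series on the new base via the overlapping year,
    then combine the two series into one continuous series."""
    factor = 100 / old_series[overlap_year]
    spliced = {y: round(v * factor, 1) for y, v in old_series.items()}
    spliced.update(new_series)  # the new series needs no change
    return spliced

old = {2015: 100, 2016: 120, 2017: 150}   # old series, base 2015 = 100
new = {2017: 100, 2018: 110}              # new series, base 2017 = 100

print(splice_forward(old, new, 2017))
# {2015: 66.7, 2016: 80.0, 2017: 100, 2018: 110}
```

Backward splicing works symmetrically: the new series is multiplied by (overlapping old index / 100) and the old series is left unchanged.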

Deflating

It refers to the correction for price changes in money wages or money income series.

Inflation adjustment or deflation is the process of removing the effect of price inflation from data. It makes sense to adjust only data that is currency denominated in this way. Examples of such data are weekly wages, the interest rate on your deposits, or the price of a 5kg bag of Red Delicious apples in Kashmir. If you are dealing with a currency denominated time series, deflating it will extinguish the fraction of the up-down movement in it that was a consequence of general inflationary pressure.

Real Wage = (Money Wage / Price Index) × 100

Real Wage Index Number = (Index of Money Wages / Price Index) × 100
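The deflation formula above can be illustrated with a short sketch. The wage and price figures are hypothetical; note how money wages rise while real wages stay flat, because the increase merely keeps pace with inflation:

```python
def real_wage(money_wage, price_index):
    """Deflate a money wage by the price index (base period = 100)."""
    return money_wage / price_index * 100

wages  = [200, 220, 260]   # hypothetical money wages over three years
prices = [100, 110, 130]   # price index for the same years (base = 100)

real = [real_wage(w, p) for w, p in zip(wages, prices)]
print(real)  # [200.0, 200.0, 200.0] -- purchasing power is unchanged
```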

Unweighted, Weighted Aggregate Method

To measure the growth and progress of an economy, economists and scientists use many statistical tools. One such very important tool is the index number. Index numbers help reveal the trends and tendencies of the economy and also help in the formulation of economic policies and laws.

There are broadly three types of index numbers: price index numbers, value index numbers, and quantity index numbers.

Very simply put, index numbers help us observe the change in some quantity that we cannot otherwise easily observe or measure. For example, we cannot directly measure the growth of business activity in an economy. However, we can study the changes in factors that influence this business activity.

So an index number is a tool to measure the change in a variable quantity over a defined period of time. Since these changes are not directly measurable, index numbers are represented as percentages which express the relative changes in quantity.

Quantity Index Numbers

Now we will specifically understand what are quantity index numbers. Quantity index numbers measure the change in the quantity or volume of goods sold, consumed or produced during a given time period. Hence it is a measure of relative changes over a period of time in the quantities of a particular set of goods.

Just like price index numbers and value index numbers, there are also two types of quantity index numbers, namely

  • Unweighted Quantity Indices
  • Weighted Quantity Indices

Let us take a look at the various methods, formulas, and examples of both these types of quantity index numbers.

Unweighted Index: Simple Aggregate Method

Here we make a simple and direct comparison of the aggregate quantities of the current year with those of the base year. We express this index number as a percentage. No weights are assigned; it is the simplest calculation. The formula is as follows,

Q01=(ΣQ1/ΣQ0)×100

where Q1 is the quantity of the current year, and Q0 is the quantity of the base year.

Unweighted Index: Simple Average of Quantity Method

In this method, we express the quantity of the current year as a percentage of the quantity of the base year for each item separately. Then, to obtain the index number, we average these percentage figures over all the items. So the formula under this method is as follows,

Q01 = [Σ(Q1/Q0 × 100)] ÷ N

where N is the total number of items.
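Both unweighted methods can be sketched side by side in Python (the quantities are hypothetical). Note how the two methods give different answers for the same data, since the aggregate method lets large-quantity items dominate while the average-of-relatives method treats every item equally:

```python
q0 = [10, 20, 30]   # hypothetical base-year quantities
q1 = [12, 25, 33]   # hypothetical current-year quantities

# Simple aggregate method: Q01 = (ΣQ1 / ΣQ0) × 100
simple_aggregate = sum(q1) / sum(q0) * 100
print(round(simple_aggregate, 2))   # 116.67

# Simple average of quantity relatives: Q01 = Σ(Q1/Q0 × 100) / N
relatives = [b / a * 100 for a, b in zip(q0, q1)]
simple_average = sum(relatives) / len(relatives)
print(round(simple_average, 2))     # 118.33
```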

Weighted Index: Weighted Aggregate Method

There are a few various methods for calculating this index number. We will take a look at some of the most important ones.

1) Laspeyres Method

In this method, the base price is taken as the weight. We only use the price of the base year (P0), not the current year. The formula is as follows,

Q01= (ΣQ1P0/ΣQ0P0) × 100

2) Paasche’s Method

Here, the current year price (P1) of the commodity is taken as the weight.

Q01= (ΣQ1P1/ΣQ0P1) × 100

3) Dorbish & Bowley’s Method

Q01= [(ΣQ1P0/ΣQ0P0) + (ΣQ1P1/ΣQ0P1)] ÷ 2 × 100
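The three weighted methods can be compared in one short sketch (all quantities and prices are hypothetical). Dorbish & Bowley's index is simply the arithmetic mean of the Laspeyres and Paasche indices:

```python
q0 = [10, 20, 30]     # base-year quantities
q1 = [12, 25, 33]     # current-year quantities
p0 = [2.0, 1.0, 0.5]  # base-year prices (Laspeyres weights)
p1 = [2.5, 1.2, 0.6]  # current-year prices (Paasche weights)

def agg(qs, ps):
    """Aggregate value Σ(q × p)."""
    return sum(q * p for q, p in zip(qs, ps))

laspeyres = agg(q1, p0) / agg(q0, p0) * 100   # base prices as weights
paasche   = agg(q1, p1) / agg(q0, p1) * 100   # current prices as weights
dorbish_bowley = (laspeyres + paasche) / 2    # mean of the two

print(round(laspeyres, 2), round(paasche, 2), round(dorbish_bowley, 2))
```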

Weighted Index: Weighted Average of Relative Method

In this method, we use the arithmetic mean for averaging the values. The formula is a little more complex as seen below,

Q01= ΣQV/ ΣV

where

Q = (q1/q0) × 100, the quantity relative for each item,

and

V=q0p0
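The weighted average of relatives can be sketched as follows (hypothetical figures). Each item's quantity relative Q is weighted by its base-year value V = q0p0:

```python
q0 = [10, 20]     # base-year quantities
q1 = [12, 25]     # current-year quantities
p0 = [2.0, 1.0]   # base-year prices

Q = [b / a * 100 for a, b in zip(q0, q1)]   # quantity relatives (q1/q0 × 100)
V = [a * p for a, p in zip(q0, p0)]          # value weights V = q0 * p0

index = sum(q * v for q, v in zip(Q, V)) / sum(V)   # Q01 = ΣQV / ΣV
print(index)  # 122.5
```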

Cost of Living Index Number

Uses of cost of living index number:

(i) It is used in wage negotiations and in fixing dearness allowance, bonus, etc., for the workers.

(ii) The cost of living index number measures the change in the retail prices of a specified quantity of goods and services.

(iii) It is also useful to the government in framing policies relating to wages.

(iv) It is used as a measure of change in the purchasing power of money and real income.

The cost-of-living index, or general index, shows the difference in living costs between cities. The cost of living in the base city is always expressed as 100. The cost of living in the destination is then indexed against this number. So to take a simple example, if London is the base (100) and New York is the destination, and the New York index is 120, then New York is 20% more expensive than London. Similarly, if London is the base and Budapest is the destination, and the Budapest index is 80, then the cost of living in Budapest is 80% of London’s.

What’s the methodology behind the index?

The cost-of-living index expresses the difference in the cost of living between any two cities in the survey. How is this index calculated?

Using exactly the same price data, but different methods of calculation, a number of different people could come up with a number of markedly different indices. The challenge, therefore, when seeking to construct an index is to know which method is best for the problem at hand and to represent equitably (in one figure) the general trend of price differences in separate locations. To illustrate this point, let us take a simple price survey comparing two fictional cities, “Mumbai” and “Delhi.”

  Mumbai  Delhi 
Bread (1kg)  1.00  1.25 
Potatoes (1kg)  3.00  2.00 
Coffee (1kg)  2.50  1.75 
Sugar (1kg)  1.00  1.75 
TOTAL  7.50  6.75 

Assuming we give equal weight to each of the products, which of the two towns deserves the higher cost of living index number? The answer is: it all depends on how the calculation is made.

1) Mumbai is more expensive if we simply add up the prices of the four items in the index and compare the two cities on that basis.

2) Delhi, however, is more expensive when we use Mumbai as a base city and calculate an index based on the average of relative prices in the two cities:

  Mumbai  Delhi 
Bread  100  125 
Potatoes  100  67 
Coffee  100  70 
Sugar  100  175 
Index  100  109 

However, if the same calculation is done with Delhi serving as a base city, Mumbai becomes the more expensive city:

  Delhi  Mumbai 
Bread  100  80 
Potatoes  100  150 
Coffee  100  143 
Sugar  100  57 
Index  100  107.50 

Thus with the standard price-relatives calculation we can end up in the paradoxical situation where each city is more expensive than the other.

3) Using a different method, both Delhi and Mumbai would have the same index number, i.e., 100, and neither would be considered more expensive than the other. Such a calculation would be made according to a well-established statistical formula that takes prices in both cities, makes an average of them, and uses this average as the basis for the index comparison. This formula, adopted by the Economist Intelligence Unit for its indices, has some distinct advantages over the standard price-relatives calculation described in Step 2 above.

With the EIU formula, for example, the paradoxical situation of the two cities being more expensive than each other cannot arise: if city A = 100 and city B = 110, then this relationship is maintained even if city B is used as a base (when B = 100 then A = 91). In other words, the EIU indices are reversible. This property ensures that the cost of living allowances established with the aid of the indices are consistent, in that executives transferred from city A to B can be dealt with on the same footing as those transferred from city B to A.

In addition, the indices are nearly circular. This means that the relationship between any three cities is maintained regardless of which of the cities is used as a base with which to compare the other two. This logical inter-relationship is important in assuring equitable cost of living compensation as executives are transferred from location to location.

The index formula. The index is based on the arithmetic mean of price levels in the two selected cities. In order to calculate the index for the two hypothetical cities examined above, we must first calculate the average price of each item:

  Mumbai  Delhi  Average price 
Bread  1.00  1.25  1.125 
Potatoes  3.00  2.00  2.500 
Coffee  2.50  1.75  2.125 
Sugar  1.00  1.75  1.375 

Next we compare prices in each town to these average prices:

  Average  Mumbai  Delhi 
Bread  100  89  111 
Potatoes  100  120  80 
Coffee  100  118  82 
Sugar  100  73  127 
General Index  100  100  100 

As we can see the relationship between Mumbai and Delhi prices remains intact: bread is still 25% more expensive in Delhi, potatoes are still 50% more expensive in Mumbai. If we want to compare Mumbai as a base city to Delhi, we must divide Delhi’s index by that of Mumbai and multiply by 100. The result is 100. If we reverse the operation and use Delhi as base, the result is also 100. The two cities are equally expensive.
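The worked example above can be reproduced with a short sketch. The price data are the hypothetical Mumbai/Delhi figures from the tables, and each price relative is rounded to the nearest whole number, exactly as the tables do:

```python
mumbai = {"Bread": 1.00, "Potatoes": 3.00, "Coffee": 2.50, "Sugar": 1.00}
delhi  = {"Bread": 1.25, "Potatoes": 2.00, "Coffee": 1.75, "Sugar": 1.75}

def eiu_index(city, other):
    """Index a city's prices against the average price of each item,
    rounding each relative to a whole number as in the tables above."""
    rels = [round(city[item] / ((city[item] + other[item]) / 2) * 100)
            for item in city]
    return sum(rels) / len(rels)

print(eiu_index(mumbai, delhi))  # 100.0
print(eiu_index(delhi, mumbai))  # 100.0 -- the index is reversible
```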

Range and co-efficient of Range

The range is a measure of dispersion that represents the difference between the highest and lowest values in a dataset. It provides a simple way to understand the spread of data. While easy to calculate, the range is sensitive to outliers and does not provide information about the distribution of values between the extremes.

Range of a distribution gives a measure of the width (or the spread) of the data values of the corresponding random variable. For example, if there are two random variables X and Y such that X corresponds to the age of human beings and Y corresponds to the age of turtles, we know from our general knowledge that the variable corresponding to the age of turtles should have the larger spread.

Since the average lifespan of humans is 50-60 years, while that of turtles is about 150-200 years, the values taken by the random variable Y are spread out from 0 to at least 250 and above, while those of X will have a smaller range. Thus, qualitatively you’ve already understood what the Range of a distribution means. The mathematical formula for the same is given as:

Range = L – S

where

L: The largest/maximum value attained by the random variable under consideration

S: The smallest/minimum value.

Properties

  • The Range of a given distribution has the same units as the data points.
  • If a random variable is transformed into a new random variable by a change of scale and a shift of origin as:

Y = aX + b

where

Y: the new random variable

X: the original random variable

a,b: constants.

Then the ranges of X and Y can be related as:

RY = |a|RX

Clearly, the shift in origin doesn’t affect the shape of the distribution, and therefore its spread (or the width) remains unchanged. Only the scaling factor is important.

  • For a grouped class distribution, the Range is defined as the difference between the two extreme class boundaries.
  • A better measure of the spread of a distribution is the Coefficient of Range, given by:

Coefficient of Range (expressed as a percentage) = [(L – S)/(L + S)] × 100

Clearly, we need to take the ratio between the Range and the total (combined) extent of the distribution. Besides, since it is a ratio, it is dimensionless and can therefore be used to compare the spreads of two or more different distributions as well.

  • The range is an absolute measure of Dispersion of a distribution while the Coefficient of Range is a relative measure of dispersion.
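Both quantities are easy to compute directly. A minimal sketch with hypothetical data:

```python
data = [4, 7, 2, 9, 12, 5]   # hypothetical observations

L, S = max(data), min(data)                # largest and smallest values
data_range = L - S                         # Range = L - S
coeff_range = (L - S) / (L + S) * 100      # Coefficient of Range, in percent

print(data_range, round(coeff_range, 2))   # 10 71.43
```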

Due to the consideration of only the end-points of a distribution, the Range never gives us any information about the shape of the distribution curve between the extreme points. Thus, we must move on to better measures of dispersion. One such quantity is the Mean Deviation, which we will discuss shortly.

Interquartile range (IQR)

The interquartile range is the middle half of the data. To visualize it, think about the median value that splits the dataset in half. Similarly, you can divide the data into quarters. Statisticians refer to these quarters as quartiles and denote them from low to high as Q1, Q2, Q3, and Q4. The lowest quartile (Q1) contains the quarter of the dataset with the smallest values. The upper quartile (Q4) contains the quarter of the dataset with the highest values. The interquartile range is the middle half of the data that is in between the upper and lower quartiles. In other words, the interquartile range includes the 50% of data points that fall in Q2 and Q3.

The IQR is the red area in the graph below.

The interquartile range is a robust measure of variability in a similar manner that the median is a robust measure of central tendency. Neither measure is influenced dramatically by outliers because they don’t depend on every value. Additionally, the interquartile range is excellent for skewed distributions, just like the median. As you’ll learn, when you have a normal distribution, the standard deviation tells you the percentage of observations that fall specific distances from the mean. However, this doesn’t work for skewed distributions, and the IQR is a great alternative.

I’ve divided the dataset below into quartiles. The interquartile range (IQR) extends from the low end of Q2 to the upper limit of Q3. For this dataset, the IQR extends from 21 to 39.
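The IQR can be computed with the standard library. A minimal sketch with hypothetical data (note that different textbooks and libraries interpolate quartile positions slightly differently; `statistics.quantiles` defaults to the "exclusive" method):

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical sorted observations

q1, q2, q3 = statistics.quantiles(data, n=4)   # the three quartile cut points
iqr = q3 - q1                                   # interquartile range

print(q1, q3, iqr)   # 2.25 6.75 4.5
```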

Quartiles, Quartile Deviation and Quartile co-efficient

The Quartile Deviation is a simple way to estimate the spread of a distribution about a measure of its central tendency (usually the median). So, it gives you an idea about the range within which the central 50% of your sample data lies. Consequently, based on the quartile deviation, the Coefficient of Quartile Deviation can be defined, which makes it easy to compare the spread of two or more different distributions. Since both of these topics are based on the concept of quartiles, we’ll first understand how to calculate the quartiles of a dataset before working with the direct formulae.

Quartiles

A median divides a given dataset (which is already sorted) into two equal halves. Similarly, the quartiles are used to divide a given dataset into four equal parts. Therefore, logically there should be three quartiles for a given distribution; and if you think about it, the second quartile is equal to the median itself! We’ll deal with the other two quartiles in this section.

  • The first quartile, or the lower quartile or the 25th percentile, also denoted by Q1, corresponds to the value that lies halfway between the median and the lowest value in the distribution (when it is already sorted in the ascending order). Hence, it marks the region which encloses 25% of the initial data.
  • Similarly, the third quartile, or the upper quartile or 75th percentile, also denoted by Q3, corresponds to the value that lies halfway between the median and the highest value in the distribution (when it is already sorted in the ascending order). It, therefore, marks the region which encloses 75% of the initial data, or 25% of the end data.

For a better understanding, look at the representation below for a Gaussian Distribution:

The Quartile Deviation

Formally, the Quartile Deviation is equal to the half of the Inter-Quartile Range and thus we can write it as:

Qd=(Q3–Q1)/2

Therefore, we also call it the Semi Inter-Quartile Range.

  • The Quartile Deviation doesn’t take into account the extreme points of the distribution. Thus, the dispersion or the spread of only the central 50% data is considered.
  • If the scale of the data is changed, the Qd also changes in the same ratio.
  • It is the best measure of dispersion for open-ended systems (which have open-ended extreme ranges).
  • Also, it is less affected by sampling fluctuations in the dataset as compared to the range (another measure of dispersion).
  • Since it is solely dependent on the central values in the distribution, if in any experiment, these values are abnormal or inaccurate, the result would be affected drastically.

The Coefficient of Quartile Deviation

Based on the quartiles, a relative measure of dispersion, known as the Coefficient of Quartile Deviation, can be defined for any distribution. It is formally defined as:

Coefficient of Quartile Deviation = {(Q3–Q1)/(Q3+Q1)}×100

Since it involves a ratio of two quantities of the same dimensions, it is unit-less. Thus, it can act as a suitable parameter for comparing two or more different datasets which may or may not involve quantities with the same dimensions.
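Both formulae can be applied in a few lines. A minimal sketch with hypothetical data (quartiles via `statistics.quantiles`, whose interpolation may differ slightly from hand methods):

```python
import statistics

data = [10, 20, 30, 40, 50, 60, 70, 80]   # hypothetical observations

q1, _, q3 = statistics.quantiles(data, n=4)    # lower and upper quartiles
qd = (q3 - q1) / 2                              # Quartile Deviation (semi-IQR)
coeff_qd = (q3 - q1) / (q3 + q1) * 100          # Coefficient of Quartile Deviation

print(qd, coeff_qd)   # 22.5 50.0
```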


Mean deviation with mean, Co-efficient of mean deviation

To understand the dispersion of data from a measure of central tendency, we can use mean deviation. It comes as an improvement over the range. It basically measures the deviations from a value, generally the mean or the median. Hence, although the mean deviation about the mode can be calculated, the mean deviations about the mean and the median are the ones frequently used.

Note that the deviation of an observation x from a value a is d = x – a. To find the mean deviation, we need to take the mean of these deviations. However, when a is taken as the mean, the deviations are both negative and positive, since the mean is the central value.

This further means that when we sum up these deviations to find out their average, the sum essentially vanishes. Thus to resolve this problem we use absolute values or the magnitude of deviation. The basic formula for finding out mean deviation is :

Mean deviation= Sum of absolute values of deviations from ‘a’ ÷ The number of observations

Coefficient of Mean Deviation:

It is calculated to compare the data of two series. The coefficient of mean deviation is calculated by dividing the mean deviation by the average from which the deviations were taken: if the deviations are taken from the mean, we divide by the mean; if they are taken from the median, we divide by the median; and if they are taken from the mode, we divide by the mode.

  1. For Discrete Series:

M.D. = ∑fdy/N; Where; N=∑f

And dy is the deviation of the variable from X, M or Z, ignoring signs (i.e., taking all deviations as positive).

Steps to Calculate:

  1. Take X, M or Z series as desired.
  2. Take deviations ignoring signs.
  3. Multiply dy by respective f; get ∑fdy
  4. Use the following formula

M.D. = ∑fdy/N

(Note : If value of X or M or Z is in decimal fractions better use Direct Method to get result easily)


  2. For Continuous Series:

For Continuous Series also ;

M.D. = ∑fdy/N
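The steps for a discrete series can be sketched directly (the values and frequencies are hypothetical; here the deviations are taken from the median):

```python
# Discrete series: values x with frequencies f (hypothetical data)
x = [2, 4, 6, 8, 10]
f = [1, 3, 5, 3, 1]

N = sum(f)                    # N = Σf = 13
M = 6                         # median, read off the cumulative frequencies

# M.D. = Σ f·|x - M| / N  (deviations taken ignoring signs)
md = sum(fi * abs(xi - M) for xi, fi in zip(x, f)) / N

# Coefficient of mean deviation: divide by the average the deviations
# were taken from (here, the median)
coeff_md = md / M

print(round(md, 4), round(coeff_md, 4))   # 1.5385 0.2564
```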

Standard deviation with co-efficient of Variance

As the name suggests, this quantity is a standard measure of the deviation of the entire data in any distribution, usually represented by s or σ. It uses the arithmetic mean of the distribution as the reference point and normalizes the deviation of all the data values from this mean.

Therefore, we define the formula for the standard deviation of the distribution of a variable X with n data points as:

σ = √[Σ(x – x̄)² / n]

where x̄ is the arithmetic mean of the data.

Variance

Another statistical term that is related to the distribution is the variance, which is the standard deviation squared (variance = SD²). Although a square root may in principle be taken as positive or negative, the SD is by convention the positive root, and squaring it removes any question of sign. One common application of the variance is its use in the F-test to compare the variance of two methods and determine whether there is a statistically significant difference in the imprecision between the methods.

In many applications, however, the SD is often preferred because it is expressed in the same concentration units as the data. Using the SD, it is possible to predict the range of control values that should be observed if the method remains stable. As discussed in an earlier lesson, laboratorians often use the SD to impose “gates” on the expected normal distribution of control values.

Coefficient of Variation

Another way to describe the variation of a test is calculate the coefficient of variation, or CV. The CV expresses the variation as a percentage of the mean, and is calculated as follows:

CV% = (SD/Xbar)100

In the laboratory, the CV is preferred when the SD increases in proportion to concentration. For example, the data from a replication experiment may show an SD of 4 units at a concentration of 100 units and an SD of 8 units at a concentration of 200 units. The CVs are 4.0% at both levels and the CV is more useful than the SD for describing method performance at concentrations in between. However, not all tests will demonstrate imprecision that is constant in terms of CV. For some tests, the SD may be constant over the analytical range.

The CV also provides a general “feeling” about the performance of a method. CVs of 5% or less generally give us a feeling of good method performance, whereas CVs of 10% and higher sound bad. However, you should look carefully at the mean value before judging a CV. At very low concentrations, the CV may be high and at high concentrations the CV may be low. For example, a bilirubin test with an SD of 0.1 mg/dL at a mean value of 0.5 mg/dL has a CV of 20%, whereas an SD of 1.0 mg/dL at a concentration of 20 mg/dL corresponds to a CV of 5.0%.
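The three quantities can be computed together with the standard library. A minimal sketch on a hypothetical set of replicate control measurements (population formulas, matching the definition above):

```python
import statistics

data = [96, 98, 100, 102, 104]   # hypothetical replicate measurements

mean = statistics.mean(data)          # x-bar
sd = statistics.pstdev(data)          # population standard deviation
variance = statistics.pvariance(data) # SD squared
cv = sd / mean * 100                  # CV% = (SD / x-bar) × 100

print(mean, round(sd, 2), variance, round(cv, 2))
```

Use `statistics.stdev`/`statistics.variance` instead if the data are a sample rather than the whole population.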

Skewness

Skewness, in statistics, is the degree of distortion from the symmetrical bell curve, or normal distribution, in a set of data. Skewness can be negative, positive, zero or undefined. A normal distribution has a skew of zero, while a lognormal distribution, for example, would exhibit some degree of right-skew.

The three probability distributions depicted below show increasing levels of right (or positive) skewness. Distributions can also be left (negative) skewed. Skewness is used along with kurtosis to better judge the likelihood of events falling in the tails of a probability distribution.

Right skewness

  • Skewness, in statistics, is the degree of distortion from the symmetrical bell curve in a probability distribution.
  • Distributions can exhibit right (positive) skewness or left (negative) skewness to varying degrees.
  • Investors note skewness when judging a return distribution because it, like kurtosis, considers the extremes of the data set rather than focusing solely on the average.

Broadly speaking, there are two types of skewness: They are

(1) Positive skewness

(2) Negative skewness.

Positive skewness

A series is said to have positive skewness when the following characteristics are noticed:

  • Mean > Median > Mode.
  • The right tail of the curve is longer than its left tail, when the data are plotted through a histogram, or a frequency polygon.
  • The formula of Skewness and its coefficient give positive figures.

Negative Skewness

A series is said to have negative skewness when the following characteristics are noticed:

  • Mode > Median > Mean.
  • The left tail of the curve is longer than the right tail, when the data are plotted through a histogram, or a frequency polygon.
  • The formula of skewness and its coefficient give negative figures.

Thus, a statistical distribution may be three types viz.

  • Symmetric
  • Positively skewed
  • Negatively skewed

Skewness Co-efficient

  1. Pearson’s Coefficient of Skewness #1 uses the mode. The formula is:

    Sk₁ = (x̄ – Mo) / s

    Where x̄ = the mean, Mo = the mode and s = the standard deviation for the sample.

  2. Pearson’s Coefficient of Skewness #2 uses the median. The formula is:

    Sk₂ = 3(x̄ – Md) / s

    Where x̄ = the mean, Md = the median and s = the standard deviation for the sample.

    It is generally used when you don’t know the mode.
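Both Pearson coefficients can be computed with the standard library. A minimal sketch on a hypothetical right-skewed sample (the value 10 pulls the mean above the median and mode, so both coefficients come out positive):

```python
import statistics

data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 10]   # hypothetical right-skewed sample

mean = statistics.mean(data)      # 4.6
mode = statistics.mode(data)      # 4
median = statistics.median(data)  # 4.0
s = statistics.stdev(data)        # sample standard deviation

sk1 = (mean - mode) / s           # Pearson's coefficient #1 (mode-based)
sk2 = 3 * (mean - median) / s     # Pearson's coefficient #2 (median-based)

print(round(sk1, 2), round(sk2, 2))   # 0.27 0.81 -- both positive
```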

Kurtosis

Kurtosis is a statistical measure that defines how heavily the tails of a distribution differ from the tails of a normal distribution. In other words, kurtosis identifies whether the tails of a given distribution contain extreme values.

Along with skewness, kurtosis is an important descriptive statistic of data distribution. However, the two concepts must not be confused with each other. Skewness essentially measures the symmetry of the distribution while kurtosis determines the heaviness of the distribution tails.

In finance, kurtosis is used as a measure of financial risk. A large kurtosis is associated with a high level of risk of an investment because it indicates that there are high probabilities of extremely large and extremely small returns. On the other hand, a small kurtosis signals a moderate level of risk because the probabilities of extreme returns are relatively low.

Excess Kurtosis

An excess kurtosis is a metric that compares the kurtosis of a distribution against the kurtosis of a normal distribution. The kurtosis of a normal distribution equals 3. Therefore, the excess kurtosis is found using the formula below:

Excess Kurtosis = Kurtosis – 3
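A minimal sketch of this calculation, assuming the standard moment-based (Pearson) definition of kurtosis as m₄/m₂², where m₂ and m₄ are the second and fourth central moments (the data are hypothetical):

```python
def kurtosis(data):
    """Moment-based kurtosis: fourth central moment over the
    squared second central moment (normal distribution gives 3)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2

data = [1, 2, 3, 4, 5]   # flat, evenly spread hypothetical values
k = kurtosis(data)
excess = k - 3           # negative excess kurtosis => platykurtic

print(k, excess)   # 1.7 and a negative excess value
```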

Types of Kurtosis

The types of kurtosis are determined by the excess kurtosis of a particular distribution. The excess kurtosis can take positive or negative values, as well as values close to zero.

1. Mesokurtic

Data that follows a mesokurtic distribution shows an excess kurtosis of zero or close to zero. In other words, data that follows a normal distribution is mesokurtic.

2. Leptokurtic

Leptokurtic indicates a positive excess kurtosis. The leptokurtic distribution shows heavy tails on either side, indicating the presence of large outliers. In finance, a leptokurtic distribution shows that the investment returns may be prone to extreme values on either side. Therefore, an investment whose returns follow a leptokurtic distribution is considered to be risky.

3. Platykurtic

A platykurtic distribution shows a negative excess kurtosis. The kurtosis reveals a distribution with flat, thin tails. The flat tails indicate that extreme values are rare in the distribution. In the finance context, the platykurtic distribution of investment returns is desirable for investors because there is a small probability that the investment would experience extreme returns.
