Skewness

Skewness, in statistics, is the degree of distortion from the symmetrical bell curve, or normal distribution, in a set of data. Skewness can be negative, positive, zero or undefined. A normal distribution has a skew of zero, while a lognormal distribution, for example, would exhibit some degree of right-skew.

The three probability distributions depicted below show increasing levels of right (or positive) skewness. Distributions can also be left (negative) skewed. Skewness is used along with kurtosis to better judge the likelihood of events falling in the tails of a probability distribution.

Right skewness

  • Skewness, in statistics, is the degree of distortion from the symmetrical bell curve in a probability distribution.
  • Distributions can exhibit right (positive) skewness or left (negative) skewness to varying degrees.
  • Investors note skewness when judging a return distribution because it, like kurtosis, considers the extremes of the data set rather than focusing solely on the average.

Broadly speaking, there are two types of skewness:

(1) Positive skewness

(2) Negative skewness.

Positive skewness

A series is said to have positive skewness when the following characteristics are noticed:

  • Mean > Median > Mode.
  • The right tail of the curve is longer than its left tail, when the data are plotted through a histogram, or a frequency polygon.
  • The skewness formula and its coefficient give positive values.

Negative skewness

A series is said to have negative skewness when the following characteristics are noticed:

  • Mode > Median > Mean.
  • The left tail of the curve is longer than the right tail, when the data are plotted through a histogram, or a frequency polygon.
  • The skewness formula and its coefficient give negative values.

Thus, a statistical distribution may be of three types:

  • Symmetric
  • Positively skewed
  • Negatively skewed

Skewness Co-efficient

  1. Pearson’s Coefficient of Skewness #1 uses the mode. The formula is:
    Sk1 = (x̄ − Mo) / s
    Where x̄ = the mean, Mo = the mode and s = the standard deviation for the sample.
  2. Pearson’s Coefficient of Skewness #2 uses the median. The formula is:
    Sk2 = 3(x̄ − Md) / s
    Where x̄ = the mean, Md = the median and s = the standard deviation for the sample.
    It is generally used when you don’t know the mode; a small computational sketch of both coefficients follows.
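Below is a minimal Python sketch of both coefficients. The data values, variable names and the use of the standard statistics module are illustrative assumptions, not part of the original text.

```python
import statistics

data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 8, 11]   # hypothetical right-skewed sample

x_bar = statistics.mean(data)    # sample mean
md = statistics.median(data)     # sample median
mo = statistics.mode(data)       # sample mode
s = statistics.stdev(data)       # sample standard deviation

sk1 = (x_bar - mo) / s           # Pearson's coefficient of skewness #1 (mode-based)
sk2 = 3 * (x_bar - md) / s       # Pearson's coefficient of skewness #2 (median-based)

print(f"Sk1 = {sk1:.3f}, Sk2 = {sk2:.3f}")   # both positive -> right (positive) skew
```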

Kurtosis

Kurtosis is a statistical measure that defines how heavily the tails of a distribution differ from the tails of a normal distribution. In other words, kurtosis identifies whether the tails of a given distribution contain extreme values.

Along with skewness, kurtosis is an important descriptive statistic of data distribution. However, the two concepts must not be confused with each other. Skewness essentially measures the symmetry of the distribution while kurtosis determines the heaviness of the distribution tails.

In finance, kurtosis is used as a measure of financial risk. A large kurtosis is associated with a high level of risk of an investment because it indicates that there are high probabilities of extremely large and extremely small returns. On the other hand, a small kurtosis signals a moderate level of risk because the probabilities of extreme returns are relatively low.

Excess Kurtosis

Excess kurtosis is a metric that compares the kurtosis of a distribution against the kurtosis of a normal distribution. The kurtosis of a normal distribution equals 3. Therefore, the excess kurtosis is found using the formula below:

Excess Kurtosis = Kurtosis – 3
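As a hedged illustration of this formula, the sketch below computes kurtosis as the fourth standardised moment and then subtracts 3. The use of NumPy and the simulated normal "returns" are assumptions made purely for the example.

```python
import numpy as np

def kurtosis(x):
    """Kurtosis as the fourth standardised moment: E[(x - mean)^4] / sigma^4."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    return (dev ** 4).mean() / (dev ** 2).mean() ** 2

returns = np.random.default_rng(0).normal(size=10_000)   # simulated "returns"
k = kurtosis(returns)
excess = k - 3                                            # Excess Kurtosis = Kurtosis - 3

print(round(k, 3), round(excess, 3))   # roughly 3 and 0 for normally distributed data
```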

Types of Kurtosis

The types of kurtosis are determined by the excess kurtosis of a particular distribution. The excess kurtosis can take positive or negative values, as well as values close to zero.

1. Mesokurtic

Data that follows a mesokurtic distribution shows an excess kurtosis of zero or close to zero. In other words, data that follows a normal distribution follows a mesokurtic distribution.

2. Leptokurtic

Leptokurtic indicates a positive excess kurtosis. A leptokurtic distribution shows heavy tails on either side, indicating large outliers. In finance, a leptokurtic return distribution means that the investment returns may be prone to extreme values on either side. Therefore, an investment whose returns follow a leptokurtic distribution is considered to be risky.

3. Platykurtic

A platykurtic distribution shows a negative excess kurtosis. The kurtosis reveals a distribution with flat tails. The flat tails indicate that there are few extreme outliers in the distribution. In the finance context, a platykurtic distribution of investment returns is desirable for investors because there is a small probability that the investment would experience extreme returns.
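A tiny helper shows how an excess kurtosis value maps onto these three labels. Its name and the 0.1 tolerance are arbitrary choices for this sketch, not a standard definition.

```python
def classify_kurtosis(excess_kurtosis, tol=0.1):
    """Label a distribution from its excess kurtosis; tol is an arbitrary cut-off."""
    if abs(excess_kurtosis) <= tol:
        return "mesokurtic"                    # tails close to the normal distribution
    return "leptokurtic" if excess_kurtosis > 0 else "platykurtic"

print(classify_kurtosis(0.02))    # mesokurtic
print(classify_kurtosis(2.50))    # leptokurtic: heavy tails, riskier returns
print(classify_kurtosis(-0.80))   # platykurtic: flat tails, fewer extreme returns
```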

Karl Pearson and Rank Correlation

Karl Pearson Coefficient of Correlation (also called the Pearson correlation coefficient or Pearson’s r) is a measure of the strength and direction of the linear relationship between two variables. It ranges from -1 to +1, where +1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship. The formula for Pearson’s r is calculated by dividing the covariance of the two variables by the product of their standard deviations. It is widely used in statistics to analyze the degree of correlation between paired data.

The following are the main properties of correlation.

  1. Coefficient of Correlation lies between -1 and +1:

The coefficient of correlation cannot take a value less than -1 or more than +1. Symbolically,

-1 ≤ r ≤ +1, or |r| ≤ 1.

  2. Coefficient of Correlation is independent of Change of Origin:

This property reveals that if we subtract any constant from all the values of X and Y, it will not affect the coefficient of correlation.

  3. Coefficient of Correlation possesses the property of symmetry:

The degree of relationship between two variables is symmetric, that is:

r(X, Y) = r(Y, X)

  4. Coefficient of Correlation is independent of Change of Scale:

This property reveals that if we divide or multiply all the values of X and Y by a non-zero constant, it will not affect the coefficient of correlation.

  5. Coefficient of correlation measures only linear correlation between X and Y.
  6. If two variables X and Y are independent, the coefficient of correlation between them will be zero.

Karl Pearson’s Coefficient of Correlation is a widely used mathematical method in which a numerical expression is used to calculate the degree and direction of the relationship between linearly related variables.

Pearson’s method, popularly known as the Pearsonian Coefficient of Correlation, is the most extensively used quantitative method in practice. The coefficient of correlation is denoted by “r”.

If the relationship between two variables X and Y is to be ascertained, then the following formula is used:

r = Σ(X − X̄)(Y − Ȳ) / √[Σ(X − X̄)² × Σ(Y − Ȳ)²]

where X̄ and Ȳ are the means of X and Y respectively.
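A minimal Python sketch of this formula follows; the function name and the paired sample values are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's r = sum((x-mx)(y-my)) / sqrt(sum((x-mx)^2) * sum((y-my)^2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]     # hypothetical paired observations
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))   # the result always lies between -1 and +1
```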

Properties of Coefficient of Correlation

  • The value of the coefficient of correlation (r) always lies between −1 and +1:

    r=+1, perfect positive correlation

    r=-1, perfect negative correlation

    r=0, no correlation

  • The coefficient of correlation is independent of the origin and scale. By origin, it means that if any constant is subtracted from the given values of X and Y, the value of “r” remains unchanged. By scale, it means that there is no effect on the value of “r” if the values of X and Y are divided or multiplied by any non-zero constant.
  • The coefficient of correlation is the geometric mean of the two regression coefficients. Symbolically it is represented as: r = ±√(bXY × bYX), where bXY and bYX are the regression coefficients of X on Y and of Y on X respectively.
  • The coefficient of correlation is zero when the variables X and Y are independent. However, the converse is not true.

Assumptions of Karl Pearson’s Coefficient of Correlation

  1. The relationship between the variables is “Linear”, which means when the two variables are plotted, a straight line is formed by the points plotted.
  2. There are a large number of independent causes affecting the variables under study, so that they form a normal distribution. For example, variables like price, demand, and supply are affected by many such factors, and hence a normal distribution is formed.
  3. The variables are independent of each other.

Note: The coefficient of correlation measures not only the magnitude of correlation but also its direction. For example, r = -0.67 shows that the correlation is negative because the sign is “-”, and its magnitude is 0.67.

Spearman Rank Correlation

Spearman rank correlation is a non-parametric test that is used to measure the degree of association between two variables.  The Spearman rank correlation test does not carry any assumptions about the distribution of the data and is the appropriate correlation analysis when the variables are measured on a scale that is at least ordinal.

The Spearman correlation between two variables is equal to the Pearson correlation between the rank values of those two variables; while Pearson’s correlation assesses linear relationships, Spearman’s correlation assesses monotonic relationships (whether linear or not). If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other.

Intuitively, the Spearman correlation between two variables will be high when observations have a similar (or identical for a correlation of 1) rank (i.e. relative position label of the observations within the variable: 1st, 2nd, 3rd, etc.) between the two variables, and low when observations have a dissimilar (or fully opposed for a correlation of −1) rank between the two variables.

The following formula is used to calculate the Spearman rank correlation:

ρ = 1 − (6 Σdi²) / (n(n² − 1))

Where:

ρ = Spearman rank correlation

di = the difference between the ranks of corresponding variables

n = number of observations
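Using this formula, here is a minimal Python sketch. The helper names and the score lists are invented, and ties are not handled, in line with the no-repeated-values condition mentioned above.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

scores_a = [35, 23, 47, 17, 10, 43, 9, 6, 28]    # hypothetical ordinal scores
scores_b = [30, 33, 45, 23, 8, 49, 12, 4, 31]
print(round(spearman_rho(scores_a, scores_b), 3))
```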

Assumptions

The assumptions of the Spearman correlation are that data must be at least ordinal and the scores on one variable must be monotonically related to the other variable.

Least Square Method

The least squares method is the process of finding the best-fitting curve, or line of best fit, for a set of data points by minimising the sum of the squares of the offsets (residuals) of the points from the curve. When finding the relation between two variables, the trend of outcomes is estimated quantitatively; this process is termed regression analysis. Curve fitting is one approach to regression analysis, and the least squares method is the standard way of fitting an equation that approximates the given raw data.

It is quite obvious that the fitting of a curve for a particular data set is not always unique. Thus, it is required to find a curve having minimal deviation from all the measured data points. This is known as the best-fitting curve and is found by using the least-squares method.

The least-squares method is a crucial statistical method that is used to find a regression line, or best-fit line, for a given pattern of data. The method is described by an equation with specific parameters. The method of least squares is widely used in evaluation and regression; in regression analysis, it is the standard approach for approximating the solution of sets of equations having more equations than unknowns.

The method of least squares defines the solution as the one that minimises the sum of the squares of the deviations, or errors, in the result of each equation. This sum of squared errors measures the variation of the observed data about the fitted curve.

The least-squares method is often applied in data fitting. The best-fit result minimises the sum of squared errors, or residuals, which are the differences between the observed or experimental values and the corresponding fitted values given by the model.

There are two basic categories of least-squares problems:

  • Ordinary or linear least squares
  • Nonlinear least squares

These depend upon the linearity or nonlinearity of the residuals. Linear problems are often seen in regression analysis in statistics. Non-linear problems, on the other hand, are generally solved by an iterative refinement method in which the model is approximated by a linear one at each iteration.

Least Square Method Graph

In linear regression, the line of best fit is a straight line fitted through the scatter of data points.

The residuals, or offsets, of each data point from the line are what the method minimises. In common practice the vertical offsets from the line (or polynomial, surface, hyperplane, etc.) are minimised rather than the perpendicular offsets.

Least Square Method Formula

The least-squares method states that the curve that best fits a given set of observations is the curve having the minimum sum of squared residuals (or deviations, or errors) from the given data points. Let us assume that the given data points are (x1, y1), (x2, y2), (x3, y3), …, (xn, yn), in which all x’s are independent variables while all y’s are dependent ones. Also, suppose that f(x) is the fitting curve and d represents the error, or deviation, of each given point from the curve.

Now, we can write:

d1 = y1 − f(x1)

d2 = y2 − f(x2)

d3 = y3 − f(x3)

…..

dn = yn – f(xn)

The least-squares criterion states that the best-fitting curve is the one for which the sum of the squares of all the deviations from the given values is a minimum, i.e.:

S = d1² + d2² + … + dn² = Σ (yi − f(xi))² = minimum
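For the common case of fitting a straight line y = a + bx, the sketch below minimises this sum of squared vertical deviations via the usual closed-form slope and intercept; the data points are invented for illustration.

```python
def least_squares_line(xs, ys):
    """Fit y = a + b*x by minimising the sum of squared vertical deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

xs = [1, 2, 3, 4, 5]                    # hypothetical data points
ys = [2.1, 2.9, 3.8, 5.1, 5.9]
a, b = least_squares_line(xs, ys)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))   # the minimised sum of squares
print(round(a, 3), round(b, 3), round(sse, 4))
```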

Limitations for Least-Square Method

The least-squares method is a very beneficial method of curve fitting. Despite many benefits, it has a few shortcomings too. One of the main limitations is discussed here.

In regression analysis, which uses the least squares method for curve fitting, it is implicitly assumed that the errors in the independent variable are negligible or zero. When the errors in the independent variable are non-negligible, the model becomes subject to measurement error. In that case the parameter estimates, confidence intervals and hypothesis tests produced by the least squares method may be unreliable.

Secondary Data: Merits, Limitations, Sources

Secondary data is data that has already been collected by, and is readily available from, other sources. Such data is cheaper and more quickly obtainable than primary data, and may also be available when primary data cannot be obtained at all.

Advantages of Secondary data

  1. It is economical. It saves efforts and expenses.
  2. It is time saving.
  3. It helps to make primary data collection more specific since with the help of secondary data, we are able to make out what are the gaps and deficiencies and what additional information needs to be collected.
  4. It helps to improve the understanding of the problem.
  5. It provides a basis for comparison for the data that is collected by the researcher.

Disadvantages of Secondary Data

  1. Secondary data seldom fits exactly into the framework of the marketing research factors. Reasons for this are:
  • Unit of secondary data collection: Suppose you want information on disposable income, but the data is available only on gross income. The information is then not the same as what you require.
  • Class boundaries may be different even when the units are the same. For example:

    Before 5 Years      After 5 Years
    2500-5000           5000-6000
    5001-7500           6001-7000
    7500-10000          7001-10000

    Thus the data collected earlier is of no use to you.
  2. The accuracy of secondary data is not known.
  3. Data may be outdated.

Evaluation of Secondary Data

Because of the above-mentioned disadvantages of secondary data, it must be evaluated before use. Evaluation means the following four requirements must be satisfied:

  1. Availability: It has to be seen whether the kind of data you want is available or not. If it is not available, then you have to go for primary data.
  2. Relevance: It should be meeting the requirements of the problem. For this we have two criteria:
    1. Units of measurement should be the same.
    2. Concepts used must be same and currency of data should not be outdated.
  3. Accuracy: In order to find how accurate the data is, the following points must be considered:
  • Specification and methodology used
  • Margin of error should be examined
  • The dependability of the source must be seen.

  4. Sufficiency: Adequate data should be available.

Robert W. Joselyn has classified the above discussion into eight steps, which are sub-classified into three categories. He has given a detailed procedure for evaluating secondary data:

  • Applicability of research objective.
  • Cost of acquisition.
  • Accuracy of data.

Data: Relevance of data in Current scenario

Every day we create roughly 2.5 quintillion bytes of data. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. This data is big data. What has also changed in the last decade is that we now have the means to sift through these 2.5 quintillion bytes of data in a reasonable amount of time. All these changes have major implications for organizations today.

In organizations, analytics enables professionals to convert extensive data and statistical and quantitative analysis into powerful insights that can drive efficient decisions.

Therefore with analytics, organizations can now base their decisions and strategies on data rather than on gut feelings. Moreover, with the rate at which this data can be analyzed, organizations are able to keep tabs on the customer trends in near real time. As a result effectiveness of a strategy can be determined almost immediately. Thus with powerful insights, analytics promises reduced costs and increased profits.
The analytics industry is one of the fastest growing in modern times and is poised to become a $50 billion market by 2017. With this sudden surge in the analytics industry, there is a tremendous increase in the demand for analytics expertise across all domains, throughout all major organizations across the globe. It has been predicted that by 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.

IBM’s recent study revealed that “83% of Business Leaders listed Business Analytics as the top priority in their business priority list.”

Deloitte has mentioned in its study that decision makers who can leverage everyday data and information into actionable insights for the growth of their organization, by taking reliable decisions, will find themselves in a much better position to achieve strategic growth in their careers.

There is an information overload in today’s world and data analytics helps to cut out the clutter to help businesses make safe and smart choices.

A recent report by Nucleus Research found that companies realize a return of USD10.66 for every dollar they invest in analytics.

In the developed economies of Europe, government administrators could save more than €100 billion ($149 billion) in operational efficiency improvements alone by using big data, not including using big data to reduce fraud and errors and boost the collection of tax revenues. Thus big data courses in India are going to be essential in a few years.

There is a saying, “Today data is the new oil”. Data in today’s business & technology world is absolutely crucial. Big Data technologies and initiatives are rising to analyze this data for gaining insights that can help in making strategic decisions. The concept evolved at the beginning of the 21st century, and every technology giant is now making use of Big Data technologies. Big Data refers to vast data sets that may be structured or unstructured. A massive amount of data is produced every day by businesses & users alike. Big data analytics is the process of examining large data sets to find the underlying insights & patterns. The data analytics field is absolutely vast.

Big Data analytics is indeed a revolution in the field of information technology. The use of data analytics by companies is increasing day by day. The primary focus of the companies is on their customers, hence this field is flourishing in the area of B2C applications. There are 3 divisions of Big data analytics: Prescriptive Analytics, Predictive Analytics, and Descriptive Analytics. There are four different perspectives to explain why big data analytics is so important. They are:

  • Data Science Perspective
  • Business Perspective
  • Real Time Usability Perspective
  • Job Market Perspective

Big Data Analytics & Data Science

Big data analytics involves the use of advanced analytics techniques & tools on data obtained from different sources & of different sizes. Big Data has the properties of high variety, volume & velocity. The data sets are basically retrieved from various online networks, web pages, audio & video devices, social media, logs & many other sources.

It involves the use of techniques like machine learning, data mining, natural language processing & statistics. The data is extracted, prepared & blended to provide analysis for the businesses.

Benefits of Big Data Analytics

Due to the enormous growth in the field of Big Data Analytics, it is extensively used in multiple industries such as:

  • Banking
  • Healthcare
  • Energy
  • Technology
  • Consumer
  • Manufacturing

The importance of big data analytics leads to intense competition and increased demand for big data professionals. Data Science and Analytics is an evolving field with huge potential. Data analytics helps in analyzing the value chain of a business and gaining insights. The use of analytics can enhance the industry knowledge of analysts, and data analytics experts give organizations a chance to learn about opportunities for the business.

Types of Data: Primary & Secondary

Primary data

Primary data are original observations collected by the researcher or his agent for the first time for any investigation and used by them in the statistical analysis.

Primary data is one important type of data: it is the collection of data as first-hand information.

This information is collected by an organization for its own purposes, and this type of primary data is mostly pure and original data.

Primary data collection has three different methods:

  • Data Collection through Investigation:

In this method, trained investigators are employed to collect the data. The investigators use tools such as interviews to collect the information from individual persons.

  • Personal Investigation Methods:

The researchers or data collectors conduct the survey themselves and collect the data personally. This method yields more accurate and original data, but it is useful only for small-scale data collection, not for large data collection projects.

  • Data Collection through Telephones:

The researcher uses tools such as telephones and mobile phones to collect the information or data. This is a very quick process for data collection, but the information collected may not always be accurate and true.

Secondary data

Secondary data is the other type of data: it is the collection of data from second-hand information. Such data has already been collected by someone else for some other purpose and is merely available for the present issue. Mostly, secondary data is not as relevant, pure, or original as primary data.

Primary Data Census vs Samples

In Statistics, the basis of all statistical calculations or interpretation lies in the collection of data. There are numerous methods of data collection. In this lesson, we shall focus on two primary methods and understand the difference between them. Both are suitable in different cases and the knowledge of these methods is important to understand when to apply which method. These two methods are the Census method and Sampling method.

Census Method

Census method is the method of statistical enumeration where all members of the population are studied. A population refers to the set of all observations under concern. For example, if you want to carry out a survey to find out students’ feedback about the facilities of your school, all the students of your school would form a part of the ‘population’ for your study.

At a more realistic level, a country wants to maintain information and records about all households. It can collect this information by surveying all households in the country using the census method.

In our country, the Government conducts the Census of India every ten years. The Census collects information from households regarding their incomes, the earning members, the total number of children, members of the family, etc. This method must take into account all the units; it cannot leave out anyone when collecting data. Once collected, the Census of India reveals demographic information such as birth rates, death rates, total population, population growth rate of our country, etc. The last census was conducted in the year 2011.

Sampling Method

Like we have studied, the population contains units with some similar characteristics on the basis of which they are grouped together for the study. In the case of the Census of India, for example, the common characteristic was that all units are Indian nationals. But it is not always practical to collect information from all the units of the population.

It is a time-consuming and costly method. Thus, an easy way out would be to collect information from some representative group from the population and then make observations accordingly. This representative group which contains some units from the whole population is called the sample.

The first and most important step in selecting a sample is to determine the population. Once the population is identified, a sample must be selected. A good sample is one which is:

  • Small in size.
  • Provides adequate information about the whole population.
  • Takes less time to collect and is less costly.

In the case of our previous example, you could choose students from your class to be the representative sample out of the population (all students in the school). However, there must be some rationale behind choosing the sample. If you think your class comprises a set of students who will give unbiased opinions/feedback, or if you think your class contains students from different backgrounds and their responses would be relevant to your study, you must choose them as your sample. Otherwise, it is ideal to choose another sample which might be more relevant.

Again, realistically, the government wants estimates on the average income of the Indian household. It is difficult and time-consuming to study all households. The government can simply choose, say, 50 households from each state of the country and calculate the average of that to arrive at an estimate. This estimate is not necessarily the actual figure that would be arrived at if all units of the population underwent study. But it approximately gives an idea of what the figure might look like.
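A small simulation sketch illustrates the trade-off; the population of incomes, the distribution parameters and the sample size below are entirely made up.

```python
import random

random.seed(1)

# Hypothetical population of household incomes; a census would study all of them.
population = [random.gauss(50_000, 12_000) for _ in range(100_000)]
census_mean = sum(population) / len(population)      # exact, but costly to obtain

# The sampling method studies only a representative group drawn from the population.
sample = random.sample(population, 500)
sample_mean = sum(sample) / len(sample)              # quick, approximate estimate

print(round(census_mean), round(sample_mean))        # the estimate is close, not identical
```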

Difference between Census and Sample Surveys

Parameter | Census | Sample Survey
Definition | A statistical method that studies all the units or members of a population. | A statistical method that studies only a representative group of the population, and not all its members.
Calculation | Total/Complete | Partial
Time involved | It is a time-consuming process. | It is a quicker process.
Cost involved | It is a costly method. | It is a relatively inexpensive method.
Accuracy | The results obtained are accurate as each member is surveyed, so the error is negligible. | The results are relatively inaccurate because some items are left out of the sample, so the resulting error is larger.
Reliability | Highly reliable | Low reliability
Error | Not present | The smaller the sample size, the larger the error.
Relevance | This method is suited for heterogeneous data. | This method is suited for homogeneous data.

Methods of Primary Data Collection: Observation, Interview, Questionnaire, and Survey

Primary Data is information collected firsthand by a researcher for a specific research purpose. It is original, fresh, and tailored directly to the research question or objective. Methods such as surveys, interviews, experiments, and observations are commonly used to gather primary data. Since it is collected directly from the source, primary data is highly relevant, specific, and accurate. However, it often requires more time, effort, and resources compared to using existing information. It is essential for studies needing updated or detailed insights.

Methods of Primary Data Collection:

  • Observation

Observation involves systematically watching and recording behaviors, events, or phenomena as they occur naturally or in a controlled setting. It allows researchers to gather real-time, unbiased data without influencing the subject’s behavior. Observations can be structured (following a predefined checklist) or unstructured (open-ended). It is especially useful when participants are unwilling or unable to provide accurate verbal responses. Researchers may act as participants (participant observation) or as non-intrusive observers. Observation is widely used in fields like anthropology, psychology, and marketing to understand behaviors, workflows, or consumer interactions. It provides deep insights but may sometimes lack the ability to explain the reasons behind certain actions, requiring combination with other methods like interviews for richer analysis.

  • Interview

An interview is a direct, face-to-face, telephonic, or video-based conversation between the researcher and the participant aimed at gathering detailed information. Interviews can be structured (fixed questions), semi-structured (guided by a framework but flexible), or unstructured (open conversation). This method allows for in-depth exploration of opinions, emotions, experiences, and motivations. Interviews can be personal or group-based, depending on research needs. They are commonly used in qualitative research to gain comprehensive understanding and context behind responses. Although interviews provide rich, detailed data, they can be time-consuming and may introduce biases if not conducted carefully. Proper interviewer skills are essential for encouraging honest and open communication from participants.

  • Questionnaire

Questionnaire is a set of written or digital questions designed to collect information from respondents. It can include closed-ended questions (like multiple-choice) or open-ended questions (where respondents write answers in their own words). Questionnaires are often used for surveys and research studies where standardized information is needed from a large audience. They are cost-effective, easy to distribute, and efficient in data collection. Responses are easy to quantify for statistical analysis. However, the design of the questionnaire is crucial — poorly framed questions can lead to misunderstandings and unreliable data. Questionnaires are widely used in education, social science, market research, and customer satisfaction studies.

  • Survey

Survey is a research method involving the systematic collection of information from a sample of individuals, usually through questionnaires or interviews. Surveys can be conducted in-person, via phone, online, or by mail. They are useful for gathering quantitative as well as qualitative data about behaviors, attitudes, preferences, or demographics. Surveys are popular because they can cover large populations at relatively low cost and produce statistically significant results if designed properly. However, their effectiveness depends on clear question framing, respondent honesty, and sampling methods. Surveys are widely used in fields like business, healthcare, political science, and social research for decision-making and trend analysis.

Presentation of Data: Classification, Frequency Distribution, Discrete & Continuous

  • It is the process of arranging data into homogeneous (similar) groups according to their common characteristics.
  • Raw data cannot be easily understood and it is not fit for further analysis and interpretation. This arrangement of data helps users in comparison and analysis.
  • For example, the population of a town can be grouped according to sex, age, marital status, etc.

Classification of data

The method of arranging data into homogeneous classes according to some common features present in the data is called classification.

A planned data classification system makes fundamental data easy to find and retrieve. This can be of particular interest for legal discovery, risk management and compliance. Written procedures and guidelines for data classification should determine what levels and measures the company will use to organise data, and should define the roles of employees within the business regarding data stewardship. Once a data classification scheme has been designed, security standards that stipulate proper handling practices for each category, and storage criteria that define the data’s lifecycle requirements, should be discussed.

Objectives of Data Classification

The primary objectives of data classification are:

  • To condense the volume of data in such a way that similarities and differences can be quickly understood. Figures can consequently be ordered into a few sections holding common traits.
  • To aid comparison.
  • To point out the important characteristics of the data at a glance.
  • To give importance to the prominent data collected while setting aside the less important elements.
  • To allow a statistical treatment of the material gathered.
• Definition of Classification given by Prof. Secrist: “Classification is the process of arranging data into sequences according to their common characteristics or separating them into different related parts.”
(a) Meaning of Variable
  • The term variable is derived from the word ‘vary’ which means to differ or change. Hence, variable means the characteristic which varies or differs or changes from person to person, time to time, place to place etc. Or
  • A variable refers to quantity or attribute whose value varies from one investigation to another.
  • For example:

1. “Price” is a variable, as the prices of different commodities are different.

2. “Age” is a variable, as the age of different students varies.

3. Some more examples are Height, Weight, Wages, Expenditure, Imports, Production, etc.

(b) Kinds of Variable:
(I) Discrete Variable
  • Variables which are capable of taking only exact values, and not any fractional value, are termed discrete variables.
  • For example, the number of workers or the number of students in a class is a discrete variable, as these cannot be in fractions. Similarly, the number of children in a family can be 1, 2 and so on, but cannot be 1.5 or 2.75.
(II) Continuous Variable
  • Those variables which can take all possible values (integral as well as fractional) in a given specified range are termed continuous variables.
  • For example, Temperature, Height, Weight, Marks, etc. A short frequency-distribution sketch for both kinds of variable is given below.
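As a sketch of how the two kinds of variable are tabulated into frequency distributions, the snippet below counts exact values for a discrete variable and groups a continuous variable into class intervals. The data values and the class width of 10 are invented for illustration.

```python
from collections import Counter

# Discrete variable: number of children per family (only exact values occur).
children = [0, 1, 2, 2, 3, 1, 0, 2, 4, 1, 2, 3]
discrete_freq = Counter(children)                       # value -> frequency
print(sorted(discrete_freq.items()))

# Continuous variable: heights in cm, grouped into class intervals of width 10.
heights = [151.2, 157.8, 163.4, 165.0, 169.9, 172.5, 174.1, 178.3, 181.0, 186.6]
grouped_freq = Counter(int(h // 10) * 10 for h in heights)   # keyed by lower class limit
for lower in sorted(grouped_freq):
    print(f"{lower}-{lower + 10}: {grouped_freq[lower]}")
```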

Methods of Classification

Following are the bases of classification:
(1) Geographical Classification
  • When data are classified with reference to geographical locations such as countries, states, cities, districts, etc. it is known as Geographical Classification.
  • It is also known as ‘Spatial Classification’.
(2) Chronological Classification
  • When data are grouped according to time, such a classification is known as a Chronological Classification.
  • In such a classification, data are classified either in ascending or in descending order with reference to time such as years, quarters, months, weeks, etc.
  • It is also called ‘Temporal Classification’.
(3) Qualitative Classification
  • Under this classification, data are classified on the basis of some attributes or qualities like honesty, beauty, intelligence, literacy, marital status etc.
  • For example, Population can be divided on the basis of marital status as married or unmarried etc.
(4) Quantitative Classification
  • This type of classification is made on the basis of some measurable characteristics like height, weight, age, income, marks of students, etc.