Historical, Exploratory, Descriptive, Causal Research

Historical Research

Historical research data is subject to external criticism (verification of genuineness or validity of the source) and internal criticism (exploring the meaning of the source). Historical research has time and place dimensions. Simple chronology is not considered historical research because it does not interpret the meaning of events.

Historical research is a qualitative technique. It studies the meaning of past events in an attempt to interpret the facts, explain the causes of events, and describe their effects on present events. In doing so, researchers rely heavily on primary historical data (direct accounts of events and archival data such as official documents, personal records, and records of eyewitnesses) and less frequently on secondary historical data.

Advantages

  • The researcher is not involved in the situation being studied
  • The researchers do not interact with the subjects of study
  • Analysis of historical data may help explain current and future events

Shortcomings

  • Historical data is incomplete and vulnerable to time (documents can be destroyed by wars or over time)
  • It can also be biased or distorted (e.g. diaries, letters, etc. are influenced by the person writing them)
  • Historical research is a complex and broad category because the topics of research (e.g. the study of a society) are affected by numerous factors that need to be considered and analysed.

Exploratory Research

Exploratory research is “the preliminary research to clarify the exact nature of the problem to be solved.” It is used to ensure that additional factors are taken into consideration before an experiment, to determine research priorities, to collect data, and to home in on subjects that would be difficult to identify without exploratory research. It can include techniques such as:

  • Secondary research, such as reviewing available literature and/or data
  • Informal qualitative approaches, such as discussions with consumers, employees, management or competitors
  • Formal qualitative research through in-depth interviews, focus groups, projective methods, case studies or pilot studies

Advantages

  • Flexibility and adaptability to change
  • Exploratory research is effective in laying the groundwork that will lead to future studies.
  • Exploratory studies can potentially save time and other resources by determining at the earlier stages the types of research that are worth pursuing

Disadvantages

  • Exploratory studies generate qualitative information and interpretation of such type of information is subject to bias
  • These types of studies usually make use of a small sample that may not adequately represent the target population. Accordingly, findings of exploratory research cannot be generalized to a wider population.
  • Findings of such studies are not usually useful for decision-making at a practical level.

Exploratory Research Steps

  • Identify the problem: The researcher identifies the subject of research and frames the problem as a set of questions to be answered through multiple methods.
  • Create the hypothesis: When the researcher finds that there are no prior studies and the problem is not precisely defined, he or she creates a hypothesis based on the questions obtained while identifying the problem.
  • Further research: Once the data has been obtained, the researcher continues the study through descriptive investigation. Qualitative methods are used to study the subject in further detail and to verify the information gathered.

Descriptive Research

Descriptive research is used to describe characteristics of a population or phenomenon being studied. It does not answer questions about how/when/why the characteristics occurred. Rather it addresses the “what” question (what are the characteristics of the population or situation being studied?). The characteristics used to describe the situation or population usually form some kind of categorical scheme, also known as descriptive categories. For example, the periodic table categorizes the elements. Scientists used knowledge about the nature of electrons, protons and neutrons to devise this categorical scheme. We now take the periodic table for granted, yet it took descriptive research to devise it. Descriptive research generally precedes explanatory research. For example, over time the periodic table’s description of the elements allowed scientists to explain chemical reactions and make sound predictions when elements were combined.

Descriptive research cannot, however, determine what caused a situation. Thus, it cannot be used to establish a causal relationship in which one variable affects another. In other words, descriptive research can be said to have a low requirement for internal validity.

Description is used for frequencies, averages, and other statistical calculations. Often the best approach, prior to writing up descriptive research, is to conduct a survey investigation. Qualitative research often has the aim of description, and researchers may follow up with examinations of why the observations exist and what the implications of the findings are.

Types of Descriptive Research

Descriptive research is classified into different types according to the kind of approach that is used in conducting descriptive research. The different types of descriptive research are highlighted below:

  • Descriptive-survey

Descriptive-survey research uses surveys to gather data about varying subjects. The data are used to determine the extent to which different conditions exist among these subjects.

For example, a researcher may want to determine the qualifications of employed professionals in Maryland. He uses a survey as his research instrument, and each item on the survey related to qualifications requires a Yes/No answer.

This way, the researcher can describe the qualifications possessed by the employed population of this community.

  • Descriptive-normative survey

This is an extension of the descriptive-survey, with the addition being the normative element. In the descriptive-normative survey, the results of the study should be compared with the norm.

For example, an organization that wishes to evaluate the skills of its employees may have each team take a skills test. The skills test is the evaluation tool in this case, and the result of the test is compared with the norm for each role.

If the score of the team is one standard deviation above the mean, it is very satisfactory, if within the mean, satisfactory, and one standard deviation below the mean is unsatisfactory.
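The one-standard-deviation rule above can be sketched in code. This is a minimal Python illustration; the norm (mean 70, SD 10) and the team scores are assumed values, not data from the text:

```python
# Hypothetical skills-test example: rate a team's mean score against a norm.
# NORM_MEAN and NORM_SD are assumed values for illustration only.
NORM_MEAN = 70.0
NORM_SD = 10.0

def rate_score(score: float) -> str:
    """Apply the one-standard-deviation rule described above."""
    if score >= NORM_MEAN + NORM_SD:
        return "very satisfactory"
    elif score >= NORM_MEAN - NORM_SD:
        return "satisfactory"
    else:
        return "unsatisfactory"

team_scores = [82, 74, 65, 91, 58]               # assumed data
team_mean = sum(team_scores) / len(team_scores)  # 74.0
print(team_mean, rate_score(team_mean))          # → 74.0 satisfactory
```

The comparison against the norm, rather than against an absolute pass mark, is what makes the survey "normative."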

  • Descriptive-status

This is a quantitative description technique that seeks to answer questions about real-life situations. For example, a researcher might study the income of the employees in a company and its relationship to their performance.

A survey will be carried out to gather enough data about the income of the employees, then their performance will be evaluated and compared to their income. This will help determine whether a higher income means better performance and low income means lower performance or vice versa.

  • Descriptive-analysis

The descriptive-analysis method describes a subject by further analyzing it, which in this case involves dividing it into two parts. For example, the HR personnel of a company who wish to analyze the job role of each employee may divide the employees into those who work at the headquarters in the US and those who work from the Oslo, Norway office.

A questionnaire is devised to analyze the job role of employees with similar salaries and work in similar positions.

  • Descriptive classification

This method is employed in biological sciences for the classification of plants and animals. A researcher who wishes to classify the sea animals into different species will collect samples from various search stations, then classify them accordingly.

  • Descriptive-comparative

In descriptive-comparative research, the researcher considers two variables which are not manipulated and establishes a formal procedure to conclude that one is better than the other. For example, an examination body may want to determine the better method of conducting tests between paper-based and computer-based tests.

A random sample of potential participants of the test may be asked to use the 2 different methods, and factors like failure rates, time factors, and others will be evaluated to arrive at the best method.

  • Correlative Survey

A correlative survey is used to determine whether the relationship between two variables is positive, negative, or neutral; that is, whether two variables, say X and Y, are directly proportional, inversely proportional, or not related to each other.
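The relationship a correlative survey looks for can be quantified with Pearson's correlation coefficient. A self-contained sketch, using assumed data:

```python
# Pearson correlation coefficient from first principles (no external libraries).
# X and Y below are assumed data, purely for illustration.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-variation
    sx = sqrt(sum((a - mx) ** 2 for a in x))               # spread of x
    sy = sqrt(sum((b - my) ** 2 for b in y))               # spread of y
    return cov / (sx * sy)

X = [1, 2, 3, 4, 5]
Y = [2, 4, 6, 8, 10]                  # Y is directly proportional to X
print(round(pearson_r(X, Y), 4))      # → 1.0 (perfect positive correlation)
```

A value near +1 indicates a directly proportional relationship, near −1 an inversely proportional one, and near 0 no linear relationship.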

Characteristics of descriptive research

The term descriptive research refers to the research questions, the design of the study, and the data analysis conducted on the topic. It is called an observational research method because none of the variables in the study is influenced in any capacity.

Some distinctive characteristics of descriptive research are:

  • Quantitative research: Descriptive research is a quantitative research method that attempts to collect quantifiable information for statistical analysis of the population sample. It is a popular market research tool that allows us to collect and describe the nature of a demographic segment.
  • Uncontrolled variables: In descriptive research, none of the variables is influenced in any way; the research relies on observational methods. Hence, the nature of the variables and their behavior is not in the hands of the researcher.
  • Cross-sectional studies: Descriptive research is generally a cross-sectional study where different sections belonging to the same group are studied.
  • The basis for further research: Researchers further research the data collected and analyzed from descriptive research using different research techniques. The data can also help point towards the types of research methods used for the subsequent research.

Causal Research

Causal research, also called explanatory research, is the investigation of (research into) cause-and-effect relationships. To determine causality, it is important to observe variation in the variable assumed to cause the change in the other variables, and then measure the changes in the other variables. Other confounding influences must be controlled for so they don’t distort the results, either by holding them constant in the experimental creation of data, or by using statistical methods. This type of research is very complex and the researcher can never be completely certain that there are no other factors influencing the causal relationship, especially when dealing with people’s attitudes and motivations. There are often much deeper psychological considerations that even the respondent may not be aware of.

There are two research methods for exploring the cause-and-effect relationship between variables: experimentation (e.g., in a laboratory) and statistical research.
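Why experimentation works for causal claims can be shown with a small simulation: random assignment balances confounding influences across groups, so the difference in group means estimates the causal effect. All numbers below are simulated, for illustration only:

```python
# Simulated randomized experiment: random assignment balances the confounder
# across groups, so the difference in group means estimates the causal effect.
# All data here are generated, not real.
import random

random.seed(42)

def run_experiment(n=1000, true_effect=5.0):
    treat, control = [], []
    for _ in range(n):
        confounder = random.gauss(0, 1)                # background influence
        outcome = 50 + 2 * confounder + random.gauss(0, 1)
        if random.random() < 0.5:                      # random assignment
            treat.append(outcome + true_effect)        # treatment adds the effect
        else:
            control.append(outcome)
    return sum(treat) / len(treat) - sum(control) / len(control)

print(round(run_experiment(), 2))   # estimate lands close to the true effect of 5.0
```

Without randomization, the confounder could differ systematically between groups and bias the estimated effect; holding it constant or adjusting for it statistically are the alternatives the text mentions.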

Objectives:

  • Understanding which variables are the cause, and which variables are the effect. For example, let’s say a city council wanted to reduce car accidents on their streets. They might find through preliminary descriptive and exploratory research that both accidents and road rage have been steadily increasing over the past 5 years. Instead of automatically assuming that road rage is the cause of these accidents, it would be important to measure whether the opposite could be true. Maybe road rage increases in light of more accidents due to lane closures and increased traffic. It could also be a case of the old adage that “correlation does not imply causation.” Maybe both are increasing due to another reason, like construction, lack of proper traffic controls, or an influx of new drivers.
  • Determining the nature of the relationship between the causal variables and the effect predicted. Continuing with our example, let’s say the city council proved that road rage had an increasing effect on the number of car accidents in the area. The causal research could be used for two things. First, measuring the significance of the effect, like quantifying the percentage increase in accidents that can be attributed to road rage. Second, observing how the relationship between the variables works (i.e., enraged drivers are prone to accelerating dangerously or taking more risks, resulting in more accidents).

Advantages of causal research

  • Causal research helps identify the causes behind processes taking place in the system. Having this knowledge helps the researcher to take necessary actions to fix the problems or to optimize the outcomes.
  • Causal research provides the benefits of replication if there is a need for it.
  • Causal research helps identify the impacts of changing the processes and existing methods.
  • In causal research, the subjects are selected systematically. Because of this, causal research tends to achieve higher levels of internal validity.

Disadvantages of causal research

  • The causal research is difficult to administer because sometimes it is not possible to control the effects of all extraneous variables.
  • Causal research is among the most expensive kinds of research to conduct, requiring a great deal of money and time. Testing two real-life advertising campaigns, for example, can cost more than one or two million dollars.
  • One disadvantage of causal research is that it provides information about your plans to your competitors. For example, they might use the outcomes of your research to identify what you are up to and enter the market before you.
  • The findings of causal research are never fully exact, because there will always be prior or hidden causes affecting the outcome of the research. For example, if you are studying the performance of a new advertising campaign in an already established market, it is difficult to know whether the new campaign alone influences the performance of the business under study or whether previous campaigns also affect it.
  • The results of your research can be contaminated as there will always be a few people outside your market that might affect the results of your study.
  • Another disadvantage of using causal research is that it takes a long time to conduct this research. The accuracy of the causal research is directly proportional to the time you spend on the research as you are required to spend more time to study the long-term effects of a marketing program.
  • Coincidence in causal research is the biggest flaw of the research. Sometimes, the coincidence between a cause and an effect can be assumed as a cause and effect relationship.
  • You can’t conclude merely depending on the outcomes of the causal research. You are required to conduct other types of research alongside the causal research to confirm its output.
  • Sometimes it is easy for a researcher to identify that two variables are connected, but determining which variable is the cause and which is the effect is challenging.

Pure, Basic and Fundamental Research

Basic research, also called pure research or fundamental research, is a type of scientific research with the aim of improving scientific theories for better understanding and prediction of natural or other phenomena.

Basic research focuses on the search for truth or the development of theory. Because of this property, basic research is fundamental. Researchers with their fundamental background knowledge “design studies that can test, refine, modify, or develop theories.”

In contrast, applied research uses scientific theories to develop technology or techniques which can be used to intervene and alter natural or other phenomena. Though often driven simply by curiosity, basic research often fuels the technological innovations of applied science. The two aims are often practiced simultaneously in coordinated research and development.

Basic research advances fundamental knowledge about the world. It focuses on creating and refuting or supporting theories that explain observed phenomena. Pure research is the source of most new scientific ideas and ways of thinking about the world. It can be exploratory, descriptive, or explanatory; however, explanatory research is the most common.

Basic research generates new ideas, principles, and theories, which may not be immediately utilized but nonetheless form the basis of progress and development in different fields. Today’s computers, for example, could not exist without research in pure mathematics conducted over a century ago, for which there was no known practical application at the time. Basic research rarely helps practitioners directly with their everyday concerns; nevertheless, it stimulates new ways of thinking that have the potential to revolutionize and dramatically improve how practitioners deal with a problem in the future.

Here are a few examples of questions asked in pure research:

  • How did the universe begin?
  • What are protons, neutrons, and electrons composed of?
  • How do slime molds reproduce?
  • How do the Neo-Malthusians view the Malthusian theory?
  • What is the specific genetic code of the fruit fly?
  • What is the relevance of the dividend theories in the capital market?

Basic Research Method

  • Interview

An interview is a common method of data collection in basic research that involves a one-on-one interaction with an individual in order to gather relevant information about a phenomenon. Interviews can be structured, unstructured, or semi-structured, depending on the research process and objectives.

In a structured interview, the researcher asks a set of premeditated questions while in an unstructured interview, the researcher does not make use of a set of premeditated questions. Rather he or she depends on spontaneity and follow-up questioning in order to gather relevant information.

On the other hand, a semi-structured interview allows the researcher to deviate from the premeditated questions in order to gather more information about the research subject. Structured interviews can be conducted online by creating and administering a survey with an online survey tool.

  • Observation

Observation is a type of data-gathering method that involves paying close attention to a phenomenon for a specific period of time in order to gather relevant information about its behaviors. When carrying out basic research, the researcher may need to study the research subject for a stipulated period as it interacts with its natural environment.

Observation can be structured or unstructured depending on its procedures and approach. In structured observation, the data collection is carried out using a predefined procedure and in line with a specific schedule while unstructured observation is not restricted to a predetermined procedure.

  • Experiment

An experiment is a type of quantitative data-gathering method that seeks to validate or refute a hypothesis and it can also be used to test existing theories. In this method of data collection, the researcher manipulates dependent and independent variables to achieve objective research outcomes.

  • Questionnaire

A questionnaire is a data collection tool that is made up of a series of questions to which the research subjects provide answers. It is a cost-effective method of data gathering because it allows you to collect large samples of data from the members of the group simultaneously.

You can create and administer your pure research questionnaire online using an online survey tool, or make use of paper questionnaires, although these are easily susceptible to damage.

Fundamental research versus applied research

Purpose

  • Fundamental research: expands knowledge of the processes of business and management; results in universal principles relating to the process and its relationship to outcomes; findings are of significance and value to society in general.
  • Applied research: improves understanding of a particular business or management problem; results in a solution to the problem; new knowledge is limited to the problem; findings are of practical relevance and value to managers in organizations.

Context

  • Fundamental research: undertaken by people based in universities; choice of topic and objectives determined by the researcher; flexible time scales.
  • Applied research: undertaken by people based in a variety of settings, including organizations and universities; objectives negotiated with the originator; tight time scales.

Variables in Research

A variable is, as the name implies, something that varies. Age, sex, export, income and expenses, family size, country of birth, capital expenditure, class grades, blood pressure readings, preoperative anxiety levels, eye color, and vehicle type are all examples of variables because each of these properties varies or differs from one individual to another.

A variable in research simply refers to a person, place, thing, or phenomenon that you are trying to measure in some way. The best way to understand the difference between a dependent and independent variable is that the meaning of each is implied by what the words tell us about the variable you are using.

Types of Variable

Qualitative Variables

An important distinction between variables is between the qualitative variable and the quantitative variable.

Qualitative variables are those that express a qualitative attribute such as hair color, religion, race, gender, social status, method of payment, and so on. The values of a qualitative variable do not imply a meaningful numerical ordering.

The values of the variable ‘religion’ (Muslim, Hindu, etc.) differ qualitatively; no ordering of religion is implied. Qualitative variables are sometimes referred to as categorical variables.

Categorical variables may again be described as nominal and ordinal.

Ordinal variables are those that can be logically ordered or ranked higher or lower than one another but do not necessarily establish a numeric difference between categories, such as examination grades (A+, A, B+, etc.) or clothing sizes (extra-large, large, medium, small).

Nominal variables are those that can neither be ranked nor logically ordered, such as religion, sex, etc.

A qualitative variable is a characteristic that is not capable of being measured numerically but can be categorized as possessing or not possessing some characteristic.

Quantitative Variables

Quantitative variables, also called numeric variables, are those variables that are measured in terms of numbers. A simple example of a quantitative variable is a person’s age.

Age can take on different values because a person can be 20 years old, 35 years old, and so on. Likewise, family size is a quantitative variable, because a family might consist of one, two, or three members, and so on.

That is, each of these properties or characteristics referred to above varies or differs from one individual to another. Note that these variables are expressed in numbers, for which we call them quantitative or sometimes numeric variables.

A quantitative variable is one for which the resulting observations are numeric and thus possesses a natural ordering or ranking.

Discrete and Continuous Variables

Quantitative variables are again of two types: discrete and continuous.

Variables such as the number of children in a household or the number of defective items in a box are discrete variables, since the possible values are discrete points on the scale.

Discrete Variable

A discrete variable, restricted to certain values, usually (but not necessarily) consists of whole numbers, such as family size or the number of defective items in a box. Discrete values are often the result of enumeration or counting.

Dependent Variable

The variable that is used to describe or measure the problem or outcome under study is called a dependent variable.

In a causal relationship, the cause is the independent variable, and the effect is the dependent variable. If we hypothesize that smoking causes lung cancer, ‘smoking’ is the independent variable and cancer the dependent variable.

Continuous Variable

A continuous variable is one that may take on an infinite number of intermediate values along a specified interval. Examples are:

  • The sugar level in the human body
  • Blood pressure reading
  • Temperature
  • Height or weight of the human body
  • Rate of bank interest
  • Internal rate of return (IRR)

Independent Variable

The variable that is used to describe or measure the factor that is assumed to cause or at least to influence the problem or outcome is called an independent variable.

The definition implies that the experimenter uses the independent variable to describe or explain the influence or effect of it on the dependent variable.

Variability in the dependent variable is presumed to depend on variability in the independent variable.

Dependent and Independent Variables

In many research settings, there are two specific classes of variables that need to be distinguished from one another, independent variable and dependent variable.

Many research studies are aimed at unraveling and understanding the causes of underlying phenomena or problems, with the ultimate goal of establishing a causal relationship between them.

Background Variable

In almost every study, we collect information such as age, sex, educational attainment, socioeconomic status, marital status, religion, place of birth, and the like. These variables are referred to as background variables.

These variables are often related to many independent variables so that they influence the problem indirectly. Hence, they are called background variables.

Extraneous Variable

Most studies concern the identification of a single independent variable and the measurement of its effect on the dependent variable.

But still, several variables might conceivably affect our hypothesized independent-dependent variable relationship, thereby distorting the study. These variables are referred to as extraneous variables.

Moderating Variable

In any statement of relationships of variables, it is normally hypothesized that in some way, the independent variable ’causes’ the dependent variable to occur. In simple relationships, all other variables are extraneous and are ignored. In actual study situations, such a simple one-to-one relationship needs to be revised to take other variables into account to better explain the relationship. A moderating variable is such a variable: one believed to modify the strength or direction of the original independent-dependent relationship.

Suppressor Variable

In many cases, we have good reasons to believe that the variables of interest have a relationship within themselves, but our data fail to establish any such relationship. Some hidden factors may be suppressing the true relationship between the two original variables.

Such a factor is referred to as a suppressor variable because it suppresses the actual relationship between the other two variables.

Intervening Variable

Often an apparent relationship between two variables is caused by a third variable.

For example, variables X and Y may be highly correlated, but only because X causes the third variable, Z, which in turn causes Y. In this case, Z is the intervening variable.
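This chain can be simulated to show that X and Y end up strongly correlated even though X influences Y only through Z. A sketch, with all data simulated:

```python
# Simulated causal chain X -> Z -> Y: X and Y correlate only because
# X causes the intervening variable Z, which in turn causes Y.
import random
from math import sqrt

random.seed(1)

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

X = [random.gauss(0, 1) for _ in range(5000)]
Z = [x + random.gauss(0, 0.3) for x in X]   # X causes Z
Y = [z + random.gauss(0, 0.3) for z in Z]   # Z causes Y
print(round(pearson_r(X, Y), 2))            # high correlation, via Z only
```

Looking only at the X-Y correlation would miss the mechanism; identifying Z as the intervening variable explains how the apparent relationship arises.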

Absolute and Relative Measures

The measure of dispersion indicates the scattering of data. It describes the disparity of data values from one another, delivering a precise view of the distribution of the data and an idea of the variation around its central value.

Characteristics of a Good Measure of Dispersion

  • It should be easy to calculate & simple to understand.
  • It should be based on all the observations of the series.
  • It should be rigidly defined.
  • It should not be affected by extreme values.
  • It should not be unduly affected by sampling fluctuations.
  • It should be capable of further mathematical treatment and statistical analysis.

Relative Measure of Dispersion

  • Relative measures of dispersion are obtained as ratios or percentages of the average.
  • These are also known as ‘Coefficient of dispersion.’
  • These are pure numbers or percentages totally independent of the units of measurements.

The relative measures of dispersion are used to compare the distributions of two or more data sets. These measures compare values without units. Common relative measures of dispersion include:

  • Co-efficient of Range
  • Co-efficient of Variation
  • Co-efficient of Standard Deviation
  • Co-efficient of Quartile Deviation
  • Co-efficient of Mean Deviation
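Two of these coefficients can be computed directly. A minimal sketch with an assumed data set; because they are ratios, the results are pure numbers independent of the units of measurement:

```python
# Coefficient of variation and coefficient of range for an assumed data set.
# Both are unitless, so data sets in different units can be compared.
from math import sqrt

data = [4, 8, 6, 5, 7]   # assumed observations

mean = sum(data) / len(data)
sd = sqrt(sum((x - mean) ** 2 for x in data) / len(data))  # population SD

coeff_of_variation = sd / mean                                     # SD / mean
coeff_of_range = (max(data) - min(data)) / (max(data) + min(data))

print(round(coeff_of_variation, 4), round(coeff_of_range, 4))
```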

Absolute Measure of Dispersion

An absolute measure of dispersion is expressed in the same unit as the original data set. Absolute dispersion methods express variation in terms of the average deviation of the observations, as in the standard or mean deviation. They include the range, standard deviation, quartile deviation, etc.

The types of absolute measures of dispersion are:

  • Range: Simply the difference between the maximum and minimum values in a data set. Example: 1, 3, 5, 6, 7 => Range = 7 - 1 = 6
  • Variance: Subtract the mean from each value in the data set, square each difference, add the squares, and divide by the total number of values: Variance (σ²) = ∑(X−μ)²/N
  • Standard Deviation: The square root of the variance, i.e. S.D. = √σ² = σ.
  • Quartiles and Quartile Deviation: The quartiles are values that divide a list of numbers into quarters. The quartile deviation is half of the distance between the third and the first quartile.
  • Mean and Mean Deviation: The average of numbers is known as the mean and the arithmetic mean of the absolute deviations of the observations from a measure of central tendency is known as the mean deviation (also called mean absolute deviation).
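Most of the measures above can be computed for the example data set 1, 3, 5, 6, 7 (the same set used to illustrate the range). A short sketch:

```python
# Range, population variance, standard deviation, and mean (absolute)
# deviation for the example data set used above.
from math import sqrt

data = [1, 3, 5, 6, 7]
n = len(data)
mean = sum(data) / n                                   # 4.4

rng = max(data) - min(data)                            # 7 - 1 = 6
variance = sum((x - mean) ** 2 for x in data) / n      # σ² = ∑(X−μ)²/N
sd = sqrt(variance)                                    # σ = √σ²
mean_dev = sum(abs(x - mean) for x in data) / n        # mean absolute deviation

print(rng, round(variance, 2), round(sd, 2), round(mean_dev, 2))
```

Each result carries the same unit as the data, which is what makes these measures absolute rather than relative.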

Causation Method

Causal inference is the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect. The main difference between causal inference and inference of association is that the former analyzes the response of the effect variable when the cause is changed. The science of why things occur is called etiology. Causal inference is an example of causal reasoning.

In statistics, causation is a bit tricky. As you’ve no doubt heard, correlation doesn’t necessarily imply causation. An association or correlation between variables simply indicates that the values vary together. It does not necessarily suggest that changes in one variable cause changes in the other variable. Proving causality can be difficult.

Relationships and Correlation

The expression is, “correlation does not imply causation.” Consequently, you might think it applies only to statistics like Pearson’s correlation coefficient, and it does apply to that statistic. However, we’re really talking about relationships between variables in a broader context. Pearson’s coefficient is for two continuous variables, but a relationship can involve different types of variables: categorical variables, counts, binary data, and so on.

For example, in a medical experiment, you might have a categorical variable that defines which group subjects belong to: a control group, a placebo group, and several different treatment groups. If the health outcome is a continuous variable, you can assess the differences between group means. If the means differ by group, then you can say that mean health outcomes depend on the treatment group; there is a correlation, or relationship, between the type of treatment and the health outcome. Or, maybe we have the treatment groups and the outcome is binary, say infected and not infected. In that case, we’d compare the proportions of infected and not-infected subjects between groups to determine whether treatment correlates with infection rates.
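The group-means case can be sketched without any statistics library: compute each group's mean outcome and take the difference. The data below are assumed example values:

```python
# Comparing mean health outcomes between two groups (assumed example data).
treatment = [78, 82, 75, 80, 85]   # continuous outcome, treatment group
placebo = [70, 72, 68, 74, 71]     # continuous outcome, placebo group

mean_t = sum(treatment) / len(treatment)   # 80.0
mean_p = sum(placebo) / len(placebo)       # 71.0
diff = mean_t - mean_p                     # the groups differ by 9.0 points

print(mean_t, mean_p, diff)   # → 80.0 71.0 9.0
```

Whether a difference of this size reflects a real relationship in the population, rather than sampling luck, is a question for a hypothesis test.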

Throughout this post, I’ll refer to correlation and relationships in this broader sense: not just literal correlation coefficients, but relationships between variables, such as differences between group means and proportions, regression coefficients, associations between pairs of categorical variables, and so on.

Causation and Hypothesis Tests

Before moving on to determining whether a relationship is causal, let’s take a moment to reflect on why statistically significant hypothesis test results do not signify causation.

Hypothesis tests are inferential procedures. They allow you to use relatively small samples to draw conclusions about entire populations. For the topic of causation, we need to understand what statistical significance means.

When you see a relationship in sample data, whether it is a correlation coefficient, a difference between group means, or a regression coefficient, hypothesis tests help you determine whether your sample provides sufficient evidence to conclude that the relationship exists in the population. You can see it in your sample, but you need to know whether it exists in the population. It’s possible that random sampling error (i.e., luck of the draw) produced the “relationship” in your sample.

Statistical significance indicates that you have sufficient evidence to conclude that the relationship you observe in the sample also exists in the population.
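As a minimal sketch (with hypothetical numbers), Pearson’s correlation coefficient can be computed directly from its definition. Note that the coefficient only describes the sample; whether the relationship also exists in the population is exactly the question a hypothesis test answers.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical sample data: r describes this sample only; random
# sampling error could have produced this "relationship" by luck.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
print(round(pearson_r(x, y), 3))  # prints 0.853
```

A strong sample r like this is evidence of a relationship, but only a significance test (with its p-value) tells you whether the sample provides sufficient evidence for the population-level claim.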

Hill’s Criteria of Causation

Determining whether a causal relationship exists requires far more in-depth subject area knowledge and contextual information than you can include in a hypothesis test. In 1965, Austin Bradford Hill, a medical statistician, tackled this question in a paper that’s become the standard. While he introduced it in the context of epidemiological research, you can apply the ideas to other fields.

Hill describes nine criteria to help establish causal connections. The goal is to satisfy as many criteria as possible. No single criterion is sufficient. However, it’s often impossible to meet all the criteria. These criteria are an exercise in critical thought. They show you how to think about determining causation and highlight essential qualities to consider.

Correlation Does Not Mean Causation

Even if there is a correlation between two variables, we cannot conclude that one variable causes a change in the other. This relationship could be coincidental, or a third factor may be causing both variables to change.

For example, Ankit collected data on the sales of ice cream cones and air conditioners in his hometown. He found that when ice cream sales were low, air conditioner sales tended to be low and that when ice cream sales were high, air conditioner sales tended to be high.

  • Ankit can conclude that sales of ice cream cones and air conditioners are positively correlated.
  • Ankit can’t conclude that selling more ice cream cones causes more air conditioners to be sold. It is likely that the increases in the sales of both ice cream cones and air conditioners are caused by a third factor, an increase in temperature!
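A quick simulation makes the confounder visible. The numbers below are entirely hypothetical: temperature drives both sales series, so the two series correlate strongly with each other even though neither causes the other.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))

random.seed(42)  # reproducible hypothetical data

# Temperature is the lurking third factor that drives both series.
temps = [random.uniform(20, 35) for _ in range(100)]
ice_cream_sales = [3 * t + random.gauss(0, 5) for t in temps]
air_con_sales = [2 * t + random.gauss(0, 5) for t in temps]

# Both series track temperature, so they also track each other,
# even though ice cream sales do not cause air conditioner sales.
print(round(pearson_r(temps, ice_cream_sales), 2))
print(round(pearson_r(ice_cream_sales, air_con_sales), 2))
```

Dropping temperature from the model and looking only at the two sales series would show a strong correlation with no causal link at all.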

Concurrent Deviation Method

This method of studying correlation is the simplest of all the methods. The only thing required under this method is to find out the direction of change of the X variable and the Y variable.

A very simple and casual method of finding correlation when we are not serious about the magnitude of the two variables is the application of concurrent deviations.

This method involves attaching a positive sign to an x-value (except the first) if this value is more than the previous value, and a negative sign if this value is less than the previous value.

This is done for the y-series as well. The deviation in the x-value and the corresponding y-value is known to be concurrent if both the deviations have the same sign.

Denoting the number of concurrent deviations by C and the total number of deviations by m (which must be one less than the number of pairs of x and y values), the coefficient of concurrent deviations is given by

rc = ±√(±(2C − m)/m)

Where rc stands for the coefficient of correlation by the concurrent deviation method; C stands for the number of concurrent deviations, i.e., the number of positive signs obtained after multiplying Dx with Dy; and m = n − 1, where n is the number of pairs of observations compared.

Steps

(i) Find out the direction of change of the X variable, i.e., compared with the first value, whether the second value is increasing, decreasing or constant. If it is increasing put a plus (+) sign; if it is decreasing put a minus (−) sign; and if it is constant put zero. Similarly, compared with the second value, find out whether the third value is increasing, decreasing or constant. Repeat the same process for the other values. Denote this column by Dx.

(ii) In the same manner as discussed above find out the direction of change of Y variable and denote this column by Dy.

(iii) Multiply Dx with Dy, and determine the value of c, i.e., the number of positive signs.

(iv) Apply the above formula, i.e.,

rc = ±√(±(2C − m)/m)

Note. The significance of the ± signs, both inside and outside the square root, is that we cannot take the square root of a negative number. Therefore, if 2C − m is negative, this negative value multiplied by the minus sign inside becomes positive, and we can take the square root; the final result, however, is reported as negative. If 2C − m is positive then, of course, we get a positive value of the coefficient of correlation.
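The steps and the sign convention above can be sketched as a minimal implementation (the sample x and y series in the usage lines are hypothetical):

```python
import math

def concurrent_deviation(x, y):
    """Coefficient of correlation by the concurrent deviation method."""
    # Dx and Dy: sign of change between each value and the previous one.
    def signs(series):
        return [1 if b > a else -1 if b < a else 0
                for a, b in zip(series, series[1:])]
    dx, dy = signs(x), signs(y)
    m = len(dx)  # number of deviations, one less than the number of pairs
    # C: concurrent deviations, i.e. positive products of Dx and Dy.
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)
    inner = (2 * c - m) / m
    # If 2C - m is negative, negate it before taking the square root
    # and report the final coefficient as negative.
    return math.copysign(math.sqrt(abs(inner)), inner)

print(round(concurrent_deviation([1, 2, 3, 4, 5], [2, 3, 1, 5, 6]), 4))  # 0.7071
print(concurrent_deviation([1, 2, 3], [3, 2, 1]))  # -1.0
```

In the first call, four deviations are compared and three are concurrent, so rc = √((2·3 − 4)/4) = √0.5; in the second, the series move in strictly opposite directions, giving rc = −1.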

Percentiles

Percentile is in everyday use, but there is no universal definition for it. The most common definition of a percentile is a number where a certain percentage of scores fall below that number. You might know that you scored 67 out of 90 on a test. But that figure has no real meaning unless you know what percentile you fall into. If you know that your score is in the 90th percentile, that means you scored better than 90% of people who took the test.

In statistics, a percentile (or a centile) is a score below which a given percentage of scores in its frequency distribution fall (exclusive definition) or a score at or below which a given percentage fall (inclusive definition). For example, the 50th percentile (the median) is the score below which 50% (exclusive) or at or below which (inclusive) 50% of the scores in the distribution may be found.

The percentile (or percentile score) and the percentile rank are related terms. The percentile rank of a score is the percentage of scores in its distribution that are less than it, an exclusive definition, and one that can be expressed with a single, simple formula. In contrast, there is not one formula or algorithm for a percentile score but many. Hyndman and Fan identified nine and most statistical and spreadsheet software use one of the methods they describe. Algorithms either return the value of a score that exists in the set of scores (nearest-rank methods) or interpolate between existing scores and are either exclusive or inclusive.

  • The 25th percentile is also called the first quartile.
  • The 50th percentile is generally the median.
  • The 75th percentile is also called the third quartile.
  • The difference between the third and first quartiles is the interquartile range.
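As one concrete instance of the many percentile algorithms mentioned above, the inclusive nearest-rank method always returns a score that actually exists in the data set (the sample scores below are hypothetical):

```python
import math

def percentile_nearest_rank(scores, p):
    """Inclusive nearest-rank percentile: the smallest score at or
    below which at least p percent of the sorted scores fall."""
    ordered = sorted(scores)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based ordinal rank
    return ordered[max(rank, 1) - 1]

scores = [15, 20, 35, 40, 50]
print(percentile_nearest_rank(scores, 25))  # first quartile: 20
print(percentile_nearest_rank(scores, 50))  # median: 35
print(percentile_nearest_rank(scores, 75))  # third quartile: 40
```

Interpolating methods would instead return values between existing scores, which is why different statistical packages can report slightly different percentiles for the same data.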

Simple and Weighted Averages

Simple Averages

The simple average of a set of values is determined by dividing the sum of all the values by the number of values in the set.

The formula of simple average can be expressed as follows:

Simple average = (x1 + x2 + x3 + ….. + xn)/n

Where;

    x = values in the set

    n = number of values in the set

Weighted average

Weighted average is a means of determining the average of a set of values by assigning a weight to each value in relation to its relative importance/significance.

The formula of weighted average can be expressed as follows:

Weighted average = (x1w1 + x2w2 + x3w3 + ….. + xnwn)/(w1 + w2 + w3 + …. + wn)

Where;

    x = values in the set

    w = weightage of each value in the set

    n = number of values in the set
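Both formulas translate directly into code (the marks and weights below are hypothetical):

```python
def simple_average(values):
    # (x1 + x2 + ... + xn) / n
    return sum(values) / len(values)

def weighted_average(values, weights):
    # (x1*w1 + x2*w2 + ... + xn*wn) / (w1 + w2 + ... + wn)
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

marks = [80, 60, 90]   # hypothetical exam scores
weights = [2, 1, 3]    # relative importance of each exam

print(round(simple_average(marks), 2))             # 76.67
print(round(weighted_average(marks, weights), 2))  # 81.67
```

When all weights are equal, the weighted average reduces to the simple average, which is a quick sanity check for an implementation.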

Graphic presentation: Technique of Construction of Graphs

Graphic presentation represents a highly developed body of techniques for elucidating, interpreting, and analyzing numerical facts by means of points, lines, areas, and other geometric forms and symbols. Graphic techniques are especially valuable in presenting quantitative data in a simple, clear, and effective manner, as well as facilitating comparisons of values, trends, and relationships. They have the additional advantages of succinctness and popular appeal; the comprehensive pictures they provide can bring out hidden facts and relationships and contribute to a more balanced understanding of a problem.

The choice of a particular graphic technique to present a given set of data is a difficult one, and no hard and fast rules can be made to cover all circumstances. There are, however, certain general goals that should always be kept in mind. These include completeness, clarity, and honesty; but there is often conflict between the goals. For instance, completeness demands that all data points be included in a chart, but often this can be done only at some sacrifice of clarity. Such problems can be mitigated by the practice (highly desirable on other grounds as well) of indicating the source of the data from which the chart was constructed so that the reader himself can investigate further. Another problem occurs when it is necessary to break an axis in order to fit all the data in a reasonable space; clarity is then served, but honesty demands that attention be strongly called to the break.

On the basis of form, charts and graphs may be classified as:

(1) Rectilinear coordinate graphs

(2) Semilogarithmic charts

(3) Bar and column charts

(4) Frequency graphs and related charts

(5) Maps

(6) Miscellaneous charts, including pie diagrams, scattergrams, fan charts, ranking charts, etc.

(7) Pictorial charts

(8) Three-dimensional projection charts.

General Rules for Graphical Representation of Data

There are certain rules to effectively present the information in the graphical representation. They are:

  • Suitable Title: Make sure that the appropriate title is given to the graph which indicates the subject of the presentation.
  • Measurement Unit: Mention the measurement unit in the graph.
  • Proper Scale: To represent the data in an accurate manner, choose a proper scale.
  • Index: Index the appropriate colours, shades, lines, design in the graphs for better understanding.
  • Data Sources: Include the source of information wherever it is necessary at the bottom of the graph.
  • Keep it Simple: Construct a graph in an easy way that everyone can understand.
  • Neat: Choose the correct size, fonts, colours etc. in such a way that the graph should be a visual aid for the presentation of information.

Construction of a Graph

The graphic presentation of data and information offers a quick and simple way of understanding the features and drawing comparisons. Further, it is an effective analytical tool and a graph can help us in finding the mode, median, etc.

One can locate a point in a plane using two mutually perpendicular lines – the X-axis (the horizontal line) and the Y-axis (the vertical line). Their point of intersection is the Origin.

One can locate the position of a point in terms of its distance from both these axes. For example, if a point P is 3 units away from the Y-axis and 5 units away from the X-axis, then its location is as follows:

Key Points

  • We measure the distance of the point from the Y-axis along the X-axis. Similarly, we measure the distance of the point from the X-axis along the Y-axis. Therefore, to measure 3 units from the Y-axis, we move 3 units along the X-axis and likewise for the other coordinate.
  • We then draw perpendicular lines from these two points.
  • The point where the perpendiculars intersect is the position of the point P.
  • We denote it as follows (3,5) or (abscissa, ordinate). Together, they are the coordinates of the point P.
  • The four parts of the plane are Quadrants.
  • Also, we can plot different points for a different pair of values.

Graphs of Frequency Distribution

A frequency distribution, in statistics, is a graph or data set organized to show the frequency of occurrence of each possible outcome of a repeatable event observed many times. Simple examples are election returns and test scores listed by percentile. A frequency distribution can be graphed as a histogram or pie chart. For large data sets, the stepped graph of a histogram is often approximated by the smooth curve of a distribution function (called a density function when normalized so that the area under the curve equals 1).

In statistics, a frequency distribution is a list, table or graph that displays the frequency of various outcomes in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval.

The famed bell curve, or normal distribution, is the graph of one such function. Frequency distributions are particularly useful in summarizing large data sets and assigning probabilities.

Applications

Managing and operating on frequency tabulated data is much simpler than operation on raw data. There are simple algorithms to calculate median, mean, standard deviation etc. from these tables.
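For instance, the mean and median can be read straight off a frequency table without expanding it back into raw data (the table below is hypothetical):

```python
def mean_from_table(table):
    """Mean of tabulated data: sum(value * frequency) / total frequency."""
    total = sum(v * f for v, f in table.items())
    return total / sum(table.values())

def median_from_table(table):
    """Walk cumulative frequencies in value order until the middle
    position is reached (returns an actual table value)."""
    n = sum(table.values())
    middle = (n + 1) / 2
    cumulative = 0
    for value in sorted(table):
        cumulative += table[value]
        if cumulative >= middle:
            return value

table = {13: 2, 14: 3, 15: 1, 16: 2, 21: 1}  # value -> frequency
print(round(mean_from_table(table), 3))  # 15.111
print(median_from_table(table))          # 14
```

This avoids materializing the full list of observations, which is exactly why operating on tabulated data is simpler than operating on raw data for large sets.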

Statistical hypothesis testing is founded on the assessment of differences and similarities between frequency distributions. This assessment involves measures of central tendency or averages, such as the mean and median, and measures of variability or statistical dispersion, such as the standard deviation or variance.

A frequency distribution is said to be skewed when its mean and median are significantly different, or more generally when it is asymmetric. The kurtosis of a frequency distribution is a measure of the proportion of extreme values (outliers), which appear at either end of the histogram. If the distribution is more outlier-prone than the normal distribution it is said to be leptokurtic; if less outlier-prone it is said to be platykurtic.

Letter frequency distributions are also used in frequency analysis to crack ciphers, and to compare the relative frequencies of letters in different languages, such as Greek and Latin.

Types of Frequency Distribution

  • Grouped frequency distribution.
  • Ungrouped frequency distribution.
  • Cumulative frequency distribution.
  • Relative frequency distribution.
  • Relative cumulative frequency distribution.

Grouped Data

At certain times, to ensure that we are making correct and relevant observations from the data set, we may need to group the data into class intervals. This ensures that the frequency distribution best represents the data. Let us make a grouped frequency table for an example of the heights of students.

Class Interval Frequency
130-140 4
140-150 5
150-160 3

From the above table, you can see that the value of 150 is put in the class interval of 150-160 and not 140-150.
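The grouping rule illustrated above (a boundary value such as 150 belongs to the higher interval) corresponds to left-inclusive intervals [low, high). A sketch, using hypothetical heights chosen to reproduce the table's counts:

```python
def grouped_frequency(values, start, stop, width):
    """Tally values into left-inclusive class intervals [low, low + width)."""
    table = {}
    low = start
    while low < stop:
        high = low + width
        # low <= v < high puts a boundary value like 150 into 150-160.
        table[f"{low}-{high}"] = sum(1 for v in values if low <= v < high)
        low = high
    return table

heights = [132, 135, 138, 139, 142, 144, 147, 148, 149, 150, 152, 155]
print(grouped_frequency(heights, 130, 160, 10))
# {'130-140': 4, '140-150': 5, '150-160': 3}
```

Some texts instead use right-inclusive intervals (low, high], which would put 150 in 140-150; the choice must be stated and applied consistently.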

Example

Frequency Distribution Table

13,14,16,13,16,14,21,14,15

Height Frequency
13 2
14 3
15 1
16 2
21 1
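The table above can be produced mechanically; in Python, `collections.Counter` tallies an ungrouped frequency distribution directly:

```python
from collections import Counter

# Raw data from the example above
data = [13, 14, 16, 13, 16, 14, 21, 14, 15]

frequency = Counter(data)
for value in sorted(frequency):
    print(value, frequency[value])
# 13 2
# 14 3
# 15 1
# 16 2
# 21 1
```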
