Variables Research

A variable is, as the name implies, something that varies. Age, sex, exports, income and expenses, family size, country of birth, capital expenditure, class grades, blood pressure readings, preoperative anxiety levels, eye color, and vehicle type are all examples of variables because each of these properties varies or differs from one individual to another.

A variable in research simply refers to a person, place, thing, or phenomenon that you are trying to measure in some way. The best way to understand the difference between a dependent and an independent variable is to notice that the name of each tells us about the role that variable plays in the study.

Types of Variables

Qualitative Variables

An important distinction among variables is that between qualitative and quantitative variables.

Qualitative variables are those that express a qualitative attribute such as hair color, religion, race, gender, social status, method of payment, and so on. The values of a qualitative variable do not imply a meaningful numerical ordering.

The values of the variable ‘religion’ (Muslim, Hindu, etc.) differ qualitatively; no ordering of religion is implied. Qualitative variables are sometimes referred to as categorical variables.

Categorical variables may again be described as nominal and ordinal.

Ordinal variables are those which can be logically ordered or ranked higher or lower than one another but do not necessarily establish a numeric difference between each category, such as examination grades (A+, A, B+, etc.) or clothing sizes (extra-large, large, medium, small).

Nominal variables are those which can neither be ranked nor logically ordered, such as religion, sex, etc.

A qualitative variable is a characteristic that is not capable of being measured but can be categorized as possessing or not possessing some characteristic.

Quantitative Variables

Quantitative variables, also called numeric variables, are those variables that are measured in terms of numbers. A simple example of a quantitative variable is a person’s age.

Age can take on different values because a person can be 20 years old, 35 years old, and so on. Likewise, family size is a quantitative variable, because a family might comprise one, two, or three members, and so on.

That is, each of these properties or characteristics varies or differs from one individual to another. Note that these variables are expressed in numbers, which is why we call them quantitative or sometimes numeric variables.

A quantitative variable is one for which the resulting observations are numeric and thus possesses a natural ordering or ranking.

Discrete and Continuous Variables

Quantitative variables are again of two types: discrete and continuous.

Variables such as the number of children in a household or the number of defective items in a box are discrete variables, since the possible scores are discrete points on the scale.

Discrete Variable

A discrete variable, restricted to certain values, usually (but not necessarily) consists of whole numbers, such as family size or the number of defective items in a box. Discrete variables are often the result of enumeration or counting.

Dependent Variable

The variable that is used to describe or measure the problem or outcome under study is called a dependent variable.

In a causal relationship, the cause is the independent variable, and the effect is the dependent variable. If we hypothesize that smoking causes lung cancer, ‘smoking’ is the independent variable and cancer the dependent variable.

Continuous Variable

A continuous variable is one that may take on an infinite number of intermediate values along a specified interval. Examples are:

  • The sugar level in the human body
  • Blood pressure reading
  • Temperature
  • Height or weight of the human body
  • Rate of bank interest
  • Internal rate of return (IRR)

Independent Variable

The variable that is used to describe or measure the factor that is assumed to cause or at least to influence the problem or outcome is called an independent variable.

The definition implies that the experimenter uses the independent variable to describe or explain the influence or effect of it on the dependent variable.

Variability in the dependent variable is presumed to depend on variability in the independent variable.

Dependent and Independent Variables

In many research settings, there are two specific classes of variables that need to be distinguished from one another, independent variable and dependent variable.

Many research studies are aimed at unraveling and understanding the causes of underlying phenomena or problems, with the ultimate goal of establishing a causal relationship between them.

Background Variable

In almost every study, we collect information such as age, sex, educational attainment, socioeconomic status, marital status, religion, place of birth, and the like. These variables are referred to as background variables.

These variables are often related to many independent variables so that they influence the problem indirectly. Hence, they are called background variables.

Extraneous Variable

Most studies concern the identification of a single independent variable and the measurement of its effect on the dependent variable.

But still, several variables might conceivably affect our hypothesized independent-dependent variable relationship, thereby distorting the study. These variables are referred to as extraneous variables.

Moderating Variable

In any statement of relationships of variables, it is normally hypothesized that in some way, the independent variable ‘causes’ the dependent variable to occur. In simple relationships, all other variables are extraneous and are ignored. In actual study situations, such a simple one-to-one relationship needs to be revised to take other variables into account to better explain the relationship.

Suppressor Variable

In many cases, we have good reasons to believe that the variables of interest have a relationship within themselves, but our data fail to establish any such relationship. Some hidden factors may be suppressing the true relationship between the two original variables.

Such a factor is referred to as a suppressor variable because it suppresses the actual relationship between the other two variables.

Intervening Variable

Often an apparent relationship between two variables is caused by a third variable.

For example, variables X and Y may be highly correlated, but only because X causes the third variable, Z, which in turn causes Y. In this case, Z is the intervening variable.

Absolute and Relative Measures

The measure of dispersion indicates the scattering of data. It explains how the data values differ from one another, delivering a precise view of the distribution of the data. A measure of dispersion gives us an idea of both the variation among individual items and their spread around the central value.

Characteristics of a Good Measure of Dispersion

  • It should be easy to calculate & simple to understand.
  • It should be based on all the observations of the series.
  • It should be rigidly defined.
  • It should not be affected by extreme values.
  • It should not be unduly affected by sampling fluctuations.
  • It should be capable of further mathematical treatment and statistical analysis.

Relative Measure of Dispersion

  • Relative measures of dispersion are obtained as ratios or percentages of the average.
  • These are also known as ‘Coefficient of dispersion.’
  • These are pure numbers or percentages totally independent of the units of measurements.

The relative measures of dispersion are used to compare the distributions of two or more data sets. This measure compares values without units. Common relative dispersion methods include:

  • Co-efficient of Range
  • Co-efficient of Variation
  • Co-efficient of Standard Deviation
  • Co-efficient of Quartile Deviation
  • Co-efficient of Mean Deviation
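As a quick illustration, two of the most common of these coefficients can be computed with Python's standard library; the data values below are made up for illustration:

```python
import statistics

data = [4, 8, 6, 5, 3, 7]  # hypothetical sample values

# Coefficient of Range = (max - min) / (max + min): a pure number
coeff_range = (max(data) - min(data)) / (max(data) + min(data))

# Coefficient of Variation = (standard deviation / mean) * 100
cv = statistics.pstdev(data) / statistics.mean(data) * 100

print(round(coeff_range, 4))  # 0.4545
print(round(cv, 2))           # 31.05
```

Because both results are unit-free ratios, they can be used to compare the spread of data sets measured in different units.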

Absolute Measure of Dispersion

An absolute measure of dispersion carries the same unit as the original data set. The absolute dispersion method expresses the variation in terms of the average deviation of the observations, as in the standard or mean deviation. It includes the range, standard deviation, quartile deviation, etc.

The types of absolute measures of dispersion are:

  • Range: It is simply the difference between the maximum value and the minimum value given in a data set. Example: 1, 3, 5, 6, 7 => Range = 7 − 1 = 6
  • Variance: Subtract the mean from each value in the set, square each of these deviations, add the squares, and finally divide by the total number of values in the data set to obtain the variance: σ² = Σ(X − μ)²/N
  • Standard Deviation: The square root of the variance is known as the standard deviation, i.e., S.D. = √σ² = σ.
  • Quartiles and Quartile Deviation: The quartiles are values that divide a list of numbers into quarters. The quartile deviation is half of the distance between the third and the first quartile.
  • Mean and Mean Deviation: The average of the numbers is known as the mean, and the arithmetic mean of the absolute deviations of the observations from a measure of central tendency is known as the mean deviation (also called mean absolute deviation).
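As a sketch, these absolute measures can be computed for the example data (1, 3, 5, 6, 7) with Python's standard library; note that statistics.quantiles uses an interpolated ("exclusive") definition of the quartiles, which may differ slightly from a given textbook's method:

```python
import statistics

data = [1, 3, 5, 6, 7]                 # the example data from the text

rng = max(data) - min(data)            # Range: 7 - 1 = 6

mu = statistics.mean(data)             # 4.4
variance = sum((x - mu) ** 2 for x in data) / len(data)   # population variance
sd = variance ** 0.5                   # standard deviation = sqrt(variance)

q1, _, q3 = statistics.quantiles(data, n=4)   # quartiles (exclusive method)
qd = (q3 - q1) / 2                     # quartile deviation

md = sum(abs(x - mu) for x in data) / len(data)   # mean (absolute) deviation

print(rng, round(variance, 2), round(sd, 2), qd, round(md, 2))
# 6 4.64 2.15 2.25 1.92
```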

Causation Method

Causal inference is the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect. The main difference between causal inference and inference of association is that the former analyzes the response of the effect variable when the cause is changed. The science of why things occur is called etiology. Causal inference is an example of causal reasoning.

In statistics, causation is a bit tricky. As you’ve no doubt heard, correlation doesn’t necessarily imply causation. An association or correlation between variables simply indicates that the values vary together. It does not necessarily suggest that changes in one variable cause changes in the other variable. Proving causality can be difficult.

Relationships and Correlation

The expression is, “correlation does not imply causation.” Consequently, you might think that it applies to things like Pearson’s correlation coefficient. And, it does apply to that statistic. However, we’re really talking about relationships between variables in a broader context. Pearson’s is for two continuous variables. However, a relationship can involve different types of variables such as categorical variables, counts, binary data, and so on.

For example, in a medical experiment, you might have a categorical variable that defines which group subjects belong to: a control group, a placebo group, and several different treatment groups. If the health outcome is a continuous variable, you can assess the differences between group means. If the means differ by group, then you can say that mean health outcomes depend on the treatment group. There’s a correlation, or relationship, between the type of treatment and the health outcome. Or, maybe we have the treatment groups and the outcome is binary, say infected and not infected. In that case, we’d compare the proportions of infected/not infected subjects between groups to determine whether treatment correlates with infection rates.

Throughout this post, I’ll refer to correlation and relationships in this broader sense, not just literal correlation coefficients, but relationships between variables, such as differences between group means and proportions, regression coefficients, associations between pairs of categorical variables, and so on.

Causation and Hypothesis Tests

Before moving on to determining whether a relationship is causal, let’s take a moment to reflect on why statistically significant hypothesis test results do not signify causation.

Hypothesis tests are inferential procedures. They allow you to use relatively small samples to draw conclusions about entire populations. For the topic of causation, we need to understand what statistical significance means.

When you see a relationship in sample data, whether it is a correlation coefficient, a difference between group means, or a regression coefficient, hypothesis tests help you determine whether your sample provides sufficient evidence to conclude that the relationship exists in the population. You can see it in your sample, but you need to know whether it exists in the population. It’s possible that random sampling error (i.e., luck of the draw) produced the “relationship” in your sample.

Statistical significance indicates that you have sufficient evidence to conclude that the relationship you observe in the sample also exists in the population.

Hill’s Criteria of Causation

Determining whether a causal relationship exists requires far more in-depth subject area knowledge and contextual information than you can include in a hypothesis test. In 1965, Austin Bradford Hill, a medical statistician, tackled this question in a paper that has become the standard. While he introduced it in the context of epidemiological research, you can apply the ideas to other fields.

Hill describes nine criteria to help establish causal connections. The goal is to satisfy as many criteria as possible. No single criterion is sufficient. However, it’s often impossible to meet all the criteria. These criteria are an exercise in critical thought. They show you how to think about determining causation and highlight essential qualities to consider.

Correlation Does Not Mean Causation

Even if there is a correlation between two variables, we cannot conclude that one variable causes a change in the other. This relationship could be coincidental, or a third factor may be causing both variables to change.

For example, Ankit collected data on the sales of ice cream cones and air conditioners in his hometown. He found that when ice cream sales were low, air conditioner sales tended to be low and that when ice cream sales were high, air conditioner sales tended to be high.

  • Ankit can conclude that sales of ice cream cones and air conditioner are positively correlated.
  • Ankit can’t conclude that selling more ice cream cones causes more air conditioners to be sold. It is likely that the increases in the sales of both ice cream cones and air conditioners are caused by a third factor, an increase in temperature!
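This kind of co-movement can be quantified with Pearson's correlation coefficient. A minimal sketch follows; the monthly sales figures are made up for illustration, not Ankit's actual data:

```python
# Hypothetical monthly sales; both rise and fall with the summer heat.
ice_cream = [12, 15, 30, 45, 50, 48, 25, 14]
air_cond  = [3, 4, 9, 14, 16, 15, 8, 4]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson_r(ice_cream, air_cond)
print(r > 0.9)  # True: strongly positively correlated, yet neither causes the other
```

A high r here reflects only that the two series vary together; the lurking third variable (temperature) is what drives both.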

Concurrent Deviation Method

The concurrent deviation method of studying correlation is the simplest of all the methods. The only thing required under this method is to find out the direction of change of the X variable and the Y variable.

A very simple and casual method of finding correlation when we are not serious about the magnitude of the two variables is the application of concurrent deviations.

This method involves attaching a positive sign to an x-value (except the first) if this value is more than the previous value, and assigning a negative sign if this value is less than the previous value.

This is done for the y-series as well. The deviation in the x-value and the corresponding y-value is known to be concurrent if both the deviations have the same sign.

Denoting the number of concurrent deviations by c and the total number of deviations by m (which must be one less than the number of pairs of x and y values), the coefficient of concurrent deviations is given by

rc = ±√(±(2c − m)/m)

where rc stands for the coefficient of correlation by the concurrent deviation method, c stands for the number of concurrent deviations, i.e., the number of positive signs obtained after multiplying Dx with Dy, and m stands for the number of deviations, i.e., one less than the number of pairs of observations compared.

Steps

(i) Find out the direction of change of the X variable, i.e., compared with the first value, whether the second value is increasing, decreasing, or constant. If it is increasing, put a (+) sign; if it is decreasing, put a (−) sign; and if it is constant, put zero. Similarly, compared with the second value, find out whether the third value is increasing, decreasing, or constant. Repeat the same process for the other values. Denote this column by Dx.

(ii) In the same manner as discussed above, find out the direction of change of the Y variable and denote this column by Dy.

(iii) Multiply Dx with Dy, and determine the value of c, i.e., the number of positive signs.

(iv) Apply the above formula, i.e.,

rc = ±√(±(2c − m)/m)

Note. The significance of the ± signs, both inside the square root and outside it, is that we cannot take the square root of a negative number. Therefore, if 2c − m is negative, this negative value multiplied by the minus sign inside the root makes it positive, so we can take the square root; but the ultimate result is then given a negative sign. If 2c − m is positive, then, of course, we get a positive value for the coefficient of correlation.
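The steps above can be sketched in Python as follows; the helper name concurrent_deviation is my own, and the sign convention follows the note on 2c − m:

```python
def concurrent_deviation(xs, ys):
    """Coefficient of correlation by the concurrent deviation method."""
    def signs(vals):
        # +1 if a value rose from the previous one, -1 if it fell, 0 if constant
        return [(v > p) - (v < p) for p, v in zip(vals, vals[1:])]

    dx, dy = signs(xs), signs(ys)
    m = len(dx)                        # one less than the number of pairs
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)   # concurrent deviations
    inner = (2 * c - m) / m
    # If 2c - m is negative, the root is taken of its absolute value and
    # the coefficient is reported with a negative sign.
    return inner ** 0.5 if inner >= 0 else -((-inner) ** 0.5)

# Both series move in the same direction at every step -> perfect agreement
print(concurrent_deviation([1, 2, 3, 2, 3], [5, 6, 7, 6, 8]))  # 1.0
```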

Percentiles

The term percentile is in everyday use, but there is no universal definition for it. The most common definition of a percentile is a number where a certain percentage of scores fall below that number. You might know that you scored 67 out of 90 on a test. But that figure has no real meaning unless you know what percentile you fall into. If you know that your score is in the 90th percentile, that means you scored better than 90% of people who took the test.

In statistics, a percentile (or a centile) is a score below which a given percentage of scores in its frequency distribution fall (exclusive definition) or a score at or below which a given percentage fall (inclusive definition). For example, the 50th percentile (the median) is the score below which 50% (exclusive) or at or below which (inclusive) 50% of the scores in the distribution may be found.

The percentile (or percentile score) and the percentile rank are related terms. The percentile rank of a score is the percentage of scores in its distribution that are less than it, an exclusive definition, and one that can be expressed with a single, simple formula. In contrast, there is not one formula or algorithm for a percentile score but many. Hyndman and Fan identified nine and most statistical and spreadsheet software use one of the methods they describe. Algorithms either return the value of a score that exists in the set of scores (nearest-rank methods) or interpolate between existing scores and are either exclusive or inclusive.

  • The 25th percentile is also called the first quartile.
  • The 50th percentile is generally the median.
  • The 75th percentile is also called the third quartile.
  • The difference between the third and first quartiles is the interquartile range.
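As a sketch, Python's statistics.quantiles can compute these cut points; the scores below are hypothetical, and the "inclusive" method is just one of the several interpolation schemes in use:

```python
import statistics

scores = list(range(1, 101))   # hypothetical scores 1..100

# n=4 returns the three cut points that divide the data into quarters
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

print(q1, q2, q3)   # 25.75 50.5 75.25
iqr = q3 - q1       # interquartile range = 49.5
```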

Simple and Weighted Averages

Simple Averages

The simple average of a set of values is determined by dividing the sum total of all the values by the number of values in the set.

The formula of simple average can be expressed as follows:

Simple average = (x1 + x2 + x3 + … + xn)/n

Where;

    x = values in the set

    n = number of values in the set

Weighted average

A weighted average is a means of determining the average of a set of values by assigning a weight to each value in relation to its relative importance or significance.

The formula of weighted average can be expressed as follows:

Weighted average = (x1w1 + x2w2 + x3w3 + … + xnwn)/(w1 + w2 + w3 + … + wn)

Where;

    x = values in the set

    w = weightage of each value in the set

    n = number of values in the set
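A minimal sketch of both formulas, using hypothetical marks weighted by credit hours:

```python
# Hypothetical exam marks and their weights (e.g., credit hours)
marks   = [80, 60, 90]
weights = [4, 2, 3]

simple = sum(marks) / len(marks)
weighted = sum(x * w for x, w in zip(marks, weights)) / sum(weights)

print(round(simple, 2))    # 76.67
print(round(weighted, 2))  # 78.89: the heavily weighted marks pull the average up
```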

Graphic presentation: Technique of Construction of Graphs

Graphic presentation represents a highly developed body of techniques for elucidating, interpreting, and analyzing numerical facts by means of points, lines, areas, and other geometric forms and symbols. Graphic techniques are especially valuable in presenting quantitative data in a simple, clear, and effective manner, as well as facilitating comparisons of values, trends, and relationships. They have the additional advantages of succinctness and popular appeal; the comprehensive pictures they provide can bring out hidden facts and relationships and contribute to a more balanced understanding of a problem.

The choice of a particular graphic technique to present a given set of data is a difficult one, and no hard and fast rules can be made to cover all circumstances. There are, however, certain general goals that should always be kept in mind. These include completeness, clarity, and honesty; but there is often conflict between the goals. For instance, completeness demands that all data points be included in a chart, but often this can be done only at some sacrifice of clarity. Such problems can be mitigated by the practice (highly desirable on other grounds as well) of indicating the source of the data from which the chart was constructed so that the reader himself can investigate further. Another problem occurs when it is necessary to break an axis in order to fit all the data in a reasonable space; clarity is then served, but honesty demands that attention be strongly called to the break.

On the basis of form, charts and graphs may be classified as:

(1) Rectilinear coordinate graphs

(2) Semilogarithmic charts

(3) Bar and column charts

(4) Frequency graphs and related charts

(5) Maps

(6) Miscellaneous charts, including pie diagrams, scattergrams, fan charts, ranking charts, etc.

(7) Pictorial charts

(8) Three-dimensional projection charts.

General Rules for Graphical Representation of Data

There are certain rules to effectively present the information in the graphical representation. They are:

  • Suitable Title: Make sure that an appropriate title is given to the graph, indicating the subject of the presentation.
  • Measurement Unit: Mention the unit of measurement in the graph.
  • Proper Scale: To represent the data accurately, choose a proper scale.
  • Index: Index the appropriate colours, shades, lines, and designs in the graph for better understanding.
  • Data Sources: Include the source of the information wherever necessary, at the bottom of the graph.
  • Keep it Simple: Construct the graph in a simple way so that everyone can understand it.
  • Neat: Choose the correct size, fonts, colours, etc. so that the graph serves as a visual aid for the presentation of the information.

Construction of a Graph

The graphic presentation of data and information offers a quick and simple way of understanding the features and drawing comparisons. Further, it is an effective analytical tool and a graph can help us in finding the mode, median, etc.

One can locate a point in a plane using two mutually perpendicular lines – the X-axis (the horizontal line) and the Y-axis (the vertical line). Their point of intersection is the Origin.

One can locate the position of a point in terms of its distance from both these axes. For example, if a point P is 3 units away from the Y-axis and 5 units away from the X-axis, then its location is as follows:

Key Points

  • We measure the distance of the point from the Y-axis along the X-axis. Similarly, we measure the distance of the point from the X-axis along the Y-axis. Therefore, to measure 3 units from the Y-axis, we move 3 units along the X-axis and likewise for the other coordinate.
  • We then draw perpendicular lines from these two points.
  • The point where the perpendiculars intersect is the position of the point P.
  • We denote it as follows (3,5) or (abscissa, ordinate). Together, they are the coordinates of the point P.
  • The four parts of the plane are Quadrants.
  • Also, we can plot different points for a different pair of values.

Graphs of Frequency Distribution

A frequency distribution, in statistics, is a graph or data set organized to show the frequency of occurrence of each possible outcome of a repeatable event observed many times. Simple examples are election returns and test scores listed by percentile. A frequency distribution can be graphed as a histogram or pie chart. For large data sets, the stepped graph of a histogram is often approximated by the smooth curve of a distribution function (called a density function when normalized so that the area under the curve equals 1).

In statistics, a frequency distribution is a list, table or graph that displays the frequency of various outcomes in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval.

The famed bell curve, or normal distribution, is the graph of one such function. Frequency distributions are particularly useful in summarizing large data sets and assigning probabilities.

Applications

Managing and operating on frequency-tabulated data is much simpler than operating on raw data. There are simple algorithms to calculate the median, mean, standard deviation, etc. from these tables.

Statistical hypothesis testing is founded on the assessment of differences and similarities between frequency distributions. This assessment involves measures of central tendency or averages, such as the mean and median, and measures of variability or statistical dispersion, such as the standard deviation or variance.

A frequency distribution is said to be skewed when its mean and median are significantly different, or more generally when it is asymmetric. The kurtosis of a frequency distribution is a measure of the proportion of extreme values (outliers), which appear at either end of the histogram. If the distribution is more outlier-prone than the normal distribution it is said to be leptokurtic; if less outlier-prone it is said to be platykurtic.
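A sketch of these ideas with the standard library, using the population (moment-based) formulas for skewness and excess kurtosis; the sample is made up to be right-skewed:

```python
import statistics

def skew_kurtosis(data):
    """Moment-based skewness and excess kurtosis (population formulas)."""
    n = len(data)
    mu = statistics.mean(data)
    sd = statistics.pstdev(data)
    skew = sum((x - mu) ** 3 for x in data) / (n * sd ** 3)
    kurt = sum((x - mu) ** 4 for x in data) / (n * sd ** 4) - 3
    return skew, kurt

sample = [1, 2, 2, 3, 3, 3, 4, 10]     # the outlier 10 skews it to the right
skew, kurt = skew_kurtosis(sample)

print(skew > 0)                                              # True
print(statistics.mean(sample) > statistics.median(sample))   # True
```

As the text notes, a right-skewed distribution has its mean pulled above the median by the extreme values.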

Letter frequency distributions are also used in frequency analysis to crack ciphers, and to compare the relative frequencies of letters in different languages, such as Greek, Latin, etc.

Types of Frequency Distribution

  • Grouped frequency distribution.
  • Ungrouped frequency distribution.
  • Cumulative frequency distribution.
  • Relative frequency distribution.
  • Relative cumulative frequency distribution.

Grouped Data

At certain times, to ensure that we are making correct and relevant observations from the data set, we may need to group the data into class intervals. This ensures that the frequency distribution best represents the data. Let us make a grouped frequency table for an example of student heights.

Class Interval    Frequency
130-140           4
140-150           5
150-160           3

From the above table, you can see that the value of 150 is put in the class interval of 150-160 and not 140-150.
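A sketch of how such a grouped table can be built in Python; the heights are hypothetical values chosen to reproduce the frequencies above, and each interval is lower-inclusive and upper-exclusive, so exactly 150 lands in 150-160:

```python
# Hypothetical heights (cm), consistent with the table above
heights = [132, 135, 138, 139, 141, 144, 146, 148, 149, 150, 152, 158]

intervals = [(130, 140), (140, 150), (150, 160)]

# Count values with lower bound <= h < upper bound for each class interval
freq = {f"{lo}-{hi}": sum(lo <= h < hi for h in heights)
        for lo, hi in intervals}

print(freq)  # {'130-140': 4, '140-150': 5, '150-160': 3}
```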

Example

Frequency Distribution Table

13, 14, 16, 13, 16, 14, 21, 14, 15

Height    Frequency
13        2
14        3
15        1
16        2
21        1
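The ungrouped table above can be reproduced directly from the raw data with collections.Counter:

```python
from collections import Counter

data = [13, 14, 16, 13, 16, 14, 21, 14, 15]   # the raw values from the example
freq = Counter(data)

for value in sorted(freq):
    print(value, freq[value])
# 13 2
# 14 3
# 15 1
# 16 2
# 21 1
```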

Diagrammatic presentation: One Dimensional and Two-Dimensional Diagrams

Types of Diagrams:

1) One-dimensional diagrams e.g. bar diagrams:

2) Two-dimensional diagrams e.g. rectangles, squares and circles:

3) Pictograms and cartograms

1) One Dimensional diagrams (Bar charts)

  • Data is presented by a series of bars.
  • The height or length of each bar indicates the size of the figure presented.
  • The width of the bars is not considered and should be uniform.
  • Bar charts are of several kinds:
  1. Simple bar chart: each figure is represented by a single bar.
  2. Component bar chart (stacked bar chart): bars are subdivided into component parts. It is of two kinds: the actual component bar chart and the percentage component bar chart.
  3. Multiple bar chart: the component figures are shown as separate bars adjoining each other. The height of each bar represents the actual value of the component figure.
  4. Percentage bar diagram: useful in statistical work that requires the portrayal of relative changes in data. The length of the bar is kept at 100, and the segments cut into it represent the components (percentages) of an aggregate.
  5. Deviation bars: used for representing net quantities, excess or deficit, e.g., net loss or net profit. Bars can have positive or negative values; positive values are shown above the baseline and negative values below it.
  6. Broken bars: used for values with great variation, e.g., very large and very small values. The larger bars are broken to gain space for the smaller bars.

Two dimensional Diagrams

Both the length and the width are considered.

The area of the bar represents the data.

Also known as surface or area diagrams.

They include:

  a) Rectangles
  • The area of a rectangle is equal to the product of its length and width.
  • Figures can be represented as they are or converted into percentages.
  b) Squares
  • Used if the values show great variation, e.g., 200 and 4.
  • The square root of the value of each item to be shown is taken, and a scale is selected to draw the squares.
  c) Circles
  • Both the total and its component parts can be shown.
  • The area of a circle is proportional to the square of its radius.
  • Circles are difficult to compare and hence not very popular in statistics.

Pie Diagrams

A pie diagram is used to represent the components of a variable. For example, a pie chart can show household expenditure, which is divided under different heads like food, clothing, electricity, education, and recreation. The pie chart is so called because the entire graph looks like a pie and the components resemble slices cut from a pie.

Steps to draw a pie chart

The different components of the variable are converted into percentages to draw a pie diagram. These percentages are then converted into the corresponding degrees on the circle.

Draw a circle of appropriate size with a compass. The size of the radius depends upon the available space and other factors of presentation.

Mark the points on the circle representing the size of each sector with the help of a protractor.

Arrange the sectors according to their size.

Different shades and proper labels must be given to the different sectors.
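The percentage-to-degrees conversion in the steps above can be sketched as follows; the expenditure figures are hypothetical:

```python
# Hypothetical monthly household expenditure by head
expenditure = {"Food": 4000, "Clothing": 1000, "Electricity": 500,
               "Education": 1500, "Recreation": 1000}

total = sum(expenditure.values())
# Convert each component first to a percentage, then to degrees of the circle
percentages = {head: amount / total * 100 for head, amount in expenditure.items()}
degrees = {head: pct / 100 * 360 for head, pct in percentages.items()}

for head in expenditure:
    print(f"{head}: {percentages[head]:.2f}% -> {degrees[head]:.1f} degrees")
# Food takes 50% of the total, so its sector spans 180 degrees; all sectors sum to 360.
```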

Measures of Central Tendency

One of the important objectives of statistical analysis is to get one single value that describes the characteristics of the entire data. Such a value is called central value or an average.

Thus a central value or an average is a single value that represents a group of values. That single value (the average) explains the characteristics of the entire group. As the average lies in-between the largest and the smallest value of the series, it is called central value.

Characteristics of a good average

  • It should be rigidly defined so that there is no confusion regarding its meaning.
  • It should be easy to understand
  • It should be simple to compute
  • Its definition must be in the form of a mathematical formula.
  • It should be based on all the items of a series
  • It should not be influenced by a single item or a group of items
  • It should be capable of further algebraic treatment
  • It should have sampling stability

Significance of Diagrams and Graphs

  • They give a bird’s eye view of the entire data. Therefore, the information presented is easily understood.
  • They are attractive to the eye
  • They have a great memorising effect.
  • They facilitate comparison of data.

Difference between Diagrams and Graphs

Diagrams are prepared on plain paper, whereas graphs are prepared on graph paper.

A graph represents a mathematical relationship between two variables, but diagrams do not represent mathematical relationships; they help with comparisons.

Diagrams are more attractive to the eye, and they are therefore suitable for publicity and propaganda. However, they are not very useful for research analysis, whereas graphs are very useful for research analysis.

Pictograms, Cartograms

Pictograms

A pictogram, also called a pictogramme, pictograph, or simply picto, and in computer usage an icon, is a graphic symbol that conveys its meaning through its pictorial resemblance to a physical object. Pictographs are often used in writing and graphic systems in which the characters are to a considerable extent pictorial in appearance. A pictogram may also be used in subjects such as leisure, tourism, and geography.

A pictogram is a chart that uses pictures to represent data. Pictograms are set out in the same way as bar charts, but instead of bars they use columns of pictures to show the numbers involved.

Pictography is a form of writing which uses representational, pictorial drawings, similar to cuneiform and, to some extent, hieroglyphic writing, which also uses drawings as phonetic letters or determinative rhymes. Some pictograms, such as hazard pictograms, are elements of formal languages.

Pictograph has a rather different meaning in the field of prehistoric art (including recent art by traditional societies), where it means art painted on rock surfaces, as opposed to petroglyphs, which are carved or incised. Such images may or may not be considered pictograms in the general sense.

Standardization

Pictographs can often transcend languages in that they can communicate to speakers of a number of tongues and language families equally effectively, even if the languages and cultures are completely different. This is why road signs and similar pictographic material are often applied as global standards expected to be understood by nearly all.

A standard set of pictographs was defined in the international standard ISO 7001: Public Information Symbols. Other common sets of pictographs are the laundry symbols used on clothing tags and the chemical hazard symbols as standardized by the GHS system.

Pictograms have been popularized on the web and in software, where they are better known as “icons” displayed on a computer screen to help users navigate a computer system or mobile device.

Pictograms are most commonly used in Key Stage 1 as a simple and engaging introduction to bar charts. Sometimes teachers will give children cut-out pictures to count out and stick onto a ready-made sheet. This physical activity makes the concept very clear for young children.

When compiling information for a pictogram, a teacher will usually encourage their class to collect data about other children: for example, children might be asked to find out about favourite crisps, cakes, animals or colours of the children in their class or another class. Often, they will record this information on a class list and then put it onto a tally chart (for the younger children, the teacher will probably collate a tally chart on the board for the class). This information is then converted into a pictogram.

Children continue to learn about pictograms in Year 3. More advanced pictograms might be used further up the school, where one image represents more than one of an object, so children need to think about how they are interpreting the number of images.
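A scaled pictogram of the kind described above can be sketched as follows. The counts and symbols are invented; the key is that each full symbol now represents several objects, with a partial symbol for any remainder, so the reader must multiply rather than simply count.

```python
PER_SYMBOL = 5                                 # key: each "#" = 5 children
counts = {"red": 15, "blue": 12, "green": 4}   # hypothetical survey data

rows = {}
for colour, n in counts.items():
    full, rest = divmod(n, PER_SYMBOL)
    rows[colour] = "#" * full + ("+" if rest else "")  # "+" marks a part symbol

for colour, row in rows.items():
    print(f"{colour:6} {row}  ({counts[colour]})")
```

Interpreting "##+" as "more than ten but fewer than fifteen" is precisely the extra step children practise with these more advanced pictograms.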

Cartograms

A cartogram (also called a value-area map or an anamorphic map, the latter common among German-speakers) is a thematic map of a set of features (countries, provinces, etc.), in which their geographic size is altered to be directly proportional to a selected ratio-level variable, such as travel time, population, or GNP. Geographic space itself is thus warped, sometimes extremely, in order to visualize the distribution of the variable. It is one of the most abstract types of map; in fact, some forms may more properly be called diagrams. They are primarily used to display emphasis and for analysis as nomographs.

Cartograms leverage the fact that size is the most intuitive visual variable for representing a total amount. In this, it is a strategy that is similar to proportional symbol maps, which scale point features, and many flow maps, which scale the weight of linear features. However, these two techniques only scale the map symbol, not space itself; a map that stretches the length of linear features is considered a linear cartogram (although additional flow map techniques may be added). Once constructed, cartograms are often used as a base for other thematic mapping techniques to visualize additional variables, such as choropleth mapping.

General principles

Since the early days of the academic study of cartograms, they have been compared to map projections in many ways, in that both methods transform (and thus distort) space itself. The goal of designing a cartogram or a map projection is therefore to represent one or more aspects of geographic phenomena as accurately as possible, while minimizing the collateral damage of distortion in other aspects. In the case of cartograms, by scaling features to have a size proportional to a variable other than their actual size, the danger is that the features will be distorted to the degree that they are no longer recognizable to map readers, making them less useful.

As with map projections, the tradeoffs inherent in cartograms have led to a wide variety of strategies, including manual methods and dozens of computer algorithms that produce very different results from the same source data. The quality of each type of cartogram is typically judged on how accurately it scales each feature, as well as on how (and how well) it attempts to preserve some form of recognizability in the features, usually in two aspects: shape and topological relationship (i.e., retained adjacency of neighboring features). It is likely impossible to preserve both of these, so some cartogram methods attempt to preserve one at the expense of the other, some attempt a compromise solution of balancing the distortion of both, and other methods do not attempt to preserve either one, sacrificing all recognizability to achieve another goal.

Several options are available for the geometric shapes:

  • Circles (Dorling), typically brought together to be touching and arranged to retain some semblance of the overall shape of the original space.[26] These often look like proportional symbol maps, and some consider them to be a hybrid between the two types of thematic map.
  • Squares (Levasseur/Demers), treated in much the same way as the circles, although they do not generally fit together as simply.
  • Rectangles (Raisz), in which the height and width of each rectangular district is adjusted to fit within an overall shape. The result looks much like a treemap diagram, although the latter is generally sorted by size rather than geography. These are often contiguous, although the contiguity may be illusory because many of the districts that are adjacent in the map may not be the same as those that are adjacent in reality.
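The sizing rule behind the circle (Dorling) variant can be sketched numerically. In the sketch below, the regions and populations are invented: each region becomes a circle whose area is proportional to the variable, so the radius grows with the square root of the value.

```python
import math

# Hypothetical regions and populations.
populations = {"A": 4_000_000, "B": 1_000_000, "C": 9_000_000}
SCALE = 1e-6   # square map units per person; an arbitrary design choice

# Area proportional to value: area = value * SCALE, so r = sqrt(area / pi).
radii = {region: math.sqrt(pop * SCALE / math.pi)
         for region, pop in populations.items()}

# C has 9x the population of B, so its circle has 9x the area
# and therefore 3x the radius.
print({region: round(r, 3) for region, r in radii.items()})
```

Arranging such circles so that they touch their real-world neighbours, while keeping a rough semblance of the original outline, is the hard part that the Dorling algorithm addresses.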
