Important terminologies in Statistics: Data, Raw Data, Primary Data, Secondary Data, Population, Census, Survey, Sample Survey, Sampling, Parameter, Unit, Variable, Attribute, Frequency
8th April 2021
Data are units of information, often numeric, that are collected through observation. In a more technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects, while a datum (singular of data) is a single value of a single variable.
Although the terms “data” and “information” are often used interchangeably, they have distinct meanings. In some popular publications, data are said to be transformed into information when they are viewed in context or after analysis. In academic treatments of the subject, however, data are simply units of information. Data are used in scientific research, business management (e.g., sales data, revenue, profits, stock price), finance, governance (e.g., crime rates, unemployment rates, literacy rates), and in virtually every other form of human organizational activity (e.g., censuses of the number of homeless people by non-profit organizations).
Data are measured, collected, reported and analyzed, and are used to produce visualizations such as graphs, tables or images. Data as a general concept refers to the fact that some existing information or knowledge is represented or coded in a form suitable for better usage or processing. Raw data (“unprocessed data”) is a collection of numbers or characters before it has been “cleaned” and corrected by researchers. Raw data needs to be corrected to remove outliers or obvious instrument or data entry errors (e.g., a thermometer reading from an outdoor Arctic location recording a tropical temperature). Data processing commonly occurs in stages, and the “processed data” from one stage may be considered the “raw data” of the next stage. Field data is raw data that is collected in an uncontrolled “in situ” environment. Experimental data is data that is generated within the context of a scientific investigation by observation and recording.
Raw Data, Primary Data
Raw data (sometimes called source data or atomic data) is data that has not been processed for use. A distinction is sometimes made between data and information to the effect that information is the end product of data processing. Raw data that has undergone processing is sometimes referred to as cooked data.
Raw data, also known as primary data, are data (e.g., numbers, instrument readings, figures, etc.) collected from a source. In the context of examinations, the raw data might be described as a raw score.
If a scientist sets up a computerized thermometer which records the temperature of a chemical mixture in a test tube every minute, the list of temperature readings, whether printed out on a spreadsheet or viewed on a computer screen, is “raw data”. Raw data have not been subjected to processing, to “cleaning” by researchers to remove outliers, obvious instrument reading errors or data entry errors, or to any analysis (e.g., determining measures of central tendency such as the mean or median). Likewise, raw data have not been subject to any other manipulation by a software program or a human researcher, analyst or technician. They are also referred to as primary data. Raw data is a relative term, because even once raw data have been “cleaned” and processed by one team of researchers, another team may consider these processed data to be “raw data” for another stage of research. Raw data can be input to a computer program or used in manual procedures such as analyzing statistics from a survey. The term “raw data” can also refer to the binary data on electronic storage devices such as hard disk drives (also referred to as “low-level data”).
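The cleaning step described above can be illustrated with a small Python sketch. The readings, the plausible temperature range and the function name clean_readings are all assumptions made up for this illustration. Real cleaning rules depend on the instrument and the experiment.

```python
from statistics import median

# Hypothetical raw readings in degrees Celsius, one per minute.
# 215.0 looks like a data entry error and -999.0 like an instrument fault code.
raw_readings = [21.4, 21.6, 21.5, 215.0, 21.7, -999.0, 21.8]

# Assumed plausible range for this (made-up) experiment.
PLAUSIBLE_MIN_C = 0.0
PLAUSIBLE_MAX_C = 100.0


def clean_readings(readings, low=PLAUSIBLE_MIN_C, high=PLAUSIBLE_MAX_C):
    """Keep only the readings that fall inside the plausible range."""
    return [r for r in readings if low <= r <= high]


cleaned = clean_readings(raw_readings)
print("Cleaned readings:", cleaned)
print("Median of cleaned readings:", median(cleaned))
```

The list raw_readings is the raw data, and cleaned would be the “processed data” that a later stage of analysis might in turn treat as its own raw data.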
Secondary data refers to data that is collected by someone other than the primary user. Common sources of secondary data for social science include censuses, information collected by government departments, organizational records and data that was originally collected for other research purposes. Primary data, by contrast, are collected by the investigator conducting the research.
Secondary data analysis can save time that would otherwise be spent collecting data and, particularly in the case of quantitative data, can provide larger and higher-quality databases than any individual researcher could feasibly collect on their own. In addition, analysts of social and economic change consider secondary data essential, since it is impossible to conduct a new survey that can adequately capture past changes or developments. However, secondary data analysis can be less useful in marketing research, as the data may be outdated or inaccurate.
A population is a distinct group of individuals, whether that group comprises a nation or a group of people with a common characteristic. In statistics, a population is the pool of individuals from which a statistical sample is drawn for a study. Thus, any selection of individuals grouped together by a common feature can be said to be a population.
A sample is a subset of a population, not the entire population, ideally chosen so that it is representative of that population. For this reason, a statistical analysis of a sample should report the standard error of its results, that is, an estimate of how far a sample-based result is likely to deviate from the value for the entire population. Only an analysis of the entire population would have no sampling error.
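As a rough illustration of the standard error mentioned above, the sketch below computes the standard error of a sample mean using the common formula SE = s / sqrt(n), where s is the sample standard deviation and n is the sample size. The heights are invented for the example, and the formula assumes a simple random sample.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Estimate the standard error of the sample mean as s / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    # statistics.stdev uses the n - 1 (sample) denominator.
    return statistics.stdev(sample) / math.sqrt(n)


# Hypothetical sample of heights in centimetres.
sample_heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]

sample_mean = statistics.mean(sample_heights_cm)
standard_error = standard_error_of_mean(sample_heights_cm)
print("Sample mean:", sample_mean)
print("Standard error of the mean:", round(standard_error, 3))
```

A larger sample generally gives a smaller standard error, which is one reason sample size matters when drawing conclusions about a population.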
A census is the procedure of systematically enumerating, and acquiring and recording information about the members of a given population. This term is used mostly in connection with national population and housing censuses; other common censuses include the census of agriculture, and other censuses such as the traditional culture, business, supplies, and traffic censuses. The United Nations defines the essential features of population and housing censuses as “individual enumeration, universality within a defined territory, simultaneity and defined periodicity”, and recommends that population censuses be taken at least every ten years. United Nations recommendations also cover census topics to be collected, official definitions, classifications and other useful information to co-ordinate international practices.
A survey is a research method used for collecting data from a predefined group of respondents to gain information and insights into various topics of interest. Surveys can have multiple purposes, and researchers can conduct them in many ways depending on the methodology chosen and the study’s goal. Choosing the right survey tool for the target population is therefore an essential part of social research.
A sample survey is a method for collecting data from or about the members of a population so that inferences about the entire population can be obtained from a subset, or sample, of the population members.
Sampling is a process used in statistical analysis in which a predetermined number of observations are taken from a larger population. The methodology used to sample from a larger population depends on the type of analysis being performed, but it may include simple random sampling or systematic sampling.
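The two methods named above can be sketched in Python as follows. The function names, the population of 100 numbered units and the seed values are assumptions chosen for illustration. The systematic version shown here uses a fixed step of len(population) // n with a random starting point, which is only one of several ways to set up systematic sampling.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement, each unit equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every step-th unit until n are drawn."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    step = len(population) // n
    rng = random.Random(seed)
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(n)]


# Hypothetical population: units numbered 1 to 100.
population = list(range(1, 101))

srs = simple_random_sample(population, 10, seed=1)
sys_sample = systematic_sample(population, 10, seed=1)
print("Simple random sample:", sorted(srs))
print("Systematic sample:", sys_sample)
```

Passing a seed makes the draw repeatable, which is convenient when a sample has to be documented or checked later.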
A parameter is a numerical characteristic that describes an entire population, such as the population mean or a population proportion. When making an inference about the population, the parameter is usually unknown, because it is typically impractical or impossible to collect information from every member of the population. Instead, we use a statistic computed from a sample drawn from the population to draw a conclusion about the parameter.
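The relationship between a parameter and a statistic can be seen in a small simulation. The sketch below builds an artificial population of ages (an assumption made purely so that the parameter can be computed at all), then compares the population mean with the mean of a random sample.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 simulated ages between 18 and 90.
population_ages = [rng.randint(18, 90) for _ in range(10_000)]

# Parameter: describes the whole population (normally unknown in practice).
parameter_mean = statistics.mean(population_ages)

# Statistic: computed from a sample and used to estimate the parameter.
sample = rng.sample(population_ages, 200)
statistic_mean = statistics.mean(sample)

print("Population mean (parameter):", round(parameter_mean, 2))
print("Sample mean (statistic):", round(statistic_mean, 2))
```

In a real study only the sample mean would be available. The population mean is shown here only because the population is simulated.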
In statistics, a unit is one member of a set of entities being studied. It is the main source for the mathematical abstraction of a “random variable”. Common examples of a unit would be a single person, animal, plant, manufactured item, or country that belongs to a larger collection of such entities being studied.
A statistical unit is a unit of observation or measurement for which data are collected or derived. The statistical unit is therefore the basic element for compiling and tabulating statistical data. It is important to specify the statistical unit when publishing results.
A variable is any characteristic, number, or quantity that can be measured or counted. A variable may also be called a data item. Age, sex, business income and expenses, country of birth, capital expenditure, class grades, eye colour and vehicle type are examples of variables. It is called a variable because the value may vary between data units in a population, and may change in value over time.
An attribute is a qualitative characteristic. The theory of attributes deals with characteristics that cannot be measured numerically; instead, units are classified according to whether the attribute is present or absent, and the units in each class are counted. Attributes therefore need somewhat different statistical treatment from variables. Attributes refer to characteristics of the item under study, such as the habit of smoking or drinking; ‘smoking’ and ‘drinking’ are both examples of attributes.
A frequency distribution is a representation, in either graphical or tabular format, that displays the number of observations within each of a set of intervals. The interval size depends on the data being analyzed and the goals of the analyst. The intervals must be mutually exclusive and exhaustive. Frequency distributions are typically used within a statistical context. When charted, for example as a histogram, a frequency distribution shows the shape of the data, which for some data sets approximates a normal distribution.
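A tabular frequency distribution of this kind can be built with a few lines of Python. The scores, the starting value, the interval width and the function name below are assumptions for illustration. Each interval includes its lower bound and excludes its upper bound, so the intervals are mutually exclusive, and values outside the covered range raise an error so that the intervals stay exhaustive.

```python
def frequency_distribution(values, start, width, n_bins):
    """Count how many values fall in each interval [low, low + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_bins:
            raise ValueError(f"value {v} is outside the covered range")
        counts[index] += 1
    return [
        ((start + i * width, start + (i + 1) * width), counts[i])
        for i in range(n_bins)
    ]


# Hypothetical exam scores.
scores = [52, 67, 71, 58, 89, 94, 73, 66, 80, 75]

table = frequency_distribution(scores, start=50, width=10, n_bins=5)
for (low, high), count in table:
    print(f"{low}-{high}: {count}")
```

Choosing a different width changes the table, which is why the interval size depends on the data and on the goals of the analyst.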
In statistics, the frequency (or absolute frequency) of an event i is the number n_i of times the event occurred or was recorded in an experiment or study. These frequencies are often represented graphically in histograms.
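Absolute frequencies can be counted directly, for instance with collections.Counter from the Python standard library. The eye colour observations below are invented for the example. The relative frequency (each count divided by the total number of observations) is shown as a common companion measure, although it is not defined in the text above.

```python
from collections import Counter

# Hypothetical observations of one attribute (eye colour).
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]

# Absolute frequency n_i: number of times each value was observed.
absolute_frequency = Counter(eye_colours)

# Relative frequency: n_i divided by the total number of observations.
total = sum(absolute_frequency.values())
relative_frequency = {
    value: count / total for value, count in absolute_frequency.items()
}

for value, count in absolute_frequency.most_common():
    print(value, count, round(relative_frequency[value], 2))
```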