Descriptive analysis is the conversion of raw data into a form that is easy to understand and interpret, i.e., rearranging, ordering, and manipulating data to provide insight into it.

Descriptive analysis is the type of data analysis that helps describe, show, or summarize data points in a constructive way, so that patterns that hold across the data can emerge.

It gives you a picture of the distribution of your data, helps you detect typos and outliers, and enables you to identify similarities among variables, preparing you for further statistical analyses.

Techniques

Data aggregation and data mining are two techniques used in descriptive analysis to process historical data. In data aggregation, data is first collected and then sorted to make the datasets more manageable.
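As a minimal sketch of the aggregation step, the following collects raw records, totals them per group, and sorts the result. The record layout (region, amount) and the values are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical raw records: (region, amount). Values are made up for illustration.
records = [("north", 120), ("south", 80), ("north", 60), ("east", 40), ("south", 20)]

def aggregate_by_key(rows):
    """Sum the amounts per key and return the totals sorted by key."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(sorted(totals.items()))

totals = aggregate_by_key(records)
print(totals)
```

Five raw rows collapse into three sorted totals, which is the kind of more manageable dataset aggregation is meant to produce.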

Descriptive techniques often include constructing tables of quantiles and means, measures of dispersion such as variance or standard deviation, and cross-tabulations or “crosstabs” that can be used to examine many disparate hypotheses. These hypotheses often highlight differences among subgroups.

Phenomena such as segregation, discrimination, and inequality are studied using specialised descriptive techniques. Discrimination is measured with the help of audit studies or decomposition methods. More segregation by type, or more inequality of outcomes, need not be wholly good or bad in itself, but it is often considered a marker of unjust social processes; accurate measurement of their levels across space and time is a prerequisite to understanding these processes.

A table of means by subgroup is used to show important differences across subgroups, which often leads to inferences and conclusions being drawn. When we notice a gap in earnings, for example, we naturally tend to infer reasons for that pattern.

But this enters the province of measuring impacts, which requires different techniques. Random variation often causes differences in means, and statistical inference is required to determine whether observed differences could have arisen merely by chance.
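To make the point about chance concrete, here is a minimal sketch that builds a table of means for two subgroups and then runs an exact permutation test on the gap. The permutation test is one possible inference method chosen for illustration, not one named in the text, and the earnings figures are made up.

```python
from itertools import combinations
from statistics import mean

# Hypothetical earnings (in thousands) for two subgroups; values are made up.
group_a = [30, 32, 35, 31]
group_b = [38, 40, 37, 41]

def exact_permutation_p_value(a, b):
    """Share of all regroupings whose absolute mean gap is at least the observed one."""
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Try every way of assigning len(a) of the pooled values to the first group.
    for chosen in combinations(indices, len(a)):
        chosen_set = set(chosen)
        first = [pooled[i] for i in indices if i in chosen_set]
        second = [pooled[i] for i in indices if i not in chosen_set]
        if abs(mean(first) - mean(second)) >= observed:
            extreme += 1
        total += 1
    return extreme / total

table_of_means = {"A": mean(group_a), "B": mean(group_b)}
p_value = exact_permutation_p_value(group_a, group_b)
print(table_of_means, round(p_value, 4))
```

A small p-value says the gap would rarely appear if group labels were assigned at random. It says nothing about why the gap exists, which is the separate question of measuring impacts.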

A crosstab, or two-way tabulation, shows the proportion of observations with each combination of values of two variables, known as cell proportions. For example, we might tabulate the proportion of the population that has a high school degree and also receives food or cash assistance, which means making a crosstab of education versus receipt of assistance.

We might also want to examine row proportions, or the fraction in each education group who receive food or cash assistance, perhaps seeing assistance levels drop sharply at higher education levels.

Column proportions can also be examined, that is, the fraction of recipients at each level of education, but this conditions in the opposite direction from any causal effect. We might come across a surprisingly high number or proportion of recipients with a college education, but this might simply reflect there being more college graduates than people with less than a high school degree.
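The three kinds of proportions can be sketched as follows. The survey rows are made up, and the category labels are assumptions chosen to mirror the education and assistance example.

```python
from collections import Counter

# Hypothetical survey rows: (education level, receives assistance). Counts are made up.
rows = (
    [("less than high school", "yes")] * 2
    + [("less than high school", "no")] * 2
    + [("college", "yes")] * 4
    + [("college", "no")] * 12
)

def crosstab_proportions(pairs):
    """Return cell, row, and column proportions for (row value, column value) pairs."""
    cell_counts = Counter(pairs)
    row_totals = Counter(r for r, _ in pairs)
    col_totals = Counter(c for _, c in pairs)
    n = len(pairs)
    cell = {k: v / n for k, v in cell_counts.items()}               # share of everyone
    row = {k: v / row_totals[k[0]] for k, v in cell_counts.items()}  # share within education group
    col = {k: v / col_totals[k[1]] for k, v in cell_counts.items()}  # share within assistance status
    return cell, row, col

cell, row, col = crosstab_proportions(rows)
print(row[("less than high school", "yes")], row[("college", "yes")])
print(col[("college", "yes")])
```

In this made-up sample, half of the lower-education group receives assistance against a quarter of college graduates, yet two thirds of recipients are college graduates because that group is four times larger.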

Types of Descriptive Analysis

Descriptive analysis can be categorized into four types: measures of frequency, central tendency, dispersion or variation, and position. These methods are best suited to examining a single variable at a time.

Measures of Frequency

In descriptive analysis, it’s essential to know how frequently a certain event or response occurs. This is the prime purpose of measures of frequency, such as a count or a percentage.

For example, consider a survey where 500 participants are asked about their favourite IPL team. A list of 500 responses would be difficult to digest, but the data can be made much more accessible by counting how many times each IPL team was selected.
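A minimal sketch of such a frequency table, assuming a short list of placeholder responses ("Team A" and so on are stand-ins, not real survey data):

```python
from collections import Counter

# Placeholder responses standing in for the survey answers; not real data.
responses = ["Team A", "Team B", "Team A", "Team C", "Team A", "Team B"]

def frequency_table(values):
    """Map each distinct value to (count, percent of all responses), most common first."""
    counts = Counter(values)
    total = len(values)
    return {value: (count, 100 * count / total) for value, count in counts.most_common()}

table = frequency_table(responses)
for team, (count, percent) in table.items():
    print(f"{team}: {count} ({percent:.1f}%)")
```

The same function works unchanged on a list of 500 responses, reducing it to one row per team.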

Measures of Central Tendency

In descriptive analysis, it’s also important to find the central (or average) tendency of a response. Central tendency is measured with three averages: mean, median, and mode. As an example, consider a survey in which the weight of 1,000 people is measured. In this case, the mean would be an excellent descriptive metric for the typical value.
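The three averages can be computed with Python's standard `statistics` module. This is a small sketch on a made-up sample of seven weights rather than the 1,000 in the example.

```python
from statistics import mean, median, mode

# Hypothetical weights in kilograms; values are made up, with one unusually heavy person.
weights_kg = [58, 60, 60, 62, 65, 70, 95]

average = mean(weights_kg)      # arithmetic mean
middle = median(weights_kg)     # middle value of the sorted data
most_common = mode(weights_kg)  # most frequent value

print(round(average, 2), middle, most_common)
```

Here the single heavy value pulls the mean above the median, which is one reason to look at more than one of the three averages.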

Measures of Dispersion

Sometimes, it is important to know how data is spread across a range. To illustrate, consider the average weight in a sample of two people. If both individuals weigh 60 kg, the average weight will be 60 kg. However, if one individual weighs 50 kg and the other 70 kg, the average weight is still 60 kg. Measures of dispersion like range or standard deviation can be employed to capture this kind of difference.
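The two-person example can be checked directly. This sketch uses the population standard deviation (`statistics.pstdev`), which is an assumption; the sample standard deviation would give a larger figure for the second pair.

```python
from statistics import mean, pstdev

def describe_spread(values):
    """Return (mean, range, population standard deviation) for a list of numbers."""
    return mean(values), max(values) - min(values), pstdev(values)

same_weights = describe_spread([60, 60])   # both individuals weigh 60 kg
mixed_weights = describe_spread([50, 70])  # one weighs 50 kg, the other 70 kg

print(same_weights)
print(mixed_weights)
```

Both samples have a mean of 60 kg, but the first has a range and standard deviation of 0, while the second has a range of 20 kg and a standard deviation of 10 kg.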

Measures of Position

Descriptive analysis also involves identifying the position of a single value or response in relation to others. Measures like percentiles and quartiles are very useful here.
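As a rough sketch, quartiles can be computed with `statistics.quantiles`, and a percentile rank with a small helper. The scores are made up, the `"inclusive"` method is one of several conventions, and `percentile_rank` is a hypothetical helper that uses the "at or below" definition.

```python
from statistics import quantiles

# Hypothetical test scores; values are made up for illustration.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95]

def percentile_rank(values, target):
    """Percent of values that are less than or equal to target."""
    at_or_below = sum(1 for v in values if v <= target)
    return 100 * at_or_below / len(values)

# Three cut points that split the data into four equal-sized groups.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
rank_of_80 = percentile_rank(scores, 80)

print(q1, q2, q3, round(rank_of_80, 1))
```

The quartiles locate the boundaries of the distribution, while the percentile rank places one particular score relative to all the others.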

Beyond this, if you’ve collected data on multiple variables, you can use bivariate or multivariate descriptive statistics to study whether there are relationships between them.

In bivariate analysis, you simultaneously study the frequency and variability of two different variables to see whether they show a pattern and vary together. You can also test and compare the central tendency of the two variables before carrying out further types of statistical analysis.

Multivariate analysis is the same as bivariate analysis, but it is carried out for more than two variables. The following two methods are used for bivariate analysis.

Contingency table

In a contingency table, each cell represents a combination of values of the two variables. Typically, the independent variable (e.g., gender) is listed along the vertical axis and the dependent one (e.g., activities) is tallied along the horizontal axis. You read “across” the table to see how the independent and dependent variables relate to each other.

Scatter plots

A scatter plot is a chart that enables you to see the relationship between two or three different variables. It’s a visual representation of the strength of a relationship.

In a scatter plot, you plot one variable along the x-axis and another along the y-axis. Each observation is represented by a point in the chart.
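The strength of relationship that a scatter plot shows visually can also be summarized with a number. This sketch builds the (x, y) points that would be plotted and computes the Pearson correlation coefficient, a measure not named in the text and used here only as an illustration. The variable names and values are made up.

```python
from math import sqrt
from statistics import mean

# Hypothetical paired observations; values are made up for illustration.
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 55, 61, 64, 68]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    mean_x, mean_y = mean(xs), mean(ys)
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_deviation / (spread_x * spread_y)

# Each tuple is one point: x-axis value first, y-axis value second.
points = list(zip(hours_studied, exam_scores))
r = pearson_r(hours_studied, exam_scores)
print(points)
print(round(r, 3))
```

A value near 1 or -1 corresponds to points lying close to a straight line, and a value near 0 corresponds to a cloud with no linear trend.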