4 Ways to Lie With Statistics
A Life / December 19, 2019
One of the most effective ways to lie is to misinterpret statistics. Once you know how numbers can be juggled, you can notice when someone is trying to deceive you.
Gather data that makes your conclusions even more biased
The first step in collecting statistical data is to decide what you want to analyze. Statisticians call the full set of information at this stage the population. Then you need to define a subset of the data, the sample, which should represent the population as a whole in the analysis. The larger and more accurate the sample, the more reliable the results of the study will be.
Of course, there are different ways to spoil a statistical sample, accidentally or intentionally:
- Self-selection bias. This error occurs when the people taking part in a study sort themselves into a group that does not represent the whole population.
- Convenience sampling. It occurs when researchers analyze easily accessible information instead of trying to gather representative data. For example, a news channel might run a political poll among its own viewers. Without asking people who watch other channels (or do not watch TV at all), you cannot claim that the results reflect reality.
- Non-response bias. This error happens when people do not answer the questions asked in a study, which distorts the results. For example, if a study asks "Have you ever cheated on your husband or wife?", some people simply will not want to admit it. As a result, infidelity will appear to be rare.
- Open-access polls. Anyone can take part in such surveys, and often nobody even checks how many times the same person has answered. Various surveys on the Internet are examples. They are fun to take, but they cannot be considered objective.
The beauty of sampling errors is that someone, somewhere is probably running an unscientific survey that will confirm any theory you have. So just look for the right survey on the web, or create your own.
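To see how much a convenience sample can distort a result, here is a minimal simulation of the news channel example. Every number in it is an assumption made up for illustration (20% of people watch the channel, 70% of viewers and 40% of everyone else support a candidate), and `simulate_poll` is a hypothetical helper, not a standard function.

```python
import random


def simulate_poll(population_size=100_000, sample_size=1_000, seed=42):
    """Compare a random sample with a poll of one channel's viewers.

    All probabilities below are invented for illustration only.
    """
    rng = random.Random(seed)

    # Each person is a pair: (watches the channel, supports the candidate).
    population = []
    for _ in range(population_size):
        watches_channel = rng.random() < 0.20
        support_prob = 0.70 if watches_channel else 0.40
        supports = rng.random() < support_prob
        population.append((watches_channel, supports))

    # Share of supporters in the whole population.
    true_share = sum(s for _, s in population) / population_size

    # Representative approach: everyone has the same chance of being asked.
    random_sample = rng.sample(population, sample_size)
    random_share = sum(s for _, s in random_sample) / sample_size

    # Convenience approach: only the channel's own viewers are asked.
    viewers = [person for person in population if person[0]]
    convenience_sample = rng.sample(viewers, sample_size)
    convenience_share = sum(s for _, s in convenience_sample) / sample_size

    return true_share, random_share, convenience_share


if __name__ == "__main__":
    true_share, random_share, convenience_share = simulate_poll()
    print(f"Whole population:   {true_share:.1%}")
    print(f"Random sample:      {random_share:.1%}")
    print(f"Channel's viewers:  {convenience_share:.1%}")
```

Under these assumptions the random sample lands close to the real share, while the viewers-only poll overstates support by a wide margin, even though both polls ask the same number of people.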
Select the results that confirm your ideas
Because statistics uses numbers, we tend to believe that it can prove any idea. Statistics relies on complex mathematical calculations, which can lead to completely opposite results if handled improperly.
To demonstrate how data analysis can go wrong, the English statistician Francis Anscombe created Anscombe's quartet. It consists of four sets of numerical data that look quite different when plotted.
In the figure, X1 is a standard scatterplot; X2 is a curve that first rises and then falls; X3 is a line that rises gently, with one outlier on the Y axis; X4 has the same X value for every point except one outlier, which sits high on both axes.
The following statements are true for each of the graphs:
- The average value of variable x for each set of data is equal to 9.
- The average value of variable y for each data set is equal to 7.5.
- The variance (spread) of variable x is 11, and of variable y approximately 4.12.
- The correlation between the variables x and y for each set of data is equal to 0.816.
If we saw the data only as text, we would think the four situations were identical, although the graphs show otherwise.
That is why Anscombe suggested visualizing the data first and only then drawing conclusions. Of course, if you want to mislead someone, skip this step.
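The matching summaries are easy to check yourself. The sketch below uses the commonly published values of the quartet; they are not listed in this article, so treat the numbers as an assumption and compare them with a source you trust. The helper names `pearson` and `summarize` are made up for this example.

```python
from math import sqrt
from statistics import mean, variance

# Commonly published values of Anscombe's quartet (not taken from this article).
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "X1": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "X2": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "X3": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "X4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def summarize(xs, ys):
    """Return the summary numbers that look the same across all four sets."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),  # sample variance (divides by n - 1)
        "var_y": variance(ys),
        "corr": pearson(xs, ys),
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        s = summarize(xs, ys)
        print(name, {key: round(value, 3) for key, value in s.items()})
```

The printed rows are nearly identical for all four sets. Only a plot reveals that the data behind them have nothing in common.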
Build charts that emphasize the results you want
Most people do not have time to conduct their own statistical analysis. They expect you to show them charts that summarize all of your research. Properly built charts should reflect ideas that correspond to reality. But they can also emphasize the data that you want to show.
Leave out the names of some parameters, slightly change the scale on the axes, and do not explain the context. That way you can convince everyone that you are right.
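Changing the scale is the easiest of these tricks to measure. The sketch below is only an illustration: the values 50 and 52 are invented, and `visual_ratio` and `text_bars` are hypothetical helpers that show how much taller one bar looks than another once the axis no longer starts at zero.

```python
def visual_ratio(smaller, larger, axis_start=0.0):
    """How many times taller the larger bar is drawn than the smaller one."""
    if axis_start >= smaller:
        raise ValueError("axis_start must be below the smaller value")
    return (larger - axis_start) / (smaller - axis_start)


def text_bars(values, axis_start=0.0, scale=1.0):
    """Render a crude text bar chart whose bars begin at axis_start."""
    lines = []
    for label, value in values.items():
        length = round((value - axis_start) * scale)
        lines.append(f"{label:<8}{'#' * length} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    data = {"Before": 50, "After": 52}  # invented numbers, a 4% increase

    print("Axis starts at 0:")
    print(text_bars(data, axis_start=0, scale=0.5))
    print("Drawn ratio:", visual_ratio(50, 52))

    print("Axis starts at 49:")
    print(text_bars(data, axis_start=49, scale=10))
    print("Drawn ratio:", visual_ratio(50, 52, axis_start=49))
```

With the axis starting at zero, the second bar is barely taller than the first. With the axis starting at 49, the same 4% difference is drawn as a bar three times as tall.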
Hide your sources at all costs
If you cite your sources openly, people can easily check your conclusions. Of course, if your aim is to wrap everyone around your finger, never tell anyone how you reached your conclusions.
Articles and studies usually include references to their sources. The original work may not be reproduced in full. The main thing is that the source answers the following questions:
- How was the data collected? Were people interviewed by phone? Stopped on the street? Or was it a Twitter poll? The data collection method may point to a particular selection bias.
- When was it collected? Studies quickly become outdated as trends change, so the time frame of data collection affects the conclusions.
- Who collected it? A study on the safety of smoking conducted by a tobacco company does not inspire much confidence.
- Who was surveyed? This is particularly important for public opinion polls. If a politician surveys only their own supporters, the results will not reflect the opinion of the entire population.
Now you know how to manipulate numbers and use statistics to prove almost anything. This will help you recognize lies and refute fabricated theories.