Before data can be analyzed, they must be cleansed. This step is crucial and can determine whether the analysis succeeds or fails. It calls for a considered approach, because different data may require different preprocessing techniques. Data need to be prepared in such a way that they reflect the real processes and changes they describe.
Typically, data cleansing includes the detection of outliers and incomplete records. Outliers are values that differ significantly from typical values. For instance, a parameter describing a person’s height might have a value of 3 meters. An incomplete record might be the result of a fault in the data acquisition system. For instance, a temperature sensor might break down and stop collecting measurements. Once incorrect or incomplete data are discovered, they should be removed or repaired.
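As an illustration only, the two checks could be sketched in Python as follows. The interquartile-range (IQR) rule used here is one common way to flag outliers, not a method prescribed by the text. The function names, the field names `t` and `temp_c`, the multiplier `k=1.5`, and the sample values are all assumptions made for the example.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule and the default k=1.5 are illustrative choices.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def split_incomplete(records, required):
    """Separate records that have every required field from those that do not."""
    complete, incomplete = [], []
    for rec in records:
        if all(rec.get(field) is not None for field in required):
            complete.append(rec)
        else:
            incomplete.append(rec)
    return complete, incomplete


# Hypothetical sample data: one implausible height, one missing temperature.
heights_m = [1.62, 1.75, 1.80, 1.68, 3.00, 1.71, 1.77]
readings = [
    {"t": 0, "temp_c": 21.4},
    {"t": 1, "temp_c": None},  # sensor produced no measurement
    {"t": 2, "temp_c": 21.9},
]

print(iqr_outliers(heights_m))  # [3.0]
complete, incomplete = split_incomplete(readings, required=("t", "temp_c"))
print(len(complete), len(incomplete))  # 2 1
```

Whether a flagged value is then removed or repaired (for example, by interpolating between neighboring sensor readings) depends on the data and on the analysis that follows.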