Introduction to Data Mining


Lesson 2.3.1.2

Noisy Data

 

    Noise normally affects only a minority of the values in a data set, because noise is by nature a random error. Noisy values can usually be detected by analysing the variance of the measured variables.
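    One simple form of variance analysis can be sketched as follows. This is only an illustration under stated assumptions: the function name flag_noisy, the threshold parameter and the sample readings are hypothetical, and the rule used (flag any value lying more than a chosen number of standard deviations from the mean) is one common choice, not the only one.

```python
from statistics import mean, stdev

def flag_noisy(values, threshold=3.0):
    """Return the indices of values that lie more than `threshold`
    sample standard deviations away from the mean."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]

# Hypothetical sensor readings with one suspicious value at index 5.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2, 10.1]
suspects = flag_noisy(readings, threshold=2.0)
print(suspects)
```

    In small samples a single extreme value inflates the standard deviation itself, so a lower threshold (2.0 here) may be needed for the value to be flagged at all.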

    Noise can be detected by measuring errors at the source of the data. Another way is to look for inconsistent feature or class values by processing the data after collection, but this is more time-consuming.
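    A minimal sketch of the post-collection check, assuming that "inconsistent" means records with identical feature values but different class labels. The function name find_inconsistent and the sample records are hypothetical.

```python
from collections import defaultdict

def find_inconsistent(records):
    """Group records by their feature tuple and return the groups
    that carry more than one distinct class label."""
    labels_by_features = defaultdict(set)
    for features, label in records:
        labels_by_features[tuple(features)].add(label)
    return {f: labels for f, labels in labels_by_features.items() if len(labels) > 1}

# Hypothetical (features, class) records; the first and third conflict.
records = [
    (("sunny", "hot"), "no"),
    (("rainy", "mild"), "yes"),
    (("sunny", "hot"), "yes"),
]
conflicts = find_inconsistent(records)
print(conflicts)
```

    The check needs a full pass over the collected data, which is part of why this approach costs more time than catching errors at the source.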

Noise can be removed by using the following techniques:

• Clustering/Merging

• Smoothing (rounding, averaging within a window)

• Outlier detection (deviation-based or distance-based)
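Two of the techniques listed above, smoothing by averaging within a window and distance-based outlier detection, can be sketched as follows. This is an illustration only: the function names, the window, radius and min_neighbours parameters, and the sample series are all hypothetical choices, and the outlier rule shown (too few other values within a fixed radius) is just one distance-based definition.

```python
def moving_average(values, window=3):
    """Replace each value with the mean of a centred window.
    The window is truncated at the ends of the sequence."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def distance_outliers(values, radius, min_neighbours):
    """Return indices of values that have fewer than `min_neighbours`
    other values within `radius` of them."""
    flagged = []
    for i, v in enumerate(values):
        neighbours = sum(
            1 for j, w in enumerate(values) if j != i and abs(v - w) <= radius
        )
        if neighbours < min_neighbours:
            flagged.append(i)
    return flagged

# Hypothetical series with a spike at index 2.
series = [1.0, 2.0, 9.0, 2.0, 1.0]
smoothed = moving_average(series, window=3)
outliers = distance_outliers(series, radius=1.5, min_neighbours=1)
print(smoothed)
print(outliers)
```

Smoothing spreads the spike across its neighbours rather than removing it, whereas outlier detection identifies the offending value so that it can be handled separately.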

 
