Introduction to Data Mining


Lesson 2.2.1

Data Normalization

 

Three common data normalization methods are:

- Decimal Scaling: This method moves the decimal point of the values so that they fall within the range [-1, 1]. The transformation formula is

v'(i) = v(i) / 10^k

                where k is the smallest integer such that max( |v'(i)| ) ≤ 1.

        Example: For the initial range [-991, 99], the largest absolute value is 991, so k = 3, and v = -991 becomes v' = -991 / 1000 = -0.991.
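The decimal scaling rule can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation. The function name `decimal_scale` is hypothetical, and the sketch assumes a non-empty list of numbers and k ≥ 0, so values already inside [-1, 1] are left unscaled. It finds k with a simple loop instead of a logarithm, which sidesteps rounding issues in `log10`, and it follows the "≤ 1" condition stated above, so a largest magnitude of exactly 1000 gives k = 3.

```python
def decimal_scale(values):
    """Return (scaled_values, k), where each value is divided by 10**k and
    k is the smallest integer >= 0 with max(|v'|) <= 1."""
    max_abs = max(abs(v) for v in values)  # assumes a non-empty list
    k = 0
    # Raise k until the largest scaled magnitude is at most 1.
    while max_abs / 10 ** k > 1:
        k += 1
    return [v / 10 ** k for v in values], k

scaled, k = decimal_scale([-991, 99])
print(k, scaled)  # 3 [-0.991, 0.099]
```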

 

- Min-Max Normalization: This method linearly maps the data onto a desired range, usually [0, 1]. The transformation formula is

v'(i) = (v(i) - minA) / (maxA - minA) * (new_maxA - new_minA) + new_minA

                where [minA, maxA] is the initial range and [new_minA, new_maxA] is the new range.

        Example: If v = 73600 in the initial range [12000, 98000], then in the new range [0, 1], v' = (73600 - 12000) / (98000 - 12000) ≈ 0.716.
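A minimal Python sketch of min-max normalization, reproducing the v = 73600 example. The function name `min_max`, its parameter names, and the toy `incomes` list are hypothetical. The sketch assumes maxA ≠ minA, since the formula divides by maxA - minA.

```python
def min_max(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Map v linearly from [min_a, max_a] onto [new_min, new_max]."""
    if max_a == min_a:
        raise ValueError("maxA and minA must differ")
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min

# Example from the text: v = 73600 in [12000, 98000], new range [0, 1]
print(round(min_max(73600, 12000, 98000), 3))  # 0.716

# Normalizing a whole (toy) list to [0, 1] using its own min and max
incomes = [12000, 73600, 98000]
lo, hi = min(incomes), max(incomes)
print([round(min_max(x, lo, hi), 3) for x in incomes])  # [0.0, 0.716, 1.0]
```

Because the mapping is linear, a value outside [minA, maxA] (for example, new data arriving after the range was fixed) maps outside [new_minA, new_maxA].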

 

- Zero-Mean Normalization: This method (also known as z-score normalization) shifts and rescales the data so that the transformed values have a mean of zero and a standard deviation of one. It requires the mean and standard deviation of the initial data values. The transformation formula is

v' = (v - meanA) / std_devA

                where meanA and std_devA are the mean and standard deviation of the initial data values.

        Example: If meanIncome = 54000 and std_devIncome = 16000, then v = 73600 gives v' = (73600 - 54000) / 16000 = 1.225.
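A minimal Python sketch of zero-mean (z-score) normalization. `z_score` applies the formula to a single value and computes v' = 1.225 for v = 73600 with meanIncome = 54000 and std_devIncome = 16000. `z_normalize` (a hypothetical helper) first estimates the mean and standard deviation from a list. It assumes the population standard deviation (`statistics.pstdev`); some tools use the sample standard deviation instead, which gives slightly different values. The list [2, 4, 4, 4, 5, 5, 7, 9] is a toy input chosen so the results come out exactly.

```python
import statistics

def z_score(v, mean_a, std_dev_a):
    """Apply v' = (v - meanA) / std_devA to a single value."""
    return (v - mean_a) / std_dev_a

def z_normalize(values):
    """Z-score every value using the mean and population std dev of the list."""
    mean_a = statistics.mean(values)
    std_dev_a = statistics.pstdev(values)  # assumes the values are not all equal
    return [z_score(v, mean_a, std_dev_a) for v in values]

# Income example: meanIncome = 54000, std_devIncome = 16000
print(z_score(73600, 54000, 16000))  # 1.225

scaled = z_normalize([2, 4, 4, 4, 5, 5, 7, 9])
print(scaled)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(statistics.mean(scaled), statistics.pstdev(scaled))  # 0.0 1.0
```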

       
