The Best Writers
Data Preprocessing
Assignment
Kunal Shah
3/7/2022

Data Preprocessing:
Data preprocessing is the technique used to convert raw data into an understandable format. We need to check the quality of the data before applying machine learning algorithms. Here, we discuss the steps of data preprocessing.

Import Libraries and Read Data:
First, we import the libraries, read the .csv file of COVID variants, and print the first five rows of the dataset, which are as follows:

Remove Irrelevant Data:
Next, we remove the irrelevant data. This is also known as feature selection. We keep the features that carry useful information about the data and drop the other features. The selected features are shown below:

Remove Duplicate Records:
Next, we remove duplicate records from the dataset. You can see that there are no duplicate records in the dataset. The length of the dataset is 100416.

Check Data Types:
After this, we check whether the data type of each column in the dataset is correct. You can see that each variable has the correct data type.

Standardize the Data:
Next, we standardize the data. Standardization rescales each numeric column so that its mean is 0 and its standard deviation is 1. The standardized data is shown below:

Investigate the Outliers:
Next, we check for outliers in the dataset. An outlier is a data point that lies far from the other values in the dataset. For this purpose, we draw a box plot to find the outliers.
We then apply the z-score technique to find the indices of the outliers. The output shows the indices of the outliers present in our dataset, as identified by their z-scores.

Missing Data:
Next, we handle the missing data in the dataset. You can see that there is no missing data in the dataset.

Normalize the Data:
Then, we normalize the data. Normalization is a technique that converts the numeric columns to a common scale. In machine learning, the values of some features can be many times larger than those of others, and bringing them to a common scale keeps the large-valued features from dominating. Here, you can see the normalized data.

Encoding Categorical Data:
Finally, we use Label Encoder to encode the data. It converts categorical data into numeric data. We then split the data into x and y. The x value contains the data frame of features, which is as follows:
The y value contains the target values, which are as follows:
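
As a recap, the steps above can be sketched end to end on a small toy table. This is only a minimal sketch under stated assumptions, not the code used in the assignment: the column names (location, variant, num_sequences, notes), the values, and the z-score threshold of 2.0 are all hypothetical, min-max scaling is assumed as the normalization method, and pandas `factorize` with `sort=True` stands in for scikit-learn's Label Encoder so that the example needs only pandas.

```python
import pandas as pd

# Hypothetical toy data standing in for the COVID variants .csv file;
# column names and values are invented for illustration only.
df = pd.DataFrame({
    "location": ["A", "A", "B", "B", "C", "C", "C"],
    "variant": ["Alpha", "Delta", "Alpha", "Delta", "Alpha", "Delta", "Delta"],
    "num_sequences": [10.0, 12.0, 11.0, 9.0, 13.0, 500.0, 500.0],
    "notes": ["", "", "", "", "", "", ""],
})

# Remove irrelevant data (feature selection): drop a column with no information.
df = df.drop(columns=["notes"])

# Remove duplicate records and rebuild a clean 0..n-1 index.
rows_before = len(df)
df = df.drop_duplicates().reset_index(drop=True)
duplicates_removed = rows_before - len(df)

# Check data types of each column.
column_types = df.dtypes

# Standardize: mean 0, standard deviation 1 (population std, ddof=0).
col = df["num_sequences"]
df["num_sequences_std"] = (col - col.mean()) / col.std(ddof=0)

# Investigate outliers with z-scores. The threshold of 2.0 is an assumption
# chosen for this tiny sample; 3.0 is a common choice on larger datasets.
z = df["num_sequences_std"]
outlier_idx = df.index[z.abs() > 2.0].tolist()

# Missing data: count missing cells across the whole frame.
missing_total = int(df.isna().sum().sum())

# Normalize: min-max scaling to the range [0, 1] (assumed method).
df["num_sequences_norm"] = (col - col.min()) / (col.max() - col.min())

# Encode categorical columns as integer codes. sort=True assigns codes in
# sorted order of the labels, as a stand-in for a label encoder.
for name in ["location", "variant"]:
    df[name] = pd.factorize(df[name], sort=True)[0]

# Split into features (x) and target (y); the target column is assumed.
x = df.drop(columns=["variant"])
y = df["variant"]
```

On the real dataset, the same calls would be applied to the data frame returned by `pd.read_csv`, with the column names replaced by the ones actually present in the file.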