📃 Pandas Data Cleaning

Last Updated : 30th August 2025


Data cleaning is the process of preparing data before it is used for analysis. It involves removing duplicates, handling missing values, and ensuring data quality. Pandas provides several built-in functions for this:

  • isnull() : Check for missing values (alias of isna()).
  • notnull() : Opposite of isnull(); True where a value is present.
  • isna() : Check for missing (NaN or None) values.
  • dropna() : Remove rows with missing values.
  • fillna() : Replace missing values with a specified value.
  • ffill() : Forward-fill missing values.
  • bfill() : Backward-fill missing values.
  • duplicated() : Check for duplicate rows.
  • drop_duplicates() : Remove duplicate rows.
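
As a quick illustration of how the checking functions in this list relate to each other, here is a minimal sketch. The Series s is a made-up sample used only for this demonstration; it is not part of the dataset used in the rest of the article.

```python
import pandas as pd

# Hypothetical sample with one missing value in the middle
s = pd.Series([1.0, None, 3.0])

isna_mask = s.isna()        # True where the value is missing
isnull_mask = s.isnull()    # alias of isna(), gives the same mask
notnull_mask = s.notnull()  # True where the value is present

print(isna_mask.equals(isnull_mask))      # True
print((~isna_mask).equals(notnull_mask))  # True
```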

Suppose the data looks like this:

import pandas as pd

data = {
    "Name": ["Amit", "Neha", "Raj", None, "Amit"],
    "Age": [25, None, 30, 22, 25],
    "City": ["Delhi", "Mumbai", None, "Chennai", "Delhi"]
}
df = pd.DataFrame(data)

📊 Missing data

Check for missing (NaN) values. The result is a DataFrame of booleans that is True wherever a value is missing.

print(df.isnull())
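
The boolean mask is usually easier to read once it is summarised. The sketch below counts missing values per column and selects the rows that contain at least one gap. It rebuilds the same sample data so that it runs on its own; the variable names missing_per_column, total_missing and rows_with_missing are chosen here for illustration.

```python
import pandas as pd

data = {
    "Name": ["Amit", "Neha", "Raj", None, "Amit"],
    "Age": [25, None, 30, 22, 25],
    "City": ["Delhi", "Mumbai", None, "Chennai", "Delhi"],
}
df = pd.DataFrame(data)

# Number of missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Total number of missing cells in the whole DataFrame
total_missing = int(missing_per_column.sum())
print(total_missing)  # 3

# Rows that contain at least one missing value
rows_with_missing = df[df.isna().any(axis=1)]
print(rows_with_missing)
```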

Filling and removing missing values

# Drop rows that contain at least one missing value
print(df.dropna())

# Fill missing values with 0 (note: this fills the missing values in all columns at once)
print(df.fillna(0))

# Fill missing values in a specific column only
print(df.fillna({"Age": 0}))

# Fill missing values with a separate default for each column
print(df.fillna({"Name": "Unknown", "Age": 0, "City": "Unknown"}))

# Forward fill (each missing value is filled with the last valid value above it in the same column)
print(df.ffill())

# Backward fill (each missing value is filled with the next valid value below it in the same column)
print(df.bfill())
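
Note that none of the calls above changes df itself; each one returns a new DataFrame, so the result has to be assigned to a variable to keep it. The following is a minimal sketch of combining the steps, assuming that rows without a Name should be dropped, that missing ages should be replaced by the column mean, and that missing cities should become "Unknown". These choices, and the name cleaned, are illustrative assumptions rather than a recommended recipe.

```python
import pandas as pd

data = {
    "Name": ["Amit", "Neha", "Raj", None, "Amit"],
    "Age": [25, None, 30, 22, 25],
    "City": ["Delhi", "Mumbai", None, "Chennai", "Delhi"],
}
df = pd.DataFrame(data)

# Drop only the rows where Name is missing (other gaps are kept for now)
cleaned = df.dropna(subset=["Name"])

# Fill the remaining gaps: mean age for Age, a placeholder for City
cleaned = cleaned.fillna({"Age": cleaned["Age"].mean(), "City": "Unknown"})
print(cleaned)

# The original DataFrame still contains its missing values
print(int(df.isna().sum().sum()))  # 3
```

Whether a mean, a constant, or a forward fill is appropriate depends on what the column represents, so treat the fill values here as placeholders.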

Handling duplicates

# Check for duplicates
print(df.duplicated())

# Remove duplicates
print(df.drop_duplicates())
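
By default, duplicated() marks every repeat of a row except its first occurrence, and drop_duplicates() keeps that first occurrence. The sketch below shows the keep and subset parameters, which adjust this behaviour. It rebuilds the sample data so that it runs on its own; the variable names are chosen here for illustration.

```python
import pandas as pd

data = {
    "Name": ["Amit", "Neha", "Raj", None, "Amit"],
    "Age": [25, None, 30, 22, 25],
    "City": ["Delhi", "Mumbai", None, "Chennai", "Delhi"],
}
df = pd.DataFrame(data)

# Default: only the later copy of a repeated row is flagged
dup_mask = df.duplicated()
print(dup_mask.tolist())  # [False, False, False, False, True]

# keep=False flags every copy, which helps when inspecting duplicates
all_copies = df[df.duplicated(keep=False)]
print(all_copies)

# Compare on the Name column only and keep the last occurrence
unique_names = df.drop_duplicates(subset=["Name"], keep="last").reset_index(drop=True)
print(unique_names)
```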