📃 Pandas Data Transformation.

Last Updated : 31st August 2025


Data transformation is the process of converting data values into another format or scale to make them suitable for analysis. Some of the most important operations are covered below.


Assume the data looks like this:

import pandas as pd

df = pd.DataFrame({
  "Name": ["A", "B", "C"],
  "Age": [25, 30, 22],
  "Marks": [85, 90, 95]
})

Sorting and Filtering

📊 Sorting.

# Sort by marks in ascending order
df.sort_values(by="Marks")

# Sort by multiple columns
df.sort_values(by=["Marks", "Age"], ascending=[False, True])

# Sort By index
df.sort_index()
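
The sorting calls above return a new DataFrame and leave df unchanged, so the result has to be stored to be kept. A minimal sketch of that behaviour, using the sample data from above (the variable names top_first and top_first_reset are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [25, 30, 22],
    "Marks": [85, 90, 95],
})

# sort_values returns a sorted copy; df itself keeps its original order
top_first = df.sort_values(by="Marks", ascending=False)

# ignore_index=True renumbers the rows 0..n-1 after sorting
top_first_reset = df.sort_values(by="Marks", ascending=False, ignore_index=True)

print(top_first)
print(top_first_reset)
```

Without ignore_index=True the sorted rows keep their original index labels, which is what sort_index() would later use to restore the original order.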

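# Ranking
# Rank based on marks (highest marks gets rank 1)
df["Rank"] = df["Marks"].rank(ascending=False)

# Rank based on marks (descending), breaking ties by age (ascending).
# rank() takes a single ascending flag, so sort by both columns and number the rows.
ordered = df.sort_values(by=["Marks", "Age"], ascending=[False, True])
ordered["Rank"] = range(1, len(ordered) + 1)
df["Rank"] = ordered["Rank"]   # aligned back to df by index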
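
The sample data has no tied marks, so the effect of the method argument of rank() is not visible there. A small sketch with a hypothetical Series that contains a tie shows how three of the available methods differ:

```python
import pandas as pd

# Hypothetical scores; students A and C are tied at 90
scores = pd.Series([90, 85, 90, 70], index=["A", "B", "C", "D"])

avg_rank = scores.rank(ascending=False)                    # default method="average"
min_rank = scores.rank(ascending=False, method="min")      # tied rows share the lowest rank
dense_rank = scores.rank(ascending=False, method="dense")  # like "min", but no gaps afterwards

print(pd.DataFrame({"average": avg_rank, "min": min_rank, "dense": dense_rank}))
```

With "average" the tied students each get 1.5, with "min" they both get 1 and the next student gets 3, and with "dense" the next student gets 2.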

📊 Filtering.

import pandas as pd

# Students with Marks greater than 80

print(df[df["Marks"] > 80])

# Students with Marks greater than 80 and Age less than 25

print(df[(df["Marks"] > 80) & (df["Age"] < 25)])

# Students with Marks greater than 80 or Age less than 25

print(df[(df["Marks"] > 80) | (df["Age"] < 25)])

# Students who do NOT have Marks > 80
print(df[~(df["Marks"] > 80)])

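Note: query() lets you write the filter as a string, which is often easier to read and can be faster on large DataFrames.

# Refer to columns by name inside the query string
# (with the sample data no student is under 22, so this result is empty)
df_filtered = df.query("Marks > 80 & Age < 22")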
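
Thresholds do not have to be hard-coded in the query string. As a sketch, query() can read Python variables when they are prefixed with @ (the names min_marks and max_age below are illustrative, not from the original notes):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [25, 30, 22],
    "Marks": [85, 90, 95],
})

# Hypothetical thresholds kept in ordinary Python variables
min_marks = 80
max_age = 25

# @name refers to a Python variable; a bare name refers to a column
young_high = df.query("Marks > @min_marks and Age < @max_age")

print(young_high)
```

With the sample data only student C (Age 22, Marks 95) satisfies both conditions.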

Binning Data

  • Binning means converting continuous values into categories (ranges). For example, instead of raw marks like 45, 67, 89, we group them into Low, Medium, High.

cut() : Used for binning with fixed, user-defined bin edges.

import pandas as pd


marks = [25, 45, 65, 75, 85, 95]

# Create bins
bins = [0, 40, 60, 80, 100]
labels = ["Fail", "Average", "Good", "Excellent"]

result = pd.cut(marks, bins=bins, labels=labels)

print(result)
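
By default cut() treats each bin as open on the left and closed on the right, so a mark of exactly 40 lands in (0, 40]. The sketch below uses hypothetical marks that sit exactly on the bin edges to show how right=False changes which bin they fall into:

```python
import pandas as pd

# Hypothetical marks sitting exactly on the bin edges
edge_marks = [40, 60, 80]

bins = [0, 40, 60, 80, 100]
labels = ["Fail", "Average", "Good", "Excellent"]

# Default (right=True): bins are (0, 40], (40, 60], (60, 80], (80, 100]
right_closed = pd.cut(edge_marks, bins=bins, labels=labels)

# right=False: bins are [0, 40), [40, 60), [60, 80), [80, 100)
left_closed = pd.cut(edge_marks, bins=bins, labels=labels, right=False)

print(list(right_closed))
print(list(left_closed))
```

Note that with right=False a mark of exactly 100 would fall outside the last bin and come back as NaN, so the top edge may need to be raised in that case.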

qcut() : Used for quantile-based binning. It automatically divides the data into groups of roughly equal size (by count).

# Split into 3 quantile bins
result = pd.qcut(marks, q=3, labels=["Low", "Medium", "High"])

print(result)
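
To see where qcut() actually placed the boundaries, retbins=True can be passed so that the computed bin edges are returned alongside the labels. A minimal sketch with the same marks (the names groups, edges and counts are illustrative):

```python
import pandas as pd

marks = [25, 45, 65, 75, 85, 95]

# retbins=True returns the bin edges as a second value
groups, edges = pd.qcut(marks, q=3, labels=["Low", "Medium", "High"], retbins=True)

# Count how many marks ended up in each group
counts = pd.Series(groups).value_counts()

print(list(groups))
print(edges)
print(counts)
```

Each of the three groups receives two of the six marks, and the outer edges are the minimum and maximum of the data.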

Map, Apply and Iterate

map() : Used for element-wise transformation on a single column (Series).

# Apply a function to each element of a column
df["Age"].map(lambda x: x + 2)
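
map() also accepts a dictionary, which is handy for replacing codes with labels. The sketch below uses a hypothetical lookup table (sections) and deliberately leaves one name out to show that unmatched values become NaN:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [25, 30, 22],
    "Marks": [85, 90, 95],
})

# Hypothetical lookup table; "C" is left out on purpose
sections = {"A": "Section 1", "B": "Section 2"}

# Each value in Name is looked up in the dictionary
df["Section"] = df["Name"].map(sections)

print(df)
```

Because "C" has no entry in the dictionary, its Section value is NaN rather than an error.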

apply() : Works on Series and DataFrame.

  • On Series: applies the function to each element.
  • On DataFrame: applies the function to each column (axis=0) or each row (axis=1).

# Apply on Series
print(df["Age"].apply(lambda x: x + 2))

# Keep only the numeric columns; sum() does not work across text and numbers
numeric = df[["Age", "Marks"]]

# Apply on DataFrame (column-wise)
print(numeric.apply(sum, axis=0))   # sum of each column

# Apply on DataFrame (row-wise)
print(numeric.apply(sum, axis=1))   # sum of each row
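
Row-wise apply() is most useful when the function needs several columns of the same row at once. As a sketch, the hypothetical helper describe() below builds a text summary per student and stores it in a new column:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [25, 30, 22],
    "Marks": [85, 90, 95],
})

def describe(row):
    # row is a Series holding one row; columns are accessed by name
    return f"{row['Name']} ({row['Age']}): {row['Marks']}"

# axis=1 passes each row to the function
df["Summary"] = df.apply(describe, axis=1)

print(df["Summary"].tolist())
```

For simple arithmetic on whole columns, plain column expressions such as df["Age"] + 2 are usually shorter and faster than apply().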

.iterrows() : Iterate over rows as (index, Series) pairs

# Print Age with marks
for idx, row in df.iterrows():
    print(f"{row['Age']} years old student scored {row['Marks']} marks")

.itertuples() : Iterate over rows as named tuples

# Print Age with marks. Same as above but faster
for row in df.itertuples(index=False):  # index=False → don’t include index
    print(f"{row.Age} years old student scored {row.Marks} marks")

.items() : Iterate over columns as (column name, Series) pairs

# Find mean of each numeric column
for col, data in df.items():
    if pd.api.types.is_numeric_dtype(data):   # skip text columns
        print(f"{col} → mean = {data.mean()}")