📃 Pandas Data Transformation

Last Updated: 31st August 2025


Data transformation is the process of converting data values into another format or scale to make them suitable for analysis. Some of the most important functions are:


Assume the data looks like this:

import pandas as pd

df = pd.DataFrame({
  "Age": [25, 30, 22],
  "Marks": [85, 90, 95]
})

Map, Apply and Iterate

map(): Used for element-wise transformation of a single column (Series). Besides a function, it also accepts a dict or a Series as a lookup table.

# Apply a function to each element of a column (add 2 to every age)
print(df["Age"].map(lambda x: x + 2))
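Since map() also accepts a dict, each value can be looked up and replaced. A small sketch, assuming a hypothetical grade_lookup table that is not part of the original data:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30, 22],
    "Marks": [85, 90, 95]
})

# Hypothetical lookup table: each mark is replaced by its dict value
grade_lookup = {85: "B", 90: "A", 95: "A+"}
grades = df["Marks"].map(grade_lookup)
print(grades.tolist())  # ['B', 'A', 'A+']

# Values that have no key in the dict become NaN
partial = df["Marks"].map({85: "B"})
print(partial.isna().tolist())  # [False, True, True]
```

If the dict may be incomplete, check for NaN afterwards (or use a Series with a default) before relying on the result.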

apply(): Works on both Series and DataFrames.

  • On a Series: applies the function element-wise.
  • On a DataFrame: applies the function to each column (axis=0, the default) or to each row (axis=1).

# Apply on Series
print(df["Age"].apply(lambda x: x + 2))

# Apply on DataFrame (column-wise)
print(df.apply(sum, axis=0))   # sum of each column

# Apply on DataFrame (row-wise)
print(df.apply(sum, axis=1))   # sum of each row
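With axis=1, apply() passes each row to the function as a Series, which is handy for combining several columns. A minimal sketch computing a marks-per-year ratio (a made-up metric used only for illustration), compared with the equivalent vectorized column arithmetic:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30, 22],
    "Marks": [85, 90, 95]
})

# Row-wise apply: the lambda receives one row (a Series) at a time
ratio_apply = df.apply(lambda row: row["Marks"] / row["Age"], axis=1)

# Same result with vectorized column arithmetic, usually much faster
ratio_vec = df["Marks"] / df["Age"]

print(ratio_apply.round(2).tolist())  # [3.4, 3.0, 4.32]
print(ratio_apply.equals(ratio_vec))  # True
```

Row-wise apply() calls the Python function once per row, so prefer plain column arithmetic whenever the operation can be expressed that way.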

.iterrows(): Iterate over rows as (index, Series) pairs

# Print Age with marks
for idx, row in df.iterrows():
    print(f"{row['Age']} years old student scored {row['Marks']} marks")
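Because iterrows() returns each row as a Series, all values in that row are forced to a common dtype. In the example data both columns are integers, so nothing changes, but with a hypothetical frame mixing int and float columns the ints are upcast. A short sketch:

```python
import pandas as pd

# Hypothetical frame mixing an int column and a float column
mixed = pd.DataFrame({"Age": [25, 30], "Height": [1.75, 1.60]})

# iterrows() yields each row as a Series, so both values share one dtype
_, first_row = next(mixed.iterrows())
print(first_row.dtype)   # float64 (the int Age was upcast)
print(first_row["Age"])  # 25.0

# itertuples() keeps each column's own type
first_tuple = next(mixed.itertuples(index=False))
print(first_tuple.Age)   # 25
```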

.itertuples(): Iterate over rows as named tuples

# Print Age with marks. Same as above but faster
for row in df.itertuples(index=False):  # index=False → don’t include index
    print(f"{row.Age} years old student scored {row.Marks} marks")
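By default itertuples() also includes the row label as an Index field, and passing name=None returns plain tuples instead of named tuples. A short sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30, 22],
    "Marks": [85, 90, 95]
})

# With the default index=True, the row label is available as row.Index
for row in df.itertuples():
    print(row.Index, row.Age, row.Marks)

# name=None returns plain tuples instead of named tuples
plain_rows = list(df.itertuples(index=False, name=None))
print(plain_rows)  # [(25, 85), (30, 90), (22, 95)]
```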

.items(): Iterate over columns as (column name, Series) pairs

# Find the mean of each numeric column
for col, data in df.items():
    if data.dtype != "object":   # skip text (object-dtype) columns, if any
        print(f"{col} → mean = {data.mean()}")
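For a simple aggregate like the mean, the loop is not strictly needed. A sketch of two vectorized alternatives that also skip non-numeric columns:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30, 22],
    "Marks": [85, 90, 95]
})

# Mean of every numeric column at once
means = df.mean(numeric_only=True)
print(means)

# Or select the numeric columns explicitly before aggregating
means_explicit = df.select_dtypes(include="number").mean()
print(means_explicit)
```

The items() loop is still useful when each column needs different handling; for uniform aggregates the vectorized form is shorter and faster.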

Binning Data

  • Binning means converting continuous values into categories (ranges). For example, instead of raw marks like 45, 67 and 89, we group them into Low, Medium and High.

cut(): Bins data into fixed ranges whose edges you supply.

import pandas as pd


marks = [25, 45, 65, 75, 85, 95]

# Create bins
bins = [0, 40, 60, 80, 100]
labels = ["Fail", "Average", "Good", "Excellent"]

result = pd.cut(marks, bins=bins, labels=labels)

print(result)
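By default cut() builds right-closed intervals such as (0, 40] and (40, 60]. So a mark sitting exactly on an edge goes to the lower bin, and a mark equal to the first edge falls outside every bin. A sketch with a few hypothetical edge-case marks, showing the right=False and include_lowest=True options:

```python
import pandas as pd

bins = [0, 40, 60, 80, 100]
labels = ["Fail", "Average", "Good", "Excellent"]
edge_marks = [40, 60]  # hypothetical marks sitting exactly on bin edges

# Default right=True: intervals are (0, 40], (40, 60], ... so 40 is "Fail"
closed_right = pd.cut(edge_marks, bins=bins, labels=labels)
print(list(closed_right))  # ['Fail', 'Average']

# right=False: intervals are [0, 40), [40, 60), ... so 40 becomes "Average"
closed_left = pd.cut(edge_marks, bins=bins, labels=labels, right=False)
print(list(closed_left))   # ['Average', 'Good']

# 0 lies outside (0, 40] and becomes NaN unless include_lowest=True
zero_default = pd.cut([0], bins=bins, labels=labels)
zero_inclusive = pd.cut([0], bins=bins, labels=labels, include_lowest=True)
print(pd.isna(zero_default[0]))  # True
print(zero_inclusive[0])         # Fail
```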

qcut(): Used for quantile-based binning. It automatically divides the data into groups that each hold (roughly) the same number of values.

# Split into 3 quantile bins
result = pd.qcut(marks, q=3, labels=["Low", "Medium", "High"])

print(result)
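To see where qcut() actually placed the boundaries, pass retbins=True, which also returns the computed edges. A sketch on the same marks list:

```python
import pandas as pd

marks = [25, 45, 65, 75, 85, 95]

# retbins=True also returns the quantile edges that qcut computed
result, edges = pd.qcut(marks, q=3, labels=["Low", "Medium", "High"], retbins=True)
print(edges)

# Each of the three groups holds the same number of marks (2 of 6)
print(pd.Series(result).value_counts().sort_index())
```

The edges come from the data's own quantiles, so they shift whenever the data changes; use cut() when the ranges must stay fixed.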