📃 Pandas Data Transformation.
Last Updated : 31st August 2025
Data transformation is the process of changing the values of data into another format or scale to make it suitable for analysis. Some of the most important functions are covered below.
Assume the data looks like this:
import pandas as pd
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [25, 30, 22],
    "Marks": [85, 90, 95]
})
Sorting and Filtering
📊 Sorting.
# Sort by marks in ascending order
df.sort_values(by="Marks")
# Sort by multiple columns
df.sort_values(by=["Marks", "Age"], ascending=[False, True])
# Sort by index
df.sort_index()
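The sorting calls above return a new DataFrame and leave `df` unchanged, so the result has to be assigned to be kept. A minimal sketch, reusing the sample data from the top of the article; the names `sorted_df` and `top_two` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [25, 30, 22],
    "Marks": [85, 90, 95],
})

# sort_values returns a sorted copy; df keeps its original row order
sorted_df = df.sort_values(by="Marks", ascending=False)

# Keep the two highest-scoring rows and renumber the index from 0
top_two = sorted_df.head(2).reset_index(drop=True)
print(top_two)
```

`reset_index(drop=True)` is optional; without it the rows keep the index labels they had before sorting.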
# Ranking
# Rank based on marks
df["Rank"] = df["Marks"].rank(ascending=False)
# Rank based on marks (descending), breaking ties by age (ascending)
ordered = df.sort_values(by=["Marks", "Age"], ascending=[False, True])
df["Rank"] = pd.Series(range(1, len(df) + 1), index=ordered.index)
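When several rows have the same marks, `rank()` has to decide how the tied rows share ranks, and its `method` parameter controls that. The sketch below compares three of the available methods on a small made-up Series (the values are chosen only to force a tie):

```python
import pandas as pd

# Two students share the top mark of 90
marks = pd.Series([90, 85, 90, 70])

# "average": tied rows get the mean of the ranks they occupy (1 and 2)
avg_rank = marks.rank(ascending=False, method="average").tolist()

# "min": tied rows get the lowest rank of the group; the next rank is skipped
min_rank = marks.rank(ascending=False, method="min").tolist()

# "dense": like "min", but no ranks are skipped afterwards
dense_rank = marks.rank(ascending=False, method="dense").tolist()

print(avg_rank)    # [1.5, 3.0, 1.5, 4.0]
print(min_rank)    # [1.0, 3.0, 1.0, 4.0]
print(dense_rank)  # [1.0, 2.0, 1.0, 3.0]
```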
📊 Filtering.
import pandas as pd
# Students with Marks greater than 80
print(df[df["Marks"] > 80])
# Students with Marks greater than 80 and Age less than 25
print(df[(df["Marks"] > 80) & (df["Age"] < 25)])
# Students with Marks greater than 80 or Age less than 25
print(df[(df["Marks"] > 80) | (df["Age"] < 25)])
# Students who do NOT have Marks > 80
print(df[~(df["Marks"] > 80)])
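Note: query() lets you write the same filters as a string expression. This is often easier to read, and it can be faster on large DataFrames.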
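# Filter with a string expression that refers to columns by name
# (returns no rows for the sample data, since no student is younger than 22)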
df_filtered = df.query("Marks > 80 & Age < 22")
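Thresholds are often held in Python variables rather than typed into the expression. As a small sketch, `query()` can refer to local variables with the `@` prefix; the names `min_marks` and `max_age` are hypothetical and their values are picked only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [25, 30, 22],
    "Marks": [85, 90, 95],
})

min_marks = 80  # hypothetical threshold
max_age = 25    # hypothetical threshold

# '@' tells query() to look the name up among local Python variables
by_query = df.query("Marks > @min_marks and Age < @max_age")

# The same filter written as a boolean mask
by_mask = df[(df["Marks"] > min_marks) & (df["Age"] < max_age)]

print(by_query)
print(by_query.equals(by_mask))  # True
```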
Binning Data
- Binning means converting continuous values into categories (ranges). For example, instead of keeping marks like 45, 67, 89, we categorize them as Low, Medium, High.
cut() : Used for fixed binning, where you supply the bin edges yourself.
import pandas as pd
marks = [25, 45, 65, 75, 85, 95]
# Create bins
bins = [0, 40, 60, 80, 100]
labels = ["Fail", "Average", "Good", "Excellent"]
result = pd.cut(marks, bins=bins, labels=labels)
print(result)
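One detail that is easy to miss is what happens to a value that sits exactly on a bin edge. By default `cut()` treats each bin as open on the left and closed on the right, and `right=False` flips that. A short sketch using the same bins and labels as above, with marks chosen to land on the edges:

```python
import pandas as pd

marks = [40, 60, 80]
bins = [0, 40, 60, 80, 100]
labels = ["Fail", "Average", "Good", "Excellent"]

# Default (right=True): bins are (0, 40], (40, 60], (60, 80], (80, 100]
right_closed = list(pd.cut(marks, bins=bins, labels=labels))

# right=False: bins are [0, 40), [40, 60), [60, 80), [80, 100)
left_closed = list(pd.cut(marks, bins=bins, labels=labels, right=False))

print(right_closed)  # ['Fail', 'Average', 'Good']
print(left_closed)   # ['Average', 'Good', 'Excellent']
```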
qcut() : Used for quantile binning. It picks the bin edges automatically so that each group holds roughly the same number of values.
# Split into 3 quantile bins
result = pd.qcut(marks, q=3, labels=["Low", "Medium", "High"])
print(result)
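To see the difference between the two functions, it can help to count how many values fall into each bin. The following sketch reuses the `marks` list, bins and labels from the examples above; the names `fixed_counts` and `quantile_counts` are illustrative:

```python
import pandas as pd

marks = [25, 45, 65, 75, 85, 95]

fixed = pd.cut(marks, bins=[0, 40, 60, 80, 100],
               labels=["Fail", "Average", "Good", "Excellent"])
quantile = pd.qcut(marks, q=3, labels=["Low", "Medium", "High"])

# Count how many values landed in each bin
fixed_counts = pd.Series(fixed).value_counts(sort=False)
quantile_counts = pd.Series(quantile).value_counts(sort=False)

print(fixed_counts)     # bin sizes depend on where the edges were placed
print(quantile_counts)  # every bin holds 2 of the 6 values
```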
Map, Apply and Iterate
map() : Used for element-wise transformation on a single column (Series).
# Apply a function to each element of a column
df["Age"].map(lambda x: x + 2)
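Besides a function, `map()` also accepts a dictionary, which is handy for replacing codes with readable values. A minimal sketch; the lookup table `full_names` is made up for illustration, and "D" is left out on purpose to show what happens to unmatched values:

```python
import pandas as pd

names = pd.Series(["A", "B", "C", "D"])

# Hypothetical lookup table; "D" is deliberately missing
full_names = {"A": "Asha", "B": "Bilal", "C": "Chen"}

# Each value is looked up in the dictionary; missing keys become NaN
mapped = names.map(full_names)

# Replace the unmatched entries with a placeholder if needed
filled = mapped.fillna("Unknown")

print(mapped)
print(filled)
```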
apply() : Works on Series and DataFrame.
- On Series: applies function element-wise.
- On DataFrame: applies function row-wise or column-wise.
# Apply on Series
print(df["Age"].apply(lambda x: x + 2))
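# Apply on DataFrame (column-wise), using only the numeric columns
print(df[["Age", "Marks"]].apply(sum, axis=0)) # sum of each column
# Apply on DataFrame (row-wise), using only the numeric columns
print(df[["Age", "Marks"]].apply(sum, axis=1)) # sum of each row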
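Row-wise `apply()` is most useful when the result depends on several columns at once. In the sketch below, the helper `describe` and the new column name `Summary` are hypothetical names used only for this example:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [25, 30, 22],
    "Marks": [85, 90, 95],
})

def describe(row):
    # With axis=1, each row arrives as a Series indexed by column name
    return f"{row['Name']} ({row['Age']}): {row['Marks']}"

df["Summary"] = df.apply(describe, axis=1)
print(df["Summary"].tolist())
# ['A (25): 85', 'B (30): 90', 'C (22): 95']
```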
.iterrows() Iterate over rows
# Print Age with marks
for idx, row in df.iterrows():
    print(f"{row['Age']} years old student scored {row['Marks']} marks")
.itertuples() Iterate over rows as named tuples
# Print Age with marks. Same as above but faster
for row in df.itertuples(index=False): # index=False → don’t include index
    print(f"{row.Age} years old student scored {row.Marks} marks")
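Speed is not the only difference between the two row iterators. `iterrows()` builds a Series for every row, so a row that mixes integer and float columns is converted to a single type, while `itertuples()` keeps each column's own type. A small sketch on a made-up frame (`small` and its `Score` column are not part of the sample data above):

```python
import pandas as pd

# One integer column and one float column
small = pd.DataFrame({"Age": [25, 30], "Score": [85.5, 90.0]})

# iterrows packs each row into one Series, so Age is upcast to float
_, first_row = next(small.iterrows())
age_from_iterrows = first_row["Age"]

# itertuples builds a named tuple, so Age stays an integer
first_tuple = next(small.itertuples(index=False))
age_from_itertuples = first_tuple.Age

print(type(age_from_iterrows).__name__)
print(type(age_from_itertuples).__name__)
```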
.items() Iterate over columns
# Find the mean of each numeric column
for col, data in df.items():
    if pd.api.types.is_numeric_dtype(data): # skip text columns
        print(f"{col} → mean = {data.mean()}")