Grouping And Aggregating
Last Updated: 31st August 2025
- Grouping is the process of splitting data into separate groups based on one or more criteria, such as the values in a column.
- Aggregating is the process of applying a function (such as sum, mean, or count) to the data within each group to produce one summary value per group.
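To make the two steps concrete, here is a minimal sketch that does the split and the aggregation by hand. The `toy` table and the `manual_totals` name are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration
toy = pd.DataFrame({"team": ["x", "x", "y"], "score": [1, 3, 10]})

# Split: iterating over a groupby object yields (key, sub-DataFrame) pairs
manual_totals = {}
for key, group in toy.groupby("team"):
    # Aggregate: reduce each group's scores to a single number
    manual_totals[key] = int(group["score"].sum())

print(manual_totals)  # {'x': 4, 'y': 10}

# The usual one-liner performs the same split and aggregation
one_liner = toy.groupby("team")["score"].sum().to_dict()
```

In practice the one-liner form is what you would normally write; the loop is only there to show what happens in each step.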
Consider the following sample data:
import pandas as pd
data = {
"Department": ["IT", "IT", "HR", "HR", "Finance", "Finance"],
"Employee": ["A", "B", "C", "D", "E", "F"],
"Salary": [50000, 60000, 45000, 47000, 70000, 75000],
"Bonus": [5000, 6000, 4000, 4200, 8000, 8500]
}
df = pd.DataFrame(data)
print(df)
groupby() : Groups rows that share the same value of a specified key (one or more columns).
# Group by Department and calculate average salary
print(df.groupby("Department")["Salary"].mean())
# Apply multiple aggregations
print(df.groupby("Department")["Salary"].agg(["mean", "sum", "max"]))
# Group by multiple columns
df["BonusRange"] = pd.cut(df["Bonus"], bins=[0, 5000, 7000, 10000], labels=["Low", "Medium", "High"])
print(df.groupby(["Department", "BonusRange"], observed=True)["Salary"].mean())
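The observed=True argument matters here because BonusRange is a categorical column. The sketch below rebuilds the same data and compares both settings; the variable names `observed_only` and `all_combos` are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["IT", "IT", "HR", "HR", "Finance", "Finance"],
    "Salary": [50000, 60000, 45000, 47000, 70000, 75000],
    "Bonus": [5000, 6000, 4000, 4200, 8000, 8500],
})
df["BonusRange"] = pd.cut(
    df["Bonus"], bins=[0, 5000, 7000, 10000], labels=["Low", "Medium", "High"]
)

keys = ["Department", "BonusRange"]

# observed=True keeps only combinations that actually occur in the data
observed_only = df.groupby(keys, observed=True)["Salary"].mean()

# observed=False also lists category combinations with no rows (mean is NaN)
all_combos = df.groupby(keys, observed=False)["Salary"].mean()

print(len(observed_only), len(all_combos))
```

Passing observed explicitly is a reasonable habit, since the default for categorical groupers has differed between pandas versions.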
.agg() : Also accepts user-defined functions. Each function receives the values of one group as a Series and should return a single value.
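def range_func(x):
    # Difference between the highest and lowest value in the group
    return x.max() - x.min()

def double_mean(x):
    # Twice the group's average
    return x.mean() * 2

# Apply multiple custom functions at once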
print(df.groupby("Department")["Salary"].agg([range_func, double_mean]))
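When custom functions are mixed with built-in ones, the keyword form of .agg() lets you choose the output column names. A minimal sketch, assuming the same salary data; the names `salary_range`, `salary_mean`, and `summary` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["IT", "IT", "HR", "HR", "Finance", "Finance"],
    "Salary": [50000, 60000, 45000, 47000, 70000, 75000],
})

def range_func(x):
    # x is the Salary Series of a single department
    return x.max() - x.min()

# Keyword form: each keyword becomes a column in the result
summary = df.groupby("Department")["Salary"].agg(
    salary_range=range_func,  # custom function
    salary_mean="mean",       # built-in aggregation by name
)
print(summary)
```

The result has one row per department and one column per keyword, which is easier to read than columns named after the functions themselves.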
.transform() : Returns a Series with the same length and index as the original, where each row holds the value computed for its group.
print(df.groupby("Department")["Salary"].transform("mean"))
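Because the result lines up row by row with the original data, it can be stored as a new column or combined with existing ones. The sketch below shows one possible use, comparing each salary to its department average; the column names `DeptMeanSalary` and `DiffFromDeptMean` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["IT", "IT", "HR", "HR", "Finance", "Finance"],
    "Employee": ["A", "B", "C", "D", "E", "F"],
    "Salary": [50000, 60000, 45000, 47000, 70000, 75000],
})

# One value per original row: the mean salary of that row's department
group_mean = df.groupby("Department")["Salary"].transform("mean")

# Same length and index as df, so it can be assigned directly
df["DeptMeanSalary"] = group_mean
df["DiffFromDeptMean"] = df["Salary"] - group_mean

print(df)
```

By contrast, .mean() on the same groupby returns only one row per department, so it could not be assigned back to df without a merge.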