Grouping And Aggregating

Last Updated: 31st August 2025


  • Grouping is the process of splitting data into groups based on one or more criteria (for example, the value of a column).
  • Aggregating is the process of applying a function such as sum, mean, or count to the data within each group, producing one summary value per group.
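The two steps above are often described as "split-apply-combine". The sketch below, assuming a small hypothetical DataFrame, performs them by hand (iterating over the groups that groupby() yields, summing each one, and collecting the results) and then shows that a single groupby() call with an aggregation gives the same numbers.

```python
import pandas as pd

# Small hypothetical dataset, used only for this illustration
df = pd.DataFrame({
    "Department": ["IT", "IT", "HR", "HR"],
    "Salary": [50000, 60000, 45000, 47000],
})

# Split: iterating over a GroupBy object yields (group name, sub-DataFrame) pairs
# Apply: sum the Salary column of each sub-DataFrame
# Combine: collect the per-group results into a Series keyed by group name
manual = pd.Series(
    {name: group["Salary"].sum() for name, group in df.groupby("Department")}
)

# The same result in one step using groupby() followed by an aggregation
built_in = df.groupby("Department")["Salary"].sum()

print(manual)
print(built_in)
```

In practice the one-step form is preferred; the manual loop is only there to make each stage visible.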

Consider the following data:

import pandas as pd

data = {
    "Department": ["IT", "IT", "HR", "HR", "Finance", "Finance"],
    "Employee": ["A", "B", "C", "D", "E", "F"],
    "Salary": [50000, 60000, 45000, 47000, 70000, 75000],
    "Bonus": [5000, 6000, 4000, 4200, 8000, 8500]
}

df = pd.DataFrame(data)
print(df)

groupby() : Groups the rows of a DataFrame by one or more keys (usually column names), so that an aggregation can then be applied to each group.

# Group by Department and calculate average salary
print(df.groupby("Department")["Salary"].mean())

# Apply multiple aggregations
print(df.groupby("Department")["Salary"].agg(["mean", "sum", "max"]))

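# Group by multiple columns: first bin Bonus into ranges, then group by Department and BonusRange
df["BonusRange"] = pd.cut(df["Bonus"], bins=[0, 5000, 7000, 10000], labels=["Low", "Medium", "High"])

# observed=True shows only the category combinations that actually occur in the data
print(df.groupby(["Department", "BonusRange"], observed=True)["Salary"].mean())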
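When different columns need different aggregations, pandas also supports "named aggregation": each keyword argument to .agg() becomes an output column and is given as a (column, function) pair. A minimal sketch on the same data follows; the output names avg_salary, total_bonus and headcount are illustrative choices, not required names.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["IT", "IT", "HR", "HR", "Finance", "Finance"],
    "Employee": ["A", "B", "C", "D", "E", "F"],
    "Salary": [50000, 60000, 45000, 47000, 70000, 75000],
    "Bonus": [5000, 6000, 4000, 4200, 8000, 8500],
})

# Named aggregation: output_name=(input column, aggregation function)
summary = df.groupby("Department").agg(
    avg_salary=("Salary", "mean"),    # average salary per department
    total_bonus=("Bonus", "sum"),     # total bonus paid per department
    headcount=("Employee", "count"),  # number of employees per department
)
print(summary)
```

This produces one row per department with flat, readable column names, instead of the multi-level columns you get from passing lists of functions for several columns.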

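.agg() : Besides built-in function names such as "mean" or "sum", you can pass your own functions to .agg(). Each function receives the values of one group and should return a single value.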

def range_func(x):
    return x.max() - x.min()

def double_mean(x):
    return x.mean() * 2

# Apply multiple custom functions at once; each becomes a column named after the function
print(df.groupby("Department")["Salary"].agg([range_func, double_mean]))
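Custom functions can also be combined with named aggregation, which is handy for lambdas because a bare lambda in a list would otherwise get the generic column name "<lambda>". The sketch below reuses range_func from above; the 55,000 threshold and the output names are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["IT", "IT", "HR", "HR", "Finance", "Finance"],
    "Employee": ["A", "B", "C", "D", "E", "F"],
    "Salary": [50000, 60000, 45000, 47000, 70000, 75000],
    "Bonus": [5000, 6000, 4000, 4200, 8000, 8500],
})

def range_func(x):
    # Spread between the highest and lowest value within the group
    return x.max() - x.min()

result = df.groupby("Department").agg(
    salary_range=("Salary", range_func),
    bonus_range=("Bonus", range_func),
    # Hypothetical one-off rule: count employees earning more than 55,000
    above_55k=("Salary", lambda s: (s > 55000).sum()),
)
print(result)
```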

.transform() : Returns a Series with the same length and index as the original, where each row holds the aggregated value of the group it belongs to.

print(df.groupby("Department")["Salary"].transform("mean"))
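Because the result of .transform() lines up row for row with the original DataFrame, it can be assigned directly as a new column and used in row-wise arithmetic. A minimal sketch, assuming we want each employee's difference from the department average and their share of the department's total salary (the new column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["IT", "IT", "HR", "HR", "Finance", "Finance"],
    "Employee": ["A", "B", "C", "D", "E", "F"],
    "Salary": [50000, 60000, 45000, 47000, 70000, 75000],
})

# Department average repeated on every row of that department
df["DeptAvg"] = df.groupby("Department")["Salary"].transform("mean")

# Row-wise comparison of each salary against its group's average
df["DiffFromAvg"] = df["Salary"] - df["DeptAvg"]

# Each employee's share of their department's total salary
df["ShareOfDept"] = df["Salary"] / df.groupby("Department")["Salary"].transform("sum")

print(df)
```

With .agg() you would get one row per department and would need a merge to bring the values back to the employee rows; .transform() avoids that step.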