Merging, Joining and Concatenating
Last Updated: 31th August 2025
- Merging is the process of combining two or more datasets based on a common key or column.
- Joining is the process of joining two datasets based on a common key or column.
- Types of Join: Inner, Left, Right, Outer
inner→ only matchesleft→ keep all from left DataFrameright→ keep all from right DataFrameouter→ keep all, fill missing with NaN
- Concatenating is the process of combining two or more datasets vertically or horizontally.
Let Data is like this
import pandas as pd
employees = pd.DataFrame({
"emp_id": [1, 2, 3, 4],
"name": ["Amit", "Raj", "Sachin", "Suraj"],
"dept_id": [101, 102, 101, 103]
})
departments = pd.DataFrame({
"dept_id": [101, 102, 104],
"dept_name": ["IT", "HR", "Finance"]
})
merge()
# INNER JOIN
result = pd.merge(employees, departments, on="dept_id", how="inner")
print(result)
Merge Parameters
- on Used when both dataframes have the SAME column name for the key.
- left_on and right_on Use when the key column names are different.
- suffixes Adds text to duplicate column names.
- indicator Shows which rows are matched.
- left_index and right_index Use the index instead of a column.
- validate Checks if the merge relationships are valid. like
"one_to_one","one_to_many","many_to_many","many_to_one". - sort Sorts the result DataFrame.
import pandas as pd
employees = pd.DataFrame({
"emp_id": [1,2,3,4],
"name": ["Amit","Bot","Raj","Dev"],
"dept_id": [10,20,30,40]
})
departments = pd.DataFrame({
"department_code": [10,20,30],
"department_name": ["HR","Finance","IT"]
})
extra_info = pd.DataFrame({
"emp_id": [1,2,3,4],
"city": ["Delhi","Mumbai","Bangalore","Hyderabad"]
})
# ----------------------------------------------------
# 1) MERGE USING:
# left_on, right_on, how, suffixes, indicator
# ----------------------------------------------------
merge1 = employees.merge(
departments,
left_on="dept_id", # key in employees
right_on="department_code", # key in departments
how="left", # keep ALL employees (VLOOKUP style)
suffixes=("_emp","_dept"), # renames duplicate columns
indicator=True # shows match status
)
print("---- MERGE 1 ----")
print(merge1)
print()
# ----------------------------------------------------
# 2) MERGE USING INDEX:
# left_index, right_index
# ----------------------------------------------------
# Set index on both dataframes
emp_index = employees.set_index("emp_id")
info_index = extra_info.set_index("emp_id")
merge2 = emp_index.merge(
info_index,
left_index=True,
right_index=True,
how="left"
)
print("---- MERGE 2 ----")
print(merge2)
print()
# ----------------------------------------------------
# 3) FINAL MERGE USING 'on' COLUMN
# This uses the 'emp_id' column directly
# ----------------------------------------------------
final = merge1.merge(
merge2.reset_index(),
on="emp_id", # SAME column name on both sides
how="left"
)
print("---- FINAL MERGE ----")
print(final)
join() index based joins
df1 = pd.DataFrame({"A": [10, 20, 30]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"B": [100, 200, 300]}, index=["x", "y", "a"])
print(df1.join(df2, how="outer"))`
concat()
# Row wise
df1 = pd.DataFrame({"id": [1, 2], "name": ["A", "B"]})
df2 = pd.DataFrame({"id": [3, 4], "name": ["C", "D"]})
print(pd.concat([df1, df2]))
# Column wise
df3 = pd.DataFrame({"score": [85, 90]})
print(pd.concat([df1, df3], axis=1))