Merging, Joining and Concatenating

Last Updated: 31th August 2025


  • Merging is the process of combining two or more datasets based on a common key or column.
  • Joining is the process of joining two datasets based on a common key or column.
    • Types of Join: Inner, Left, Right, Outer
    • inner→ only matches
    • left → keep all from left DataFrame
    • right→ keep all from right DataFrame
    • outer→ keep all, fill missing with NaN
  • Concatenating is the process of combining two or more datasets vertically or horizontally.

Let Data is like this

import pandas as pd

employees = pd.DataFrame({
    "emp_id": [1, 2, 3, 4],
    "name": ["Amit", "Raj", "Sachin", "Suraj"],
    "dept_id": [101, 102, 101, 103]
})

departments = pd.DataFrame({
    "dept_id": [101, 102, 104],
    "dept_name": ["IT", "HR", "Finance"]
})

merge()

# INNER JOIN
result = pd.merge(employees, departments, on="dept_id", how="inner")
print(result)

Merge Parameters

  • on Used when both dataframes have the SAME column name for the key.
  • left_on and right_on Use when the key column names are different.
  • suffixes Adds text to duplicate column names.
  • indicator Shows which rows are matched.
  • left_index and right_index Use the index instead of a column.
  • validate Checks if the merge relationships are valid. like "one_to_one", "one_to_many", "many_to_many", "many_to_one".
  • sort Sorts the result DataFrame.
import pandas as pd
employees = pd.DataFrame({
    "emp_id": [1,2,3,4],
    "name": ["Amit","Bot","Raj","Dev"],
    "dept_id": [10,20,30,40]
})

departments = pd.DataFrame({
    "department_code": [10,20,30],
    "department_name": ["HR","Finance","IT"]
})

extra_info = pd.DataFrame({
    "emp_id": [1,2,3,4],
    "city": ["Delhi","Mumbai","Bangalore","Hyderabad"]
})


# ----------------------------------------------------
# 1) MERGE USING:
# left_on, right_on, how, suffixes, indicator
# ----------------------------------------------------
merge1 = employees.merge(
    departments,
    left_on="dept_id",           # key in employees
    right_on="department_code",  # key in departments
    how="left",                  # keep ALL employees (VLOOKUP style)
    suffixes=("_emp","_dept"),   # renames duplicate columns
    indicator=True               # shows match status
)

print("---- MERGE 1 ----")
print(merge1)
print()


# ----------------------------------------------------
# 2) MERGE USING INDEX:
# left_index, right_index
# ----------------------------------------------------
# Set index on both dataframes
emp_index = employees.set_index("emp_id")
info_index = extra_info.set_index("emp_id")

merge2 = emp_index.merge(
    info_index,
    left_index=True,
    right_index=True,
    how="left"
)

print("---- MERGE 2 ----")
print(merge2)
print()


# ----------------------------------------------------
# 3) FINAL MERGE USING 'on' COLUMN
# This uses the 'emp_id' column directly
# ----------------------------------------------------
final = merge1.merge(
    merge2.reset_index(),
    on="emp_id",        # SAME column name on both sides
    how="left"
)

print("---- FINAL MERGE ----")
print(final)

join() index based joins

df1 = pd.DataFrame({"A": [10, 20, 30]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"B": [100, 200, 300]}, index=["x", "y", "a"])

print(df1.join(df2, how="outer"))`

concat()

# Row wise
df1 = pd.DataFrame({"id": [1, 2], "name": ["A", "B"]})
df2 = pd.DataFrame({"id": [3, 4], "name": ["C", "D"]})

print(pd.concat([df1, df2]))

# Column wise
df3 = pd.DataFrame({"score": [85, 90]})
print(pd.concat([df1, df3], axis=1))