Reading Data in Pandas

Last Updated: 28th August 2025


What you'll learn

  • How to load data from CSV, Excel, JSON, and SQL
  • Key parameters: index_col, usecols, dtype, nrows, skiprows, na_values

Tip 🗣: Getting data into Pandas is always the first step. Read it in correctly first, then clean and analyze.


📥 CSV — pd.read_csv()

import pandas as pd

# Basic read
df = pd.read_csv("data.csv")
print(df)

import pandas as pd
# With useful parameters
df = pd.read_csv(
    "data.csv",
    index_col="ID",                   # make 'ID' the index
    usecols=["ID", "Name", "Age"],    # only these columns
    nrows=1000,                       # only first 1000 rows
    dtype={"ID": "int64", "Age": "Int16"},  # nullable "Int16" so Age can hold NaN from na_values
    skiprows=2,                       # skip first 2 lines (e.g., notes)
    na_values=["NA", "N/A", "-"]      # treat these as NaN
)
print(df)
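If you want to try these parameters without a real data.csv, one quick way is to pass read_csv an in-memory string through io.StringIO. The sketch below assumes a made-up file with two note lines above the header; the names and values are only for illustration. It uses the nullable "Int16" dtype for Age, because a plain "int16" column cannot hold the NaN values that na_values produces.

```python
import io

import pandas as pd

# Hypothetical file contents: two note lines, then a header row, then data.
raw = """exported by tool X
generated 2025-01-01
ID,Name,Age,City
1,Asha,23,Pune
2,Ravi,N/A,Delhi
3,Meera,31,-
"""

df = pd.read_csv(
    io.StringIO(raw),               # stands in for "data.csv"
    skiprows=2,                     # drop the two note lines
    index_col="ID",                 # use ID as the row index
    usecols=["ID", "Name", "Age"],  # City is never loaded
    dtype={"Age": "Int16"},         # nullable int, so "N/A" becomes <NA>
    na_values=["NA", "N/A", "-"],
)
print(df)
print(df.dtypes)
```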

Note: Use usecols and nrows in read_csv() if you don’t need all columns or rows, and use chunksize for large files. For example:

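import pandas as pd

# With chunksize, read_csv returns an iterator of DataFrames, not one DataFrame
with pd.read_csv("data.csv", chunksize=1000) as reader:
    for chunk in reader:  # each chunk holds up to 1000 rows
        print(chunk)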
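Processing chunks one at a time is mainly useful when you combine the partial results yourself. As a minimal sketch (the "Marks" column and the generated rows are hypothetical stand-ins for a large data.csv), here is a row count and mean computed chunk by chunk, so at most 1000 rows are held in memory at once:

```python
import io

import pandas as pd

# Hypothetical stand-in for a large "data.csv": 2500 rows of ID, Marks.
raw = "ID,Marks\n" + "\n".join(f"{i},{i % 10}" for i in range(1, 2501))

total = 0
rows = 0
chunk_sizes = []
# chunksize makes read_csv return an iterator of DataFrames.
with pd.read_csv(io.StringIO(raw), chunksize=1000) as reader:
    for chunk in reader:
        total += int(chunk["Marks"].sum())  # partial sum for this chunk
        rows += len(chunk)
        chunk_sizes.append(len(chunk))

print(chunk_sizes, rows, total / rows)
```

The same pattern works for any result you can build up incrementally, such as counts, sums, or filtered rows collected into a list.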

📊 Excel — pd.read_excel()

import pandas as pd

# Basic read
df = pd.read_excel("data.xlsx")  # .xlsx files need the openpyxl package installed
print(df)

import pandas as pd

# With useful parameters
df = pd.read_excel(
    "data.xlsx",
    sheet_name="Sheet1",
    index_col=0,
    usecols="A:D",         # Excel-style range OR list of names
    dtype={"Age": "Int16"},  # nullable "Int16" so empty/NA cells don't raise
    na_values=["NA", ""]
)
print(df)

🧾 JSON — pd.read_json()

import pandas as pd

# For records-oriented JSON: [{...}, {...}]
df = pd.read_json("data.json")

# If your JSON is line-delimited (one JSON per line)
df = pd.read_json("data_lines.json", lines=True)
print(df)
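To see the difference between the two layouts, here is a small sketch that reads the same two made-up records from in-memory strings, once as a JSON array and once as JSON Lines. Wrapping the text in io.StringIO is needed because recent pandas versions deprecate passing a literal JSON string to read_json.

```python
import io

import pandas as pd

# Records-oriented JSON: one array of objects (sample data is made up).
records = '[{"name": "Asha", "age": 23}, {"name": "Ravi", "age": 27}]'
df_records = pd.read_json(io.StringIO(records))

# Line-delimited JSON (JSON Lines): one object per line, needs lines=True.
json_lines = '{"name": "Asha", "age": 23}\n{"name": "Ravi", "age": 27}\n'
df_lines = pd.read_json(io.StringIO(json_lines), lines=True)

print(df_records)
print(df_lines)
```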

🗂 SQL — pd.read_sql()

import sqlite3
import pandas as pd

conn = sqlite3.connect("mydb.sqlite")

# Option 1: read a full table
df_table = pd.read_sql("SELECT * FROM students", conn)

# Option 2: custom query with WHERE
df_query = pd.read_sql(
    "SELECT id, name, marks FROM students WHERE marks >= ?",
    conn,
    params=(80,)
)

conn.close()
print(df_query)
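If you don't have mydb.sqlite at hand, you can reproduce the parameterized query against an in-memory SQLite database. The students table and its rows below are invented for the example. It also shows index_col, which read_sql accepts as well:

```python
import sqlite3

import pandas as pd

# In-memory database with a made-up "students" table, so no file is needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO students (id, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 91), (2, "Ravi", 72), (3, "Meera", 85)],
)
conn.commit()

# sqlite3 uses "?" placeholders; the value is passed separately via params.
toppers = pd.read_sql(
    "SELECT id, name, marks FROM students WHERE marks >= ? ORDER BY id",
    conn,
    params=(80,),
    index_col="id",  # make the id column the DataFrame index
)
conn.close()
print(toppers)
```

Passing the threshold through params keeps the value out of the SQL text, so you never build queries by string formatting.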

⚙️ Important Parameters

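  • index_col: Column (name or position) to use as the row index.
  • usecols: Columns to keep (a list of names or positions; read_excel also accepts a range such as "A:D").
  • dtype: Data type for specific columns, e.g. {"Age": "Int16"}.
  • nrows: Maximum number of rows to read.
  • skiprows: Number of lines to skip at the start of the file, or a list of line numbers to skip.
  • na_values: Extra strings to treat as NaN.
  • header: Row number to use as the column names (None if the file has no header row).
  • sheet_name: Sheet to read in read_excel (a sheet name or 0-based position).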
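Two of these parameters accept more than a single number. As a hedged sketch on a made-up headerless file: header=None tells read_csv that the first line is data (names= then supplies the labels), and skiprows can be a list of 0-based line numbers to drop rather than a count.

```python
import io

import pandas as pd

# Made-up file with no header row.
raw = "1,Asha,23\n2,Ravi,27\n3,Meera,31\n"

# header=None: every line is data; names= provides the column labels.
df = pd.read_csv(io.StringIO(raw), header=None, names=["ID", "Name", "Age"])

# skiprows as a list: drop line 1 (the second line) only.
df_skip = pd.read_csv(
    io.StringIO(raw), header=None, names=["ID", "Name", "Age"], skiprows=[1]
)

print(df)
print(df_skip)
```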

Tip 🗣: usecols saves both time and memory by skipping columns you don't need, and setting index_col at load time saves a separate set_index() step!
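A rough way to check the memory saving on your own data is DataFrame.memory_usage(deep=True). The sketch below builds a hypothetical file with a wide Notes column and compares reading everything against reading only ID and Age; exact byte counts vary by pandas version, but the slimmer frame should be much smaller.

```python
import io

import pandas as pd

# Made-up file: ID, Age, and a long text column we may not need.
raw = "ID,Age,Notes\n" + "\n".join(
    f"{i},{20 + i % 30},{'x' * 200}" for i in range(5000)
)

full = pd.read_csv(io.StringIO(raw))
slim = pd.read_csv(io.StringIO(raw), usecols=["ID", "Age"], index_col="ID")

# deep=True counts the actual string contents, not just object pointers.
full_bytes = full.memory_usage(deep=True).sum()
slim_bytes = slim.memory_usage(deep=True).sum()
print(full_bytes, slim_bytes)
```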