Data Frames
Object Type Data Frames
In Python, data frames are provided by the pandas
library. A data frame is a two-dimensional data structure, similar to a list of equal-length lists. So basically data frames are simple tables we all now. Each column represents a variable, and each row represents an observation or case. Unlike matrices, data frames can hold columns of different data types (e.g., numeric, string, date, etc.). This makes data frames suitable for storing and working with structured data and allows you to represent real-world datasets with mixed data types in a single structure. Data frames are built with the pd.DataFrame()
function.
import pandas as pd
# Creating lists
a = ["Peter", "Sabine", "Rachel", "Ray", "Ashley"]
b = [24, 42, 12, 56, 57]
c = [1.54, 1.85, 1.30, 1.97, 1.64]
d = [True, False, False, True, False]
# Creating a data frame from lists with assigned column names
patients = pd.DataFrame({
'Name': a,
'Age': b,
'Height': c,
'Ill': d
})
print(patients)
# Output:
# Name Age Height Ill
# 0 Peter 24 1.54 True
# 1 Sabine 42 1.85 False
# 2 Rachel 12 1.30 False
# 3 Ray 56 1.97 True
# 4 Ashley 57 1.64 False
In Python, data frames provided by the pandas library have column names (variable names) and row names (often called row labels) that help identify and reference specific variables and observations. You can access columns using the dot operator or square brackets []
, and you can access rows by their index or labels.
# Accessing columns
print(patients.Name) # Using dot operator
print(patients['Name']) # Using square brackets
# Output:
# 0 Peter
# 1 Sabine
# 2 Rachel
# 3 Ray
# 4 Ashley
# Name: Name, dtype: object
# Accessing rows by index and label
# because the row is at index 0 but is also called 0 we ask for the same
print(patients.iloc[0]) # Accessing the first row
print(patients.loc[0]) # Accessing the row with label 0
# Output:
# Name Peter
# Age 24
# Height 1.54
# Ill True
# Name: 0, dtype: object
New columns can be directly assigned:
# Adding a new column using direct assignment
patients['Last_Name'] = ['Müller','Schmidt','Smith','Brown','Rodriguez']
print(patients)
# Output:
# Name Age Height Ill Last_Name
# 0 Peter 24 1.54 True Müller
# 1 Sabine 42 1.85 False Schmidt
# 2 Rachel 12 1.30 False Smith
# 3 Ray 56 1.97 True Brown
# 4 Ashley 57 1.64 False Rodriguez
Pandas DataFrame Methods
Method/Function | Description |
---|---|
pd.DataFrame(data) |
Creates a DataFrame from a dictionary, list, or array. |
df.head(n) |
Returns the first n rows of the DataFrame. |
df.tail(n) |
Returns the last n rows of the DataFrame. |
df.describe() |
Generates descriptive statistics of the DataFrame. |
df.info() |
Provides a concise summary of the DataFrame. |
df.shape |
Returns a tuple representing the dimensionality of the DataFrame (rows, columns). |
df.columns |
Returns the column labels of the DataFrame. |
df.iloc[row_index, column_index] |
Accesses a group of rows and columns by labels or a boolean array. |
df.loc[row_label, column_label] |
Accesses a group of rows and columns by label(s) or a boolean array. |
df.drop(labels, axis) |
Removes specified row or column labels. |
df.fillna(value) |
Fills NA/NaN values with the specified value. |
df.groupby(by) |
Groups the DataFrame using a mapper or by a Series of columns. |
df.sort_values(by) |
Sorts the DataFrame by the specified column(s). |
df.to_csv('filename.csv') |
Exports the DataFrame to a CSV file. |