| Reading Data |
read.csv |
File path (character string) |
Reads a CSV file into a data frame. |
Ensure the file path is correct. Use fileEncoding when the file is not in the native encoding. Since R 4.0.0, stringsAsFactors defaults to FALSE; in older versions, set stringsAsFactors = FALSE to avoid unexpected factor conversion. |
| |
read.table |
File path (character string) |
Reads a table into a data frame. |
Offers flexibility for delimiter-separated files. Defaults to whitespace-delimited input with header = FALSE, so set sep and header explicitly for other layouts.
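
The two readers above can be compared on the same file. This is a minimal sketch: the temporary file and its three columns are made-up sample data, not part of the original table.

```r
# Write a small sample file to a temporary path (hypothetical data)
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,Ana,9.5", "2,Ben,7.0"), path)

# read.csv assumes a header row and comma separators
df_csv <- read.csv(path, stringsAsFactors = FALSE)

# read.table needs the same layout spelled out
df_tab <- read.table(path, header = TRUE, sep = ",",
                     stringsAsFactors = FALSE)

str(df_csv)
```

For a file this simple, both calls should produce the same data frame; the difference lies only in the defaults.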
| Simple Math |
sum |
Numeric vector |
Calculates the sum of vector elements. |
Returns NA if any element is NA; use na.rm = TRUE to ignore missing data. |
| |
mean |
Numeric vector |
Computes the mean (average) of vector elements. |
Returns NA if any element is NA unless na.rm = TRUE is specified. |
| |
round |
Numeric vector |
Rounds numeric values to a specified number of decimal places. |
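Set the number of decimal places with digits (default 0). Halfway cases round to the even digit (round(2.5) is 2), and floating-point representation can shift other cases; use carefully when precision matters. |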
| |
log |
Numeric vector |
Calculates logarithms, natural by default; use the base argument for other bases. |
Check input for non-positive values: log(0) returns -Inf, and negative input returns NaN with a warning.
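
A short sketch of the NA, rounding, and logarithm behavior noted above. The vector x is made-up sample data.

```r
x <- c(2, 4, NA, 10)

sum(x)                      # NA, because one element is missing
sum(x, na.rm = TRUE)        # 16
mean(x, na.rm = TRUE)       # 5.333333 (16 / 3)

round(2.5)                  # 2: halfway cases round to the even digit
round(3.14159, digits = 2)  # 3.14

log(c(1, 0))                # 0 -Inf
log(100, base = 10)         # 2
```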
| Indexing |
[] |
Vector, matrix, or data frame |
Extracts elements of vectors, matrices, or data frames. |
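Indices are 1-based in R. Indices can be numeric, logical, or character (names). An out-of-range vector index returns NA rather than an error, and selecting a single data frame column returns a vector unless drop = FALSE.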
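
The index types mentioned above, sketched on a made-up named vector and data frame:

```r
v <- c(a = 10, b = 20, c = 30)

v[1]        # first element (indices start at 1)
v[-1]       # everything except the first element
v[v > 15]   # logical index: keeps b and c
v["c"]      # character index by name

df <- data.frame(id = 1:3, score = c(9.5, 7.0, 8.2))

df[2, "score"]               # single value from row 2
df[df$score > 8, ]           # rows 1 and 3
df[, "score", drop = FALSE]  # stays a one-column data frame
```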
| Subsetting |
subset |
Data frame |
Returns subsets of data frames based on conditions. |
Simpler than using [ for interactive work, but it relies on non-standard evaluation, so [ is the safer choice inside functions and scripts.
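
For comparison, a sketch of the same filter written with subset and with [. The data frame is made-up sample data. Rows where the condition is NA are dropped by subset; which() gives the same effect with [.

```r
df <- data.frame(id = 1:4,
                 grp = c("a", "b", "a", "b"),
                 score = c(9.5, 7.0, 8.2, NA))

# subset: column names are used directly; NA conditions count as FALSE
high_sub <- subset(df, score > 8, select = c(id, score))

# Equivalent with [ : which() discards the NA comparison
high_idx <- df[which(df$score > 8), c("id", "score")]

high_sub
```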
| Sorting |
sort |
Vector |
Sorts vector elements in ascending or descending order. |
Drops NA values by default (na.last = NA); set na.last = TRUE to keep them at the end. Specify decreasing = TRUE for descending order. |
| |
order |
Vector |
Returns indices to sort data in ascending or descending order. |
Often used to sort data frames by multiple columns. |
| |
rank |
Vector |
Returns the ranks of elements in a vector. |
Be cautious with ties; the ties.method argument (default "average") determines how tied values are ranked.
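
A sketch contrasting the three sorting functions on made-up data, including sorting a data frame by two columns with order:

```r
x <- c(3, 1, NA, 2, 1)

sort(x)                     # 1 1 2 3 (NA dropped)
sort(x, na.last = TRUE)     # 1 1 2 3 NA
sort(x, decreasing = TRUE)  # 3 2 1 1
order(x)                    # 2 5 4 1 3 (NA position placed last)

y <- c(3, 1, 2, 1)
rank(y)                        # 4.0 1.5 3.0 1.5 (ties averaged)
rank(y, ties.method = "min")   # 4 1 3 1

# Sort by group ascending, then score descending
df <- data.frame(grp = c("b", "a", "b", "a"), score = c(1, 3, 2, 4))
sorted <- df[order(df$grp, -df$score), ]
sorted
```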
| Selecting Data |
which |
Logical vector |
Returns indices of elements that meet a condition. |
Works with logical conditions; helpful for subsetting data programmatically. |
| |
match |
Vector x, Vector y |
Finds matches of elements in one vector within another. |
Returns, for each element of x, the position of its first match in y; unmatched elements give NA. |
| |
%in% |
Vector x, Vector y |
Logical operator to test if elements belong to another vector. |
Easier than match for boolean results but does not return indices. |
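
The three selection helpers side by side, on made-up vectors:

```r
scores <- c(55, 82, 91, 67)
which(scores > 80)   # 2 3

x <- c("b", "z", "a")
y <- c("a", "b", "c")

match(x, y)          # 2 NA 1  (positions in y)
x %in% y             # TRUE FALSE TRUE
```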
| Writing Data |
write.csv |
Data frame |
Writes a data frame to a CSV file. |
Check path and permissions. Row names are written by default; set row.names = FALSE to omit them. |
| |
write.table |
Data frame or matrix |
Writes a data frame or matrix to a table file. |
Use sep to specify delimiter. Be careful with special characters in data. |
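
A minimal sketch of both writers, using temporary files and a made-up data frame. The tab separator and quote = FALSE in the write.table call are illustrative choices, not requirements.

```r
df <- data.frame(id = 1:2, name = c("Ana", "Ben"))

csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")

# Comma-separated, without the row-name column
write.csv(df, csv_path, row.names = FALSE)

# Tab-separated, unquoted strings
write.table(df, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)

readLines(csv_path)
readLines(tsv_path)
```

By default, write.csv quotes character values and column names, which protects embedded commas; with quote = FALSE that protection is lost.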
| Aggregating Data |
aggregate |
Data frame |
Splits data into groups and applies functions to summarize them. |
Useful for simple aggregation but limited for complex tasks. Grouping variables should be carefully selected. |
| |
tapply |
Vector x, Factor |
Applies a function to subsets of a vector based on a factor. |
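Well suited to summarizing one vector by a single factor. A list of factors is also accepted and returns a cross-classified array; for multidimensional aggregation that should stay in a data frame, consider alternatives such as aggregate.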
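
The same group means computed both ways, on a made-up data frame. aggregate returns a data frame, while tapply returns a named vector.

```r
df <- data.frame(grp = c("a", "b", "a", "b"), score = c(1, 3, 2, 4))

# Formula interface: summarize score within each grp
by_grp <- aggregate(score ~ grp, data = df, FUN = mean)
by_grp                                   # a 1.5, b 3.5

# tapply: vector, grouping factor, function
means <- tapply(df$score, df$grp, mean)
means                                    # a = 1.5, b = 3.5
```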
| Merging Data |
merge |
Data frame x, Data frame y |
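Joins two data frames on shared key columns or row names. |
Specify by to avoid unexpected joins; by default, merge joins on all shared column names and keeps only matching rows. Use all, all.x, or all.y for outer joins. Handles one-to-one, one-to-many, and many-to-many relationships. |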
| |
cbind |
Vectors or data frames |
Combines objects by columns. |
Objects must have matching row dimensions. Risk of mismatched data if row orders differ. |
| |
rbind |
Vectors or data frames |
Combines objects by rows. |
Objects must have matching column dimensions. For data frames, columns are matched by name, so a missing or differently named column causes an error.
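
A sketch of the three combining functions. The students and scores data frames are made-up sample data.

```r
students <- data.frame(id = c(1, 2, 3), name = c("Ana", "Ben", "Cy"))
scores   <- data.frame(id = c(2, 3, 4), score = c(7.0, 8.2, 6.1))

# Inner join (default): only ids present in both
inner <- merge(students, scores, by = "id")

# Left join: keep every student, NA where no score exists
left <- merge(students, scores, by = "id", all.x = TRUE)

# Add a column; the new vector must line up row by row
with_flag <- cbind(students, passed = c(TRUE, FALSE, TRUE))

# Add a row; column names must match
more <- rbind(students, data.frame(id = 4, name = "Di"))

left
```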
| Plotting |
plot |
x: Numeric vector, y: Numeric vector |
Creates a scatterplot or line plot depending on the inputs. |
Highly customizable. Use type, col, pch, and main for customization. |
| |
hist |
Numeric vector |
Creates a histogram to display the distribution of data. |
Use breaks to control binning; a single number is treated only as a suggestion, so pass a vector of breakpoints for exact bins. Labels and axis scaling may need adjustment for clarity. |
| |
boxplot |
Formula or Numeric vectors |
Creates boxplots to display data distribution and outliers. |
Can handle grouped data with a formula interface. Use notch = TRUE to draw notches that approximate a confidence interval around the median. |
| |
barplot |
Numeric vector or matrix |
Creates barplots for categorical data or summary statistics. |
Grouped barplots require matrix input with beside = TRUE; the default stacks the bars. Customize colors and labels for better visualization. |
| |
lines |
x: Numeric vector, y: Numeric vector |
Adds connected lines to an existing plot. |
Typically used to overlay data on an existing plot. Ensure x and y lengths match. |
| |
points |
x: Numeric vector, y: Numeric vector |
Adds points to an existing plot. |
Useful for highlighting specific data points. Combine with pch and col for customization. |
| |
legend |
Character labels and positioning |
Adds a legend to an existing plot. |
Customize position and symbol appearance using pch, col, and cex. |
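
The plotting functions above can be combined as in the following sketch. It assumes simulated data and draws to a temporary PDF so that it also runs in a non-interactive session; the colors, titles, and variable names are illustrative choices.

```r
set.seed(1)
x <- 1:20
y <- x + rnorm(20)
grp <- ifelse(x > 10, "high", "low")

out <- tempfile(fileext = ".pdf")
pdf(out)

# Scatterplot, then overlays on the same axes
plot(x, y, type = "p", pch = 16, col = "steelblue",
     main = "Sample scatterplot")
lines(x, x, col = "grey40")                       # reference line y = x
points(x[y > x], y[y > x], pch = 1, col = "red",  # circle points above it
       cex = 1.5)
legend("topleft", legend = c("data", "y = x"),
       pch = c(16, NA), lty = c(NA, 1),
       col = c("steelblue", "grey40"))

# Each of these starts a new page in the PDF
hist(y, breaks = 5, main = "Distribution of y")
boxplot(y ~ grp, main = "y by group")
barplot(matrix(c(1, 2, 3, 4), nrow = 2), beside = TRUE,
        names.arg = c("A", "B"), col = c("grey30", "grey70"))

dev.off()
```

lines, points, and legend only draw onto the most recent plot, so they must follow a call such as plot on the same device.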