Introduction
Data structures in R are tools for storing and organizing multiple values so that the data can be used more effectively. They vary in the number of dimensions and in whether their contents are homogeneous (all of one type) or heterogeneous (mixed types). The primary data structures are:
Vectors
Lists
Data frames
Matrices
Arrays
Factors
Data structures
1. Vectors
Discussed in a previous post.
2. Lists
Lists are objects/containers that hold elements of the same or different types. They can contain strings, numbers, vectors, matrices, functions, or other lists. Lists are created with the list() function.
Examples
a. Three element list
list_1 <- list(10, 30, 50)
b. Single element list
list_2 <- list(c(10, 30, 50))
c. Three element list
list_3 <- list(1:3, c(50,40), 3:-5)
d. List with elements of different types
list_4 <- list(c("a", "b", "c"), 5:-1)
e. List which contains a list
list_5 <- list(c("a", "b", "c"), 5:-1, list_1)
f. Set names for the list elements
names(list_5)
NULL
names(list_5) <- c("character vector", "numeric vector", "list")
names(list_5)
[1] "character vector" "numeric vector"   "list"
g. Access elements
list_5[[1]]
[1] "a" "b" "c"
list_5[["character vector"]]
[1] "a" "b" "c"
h. Length of list
length(list_1)
[1] 3
length(list_5)
[1] 3
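Single and double brackets behave differently when accessing list elements, which is easy to miss. The following is a minimal sketch; the list inventory and its element names are made up for illustration and are not part of the examples above.

```r
# Hypothetical list used only for this illustration
inventory <- list(items = c("a", "b", "c"), counts = 5:-1)

# Single brackets return a list (here a list of length 1)
sub_list <- inventory[1]

# Double brackets return the element itself (here a character vector)
element <- inventory[[1]]

class(sub_list)
class(element)

# The $ operator is shorthand for [[ ]] with a name
inventory$counts
```

Names that contain spaces, such as "character vector", need backticks with $ (for example list_5$`character vector`), so [[ ]] is often the simpler choice for them.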
3. Data frames
A data frame is one of the most common data objects used to store tabular data in R. Tabular data has rows representing observations and columns representing variables. A data frame is a list of equal-length vectors. Each column can hold a different type of data, but within each column, the elements must be of the same type. The most common data frame characteristics are listed below:
• Columns should have a name;
• Row names should be unique;
• Various data types can be stored (such as numeric, factor, and character);
• The individual columns should contain the same number of data items.
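The characteristics above can be checked directly in R. This is a small sketch under assumed data: the data frame scores and its columns are invented for illustration, and tryCatch() is used only to capture the error instead of stopping the script.

```r
# Hypothetical data frame with one character and one numeric column
scores <- data.frame(
  id = c("a", "b", "c"),
  score = c(1.5, 2.5, 3.5),
  stringsAsFactors = FALSE
)

# A data frame is a list of columns
is.list(scores)

# Every column has the same number of items
lengths(scores)

# Row names are unique (anyDuplicated() returns 0 when there are no repeats)
anyDuplicated(rownames(scores))

# Columns of incompatible lengths are rejected (3 values vs 2 values)
result <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) "error"
)
result
```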
Creation of data frames
level <- c("Low", "Mid", "High")
language <- c("R", "RStudio", "Shiny")
age <- c(25, 36, 47)
df_1 <- data.frame(level, language, age)
Functions used to manipulate data frames
a. Number of rows
nrow(df_1)
[1] 3
b. Number of columns
ncol(df_1)
[1] 3
c. Dimensions
dim(df_1)
[1] 3 3
d. Class of data frame
class(df_1)
[1] "data.frame"
e. Column names
colnames(df_1)
[1] "level"    "language" "age"
f. Row names
rownames(df_1)
[1] "1" "2" "3"
g. Top and bottom values
head(df_1, n=2)
  level language age
1   Low        R  25
2   Mid  RStudio  36
tail(df_1, n=2)
  level language age
2   Mid  RStudio  36
3  High    Shiny  47
h. Access columns
df_1$level
[1] "Low"  "Mid"  "High"
i. Access individual elements
df_1[3,2]
[1] "Shiny"
df_1[2, 1:2]
  level language
2   Mid  RStudio
j. Access columns with index
df_1[, 3]
[1] 25 36 47
df_1[, c("language")]
[1] "R"       "RStudio" "Shiny"
k. Access rows with index
df_1[2, ]
  level language age
2   Mid  RStudio  36
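Beyond indexing by position, rows are often selected with a logical condition, and the type of each column can be inspected in one call. The sketch below rebuilds the same three columns under a new name (df_demo) so that it runs on its own; the age threshold of 30 is an arbitrary choice for illustration.

```r
# Same values as the columns used above, rebuilt so the snippet is self-contained
df_demo <- data.frame(
  level = c("Low", "Mid", "High"),
  language = c("R", "RStudio", "Shiny"),
  age = c(25, 36, 47),
  stringsAsFactors = FALSE
)

# sapply() visits one column at a time and reports its class
col_classes <- sapply(df_demo, class)
col_classes

# Logical indexing: keep only the rows where age is above 30
older <- df_demo[df_demo$age > 30, ]
older
```

Setting stringsAsFactors = FALSE keeps the text columns as character vectors on older R versions, where the default was to convert them to factors.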
4. Matrices
A matrix is a rectangular two-dimensional (2D) homogeneous data structure with rows and columns. All of its elements share a single type (most often numeric) and are arranged in a fixed number of rows and columns. Matrices are generally used for mathematical and statistical applications.
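Because a matrix is homogeneous, mixing types forces R to coerce every element to a common type. The sketch below shows that, along with the default column-wise fill order; the object names are made up for illustration.

```r
# Mixing numbers, text, and logicals coerces everything to character
mixed <- matrix(c(1, "a", TRUE, 4), nrow = 2)
typeof(mixed)

# Values fill the matrix column by column by default
by_col <- matrix(1:6, nrow = 2)
by_col

# byrow = TRUE fills row by row instead
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
by_row
```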
a. Creation of matrices
m1 <- matrix(1:9, nrow = 3, ncol = 3)
m2 <- matrix(21:29, nrow = 3, ncol = 3)
m3 <- matrix(1:12, nrow = 2, ncol = 6)
b. Obtain the dimensions of the matrices
# m1
nrow(m1)
[1] 3
ncol(m1)
[1] 3
dim(m1)
[1] 3 3
# m3
nrow(m3)
[1] 2
ncol(m3)
[1] 6
dim(m3)
[1] 2 6
c. Arithmetic with matrices
m1+m2
     [,1] [,2] [,3]
[1,]   22   28   34
[2,]   24   30   36
[3,]   26   32   38
m1-m2
     [,1] [,2] [,3]
[1,]  -20  -20  -20
[2,]  -20  -20  -20
[3,]  -20  -20  -20
m1*m2
     [,1] [,2] [,3]
[1,]   21   96  189
[2,]   44  125  224
[3,]   69  156  261
m1/m2
           [,1]      [,2]      [,3]
[1,] 0.04761905 0.1666667 0.2592593
[2,] 0.09090909 0.2000000 0.2857143
[3,] 0.13043478 0.2307692 0.3103448
m1 == m2
      [,1]  [,2]  [,3]
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE FALSE
d. Matrix multiplication
m5 <- matrix(1:10, nrow = 5)
m6 <- matrix(43:34, nrow = 5)
# m5*m6 multiplies element by element; it is not the matrix product.
m5*m6
     [,1] [,2]
[1,]   43  228
[2,]   84  259
[3,]  123  288
[4,]  160  315
[5,]  195  340
# m5%*%m6 will not work because the dimensions do not conform (5x2 times 5x2).
# The matrix m6 needs to be transposed first.
# Transpose
m5%*%t(m6)
     [,1] [,2] [,3] [,4] [,5]
[1,]  271  264  257  250  243
[2,]  352  343  334  325  316
[3,]  433  422  411  400  389
[4,]  514  501  488  475  462
[5,]  595  580  565  550  535
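The rule behind %*% is that the number of columns of the left matrix must equal the number of rows of the right matrix. The sketch below checks that rule before multiplying; it uses new names (a and b) with the same values as above so that it runs on its own, and it uses crossprod() as one alternative way to get a conformable product.

```r
a <- matrix(1:10, nrow = 5)   # 5 rows, 2 columns
b <- matrix(43:34, nrow = 5)  # 5 rows, 2 columns

# %*% requires ncol(left) == nrow(right)
conformable <- ncol(a) == nrow(b)
conformable

# After transposing b, the inner dimensions match (2 and 2)
conformable_t <- ncol(a) == nrow(t(b))
conformable_t

# 5x2 times 2x5 gives a 5x5 result
product <- a %*% t(b)
dim(product)

# crossprod(a, b) computes t(a) %*% b, which gives a 2x2 result
small <- crossprod(a, b)
dim(small)
```

Which operand to transpose depends on the result wanted: a %*% t(b) gives a 5x5 matrix, while t(a) %*% b gives a 2x2 matrix.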
e. Generate an identity matrix
diag(5)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    1    0    0    0
[3,]    0    0    1    0    0
[4,]    0    0    0    1    0
[5,]    0    0    0    0    1
f. Column and row names
colnames(m5)
NULL
rownames(m6)
NULL
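Both calls return NULL because the matrices were created without names. Names can be assigned with the same functions and then used for indexing. A minimal sketch follows; the matrix m and the labels are hypothetical.

```r
m <- matrix(1:10, nrow = 5)

# Assign column and row names (labels chosen only for illustration)
colnames(m) <- c("first", "second")
rownames(m) <- paste0("row", 1:5)
m

# Index by name instead of by position
m["row2", "second"]
```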
5. Arrays
An array is a multidimensional vector that stores homogeneous data. It can be thought of as a stack of matrices and can store data in more than two dimensions (n-dimensional). A three-dimensional array is organized as rows by columns by matrices. Example: an array with dim = c(2,3,3) has 2 rows, 3 columns, and 3 matrices.
a. Creating arrays
arr_1 <- array(1:12, dim = c(2,3,2))
arr_1
, , 1
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
, , 2
     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12
b. Filter array by index
arr_1[1, , ]
     [,1] [,2]
[1,]    1    7
[2,]    3    9
[3,]    5   11
arr_1[1, ,1]
[1] 1 3 5
arr_1[, , 1]
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
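Indexing an array drops any dimension of length one by default, which is why arr_1[1, ,1] prints as a plain vector. The sketch below rebuilds the same array under a new name (arr) so that it runs on its own, and shows single-element access, drop = FALSE, and a per-matrix summary with apply().

```r
arr <- array(1:12, dim = c(2, 3, 2))

# One element: row 2, column 3, in the first and in the second matrix
arr[2, 3, 1]
arr[2, 3, 2]

# drop = FALSE keeps all three dimensions instead of returning a vector
kept <- arr[1, , 1, drop = FALSE]
dim(kept)

# apply() over the third dimension gives one sum per matrix
layer_sums <- apply(arr, 3, sum)
layer_sums
```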
6. Factors
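Factors store categorical data, whether the values are given as integers or strings. Each distinct category becomes a level, and the values are stored internally as integer codes that refer to those levels. By default, the levels are sorted alphabetically. This form of data storage is useful for statistical modeling. Examples of categorical values include TRUE or FALSE and male or female.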
vector <- c("Male", "Female")
factor_1 <- factor(vector)
factor_1
[1] Male   Female
Levels: Female Male
OR
factor_2 <- as.factor(vector)
factor_2
[1] Male   Female
Levels: Female Male
as.numeric(factor_2)
[1] 2 1
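The integer codes follow the order of the levels, which is why "Male" maps to 2 and "Female" to 1 above. When alphabetical order is not the natural one, the levels can be set explicitly. This is a small sketch; the vector sizes is made-up data.

```r
# Hypothetical categorical data
sizes <- c("Mid", "Low", "High", "Low")

# Default: levels are sorted alphabetically
default_f <- factor(sizes)
levels(default_f)

# Explicit level order, so the integer codes follow Low < Mid < High
ordered_f <- factor(sizes, levels = c("Low", "Mid", "High"))
levels(ordered_f)
as.integer(ordered_f)

# Count how many values fall in each level
table(ordered_f)
```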