Recently Someone asked a question on reddit: Adding a row and a column that are means for a set matrix

This is a very simple problem that many of us learned how to do only a few hours in to basic R. But since I’ve been learning tidyverse package, I figured why not do it tidy-fashion?

Let’s give it a try.

Here is the question:

I have the following sequence:

`S<-seq(1,90, by=3)`

I make a matrix that is the following:

`matrix(S, nrow = 6, ncol = 5)`

Now I am trying to do the following:

I want to calculate the means of the columns of the matrix and add them as a new row under the columns.

Next I want to calculate the sum of the rows of the matrix and add them as a new column on the right of the matrix.

Here is what their data looks like:

```
data_matrix = matrix(seq(1, 90, by = 3), nrow = 6, ncol = 5)
data_matrix
```

```
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 19 37 55 73
## [2,] 4 22 40 58 76
## [3,] 7 25 43 61 79
## [4,] 10 28 46 64 82
## [5,] 13 31 49 67 85
## [6,] 16 34 52 70 88
```

They want to calculate the mean of each column and sum of each row.

Now if you know some basic R this can very easily be achieved:

```
data_df = data.frame(data_matrix)
data_df[,ncol(data_df)+1] = rowSums(data_df)
data_df[nrow(data_df)+1,] = colMeans(data_df)
head(data_df)
```

```
## X1 X2 X3 X4 X5 V6
## 1 1 19 37 55 73 185
## 2 4 22 40 58 76 200
## 3 7 25 43 61 79 215
## 4 10 28 46 64 82 230
## 5 13 31 49 67 85 245
## 6 16 34 52 70 88 260
```

But how can we do this “tidy” way? Turns out this is more complicated than I originally thought.

First let’s breath some context into our data so we don’t get too bored:

Let’s say we had an exam in a class of 50 students. The exam consists of 5 questions. TAs recorded the score of each question for each student on an excel sheet. The spreadsheet data has 5 columns, each for a question and 50 rows, each for a student. Now we want to calculate:

1. Total score for each student (row sums)

2. Class average for each question (col means)

Let’s first creat this “spreadsheet”

```
scores_df = data.frame(matrix(sample(1:20,250, replace = TRUE), nrow = 50, ncol = 5))
head(scores_df)
```

```
## X1 X2 X3 X4 X5
## 1 19 13 7 9 16
## 2 5 11 12 14 19
## 3 5 11 6 8 20
## 4 8 9 8 9 6
## 5 3 14 4 7 6
## 6 8 12 6 15 15
```

Here we have a data frame with 5 colmuns (5 questions) and 50 rows (50 students).

Now the next step is to for tidy data: key-value pairs.

First we need to identify the keys: in our dataset, each student-question pair uniquely identifies a value (score). So we can gather our date like this:

```
library(tidyverse)
scores_tibble = gather(rownames_to_column(scores_df, var = "student"), question, score, -student)
head(scores_tibble)
```

```
## student question score
## 1 1 X1 19
## 2 2 X1 5
## 3 3 X1 5
## 4 4 X1 8
## 5 5 X1 3
## 6 6 X1 8
```

Here I did not `unite()`

`student`

and `question`

into a single column so that later I can group by either student or question to calculate mean and sum.

Now to calculate question means:

`summarise(group_by(scores_tibble, question), mean = mean(score))`

```
## # A tibble: 5 x 2
## question mean
## <chr> <dbl>
## 1 X1 11
## 2 X2 11.1
## 3 X3 9.7
## 4 X4 9.78
## 5 X5 10.8
```

Student sum scores:

`summarise(group_by(scores_tibble, student), total = sum(score))`

```
## # A tibble: 50 x 2
## student total
## <chr> <int>
## 1 1 64
## 2 10 46
## 3 11 78
## 4 12 63
## 5 13 50
## 6 14 56
## 7 15 44
## 8 16 40
## 9 17 64
## 10 18 43
## # … with 40 more rows
```

Let’s get a single table with all information

```
scores_complete = group_by(scores_tibble, question) %>%
mutate(question_average = mean(score)) %>%
group_by(student) %>%
mutate(student_total = sum(score))
scores_complete
```

```
## # A tibble: 250 x 5
## # Groups: student [50]
## student question score question_average student_total
## <chr> <chr> <int> <dbl> <int>
## 1 1 X1 19 11 64
## 2 2 X1 5 11 61
## 3 3 X1 5 11 50
## 4 4 X1 8 11 40
## 5 5 X1 3 11 34
## 6 6 X1 8 11 56
## 7 7 X1 2 11 53
## 8 8 X1 1 11 42
## 9 9 X1 17 11 63
## 10 10 X1 8 11 46
## # … with 240 more rows
```

With a tidy table we can easily calculate stats we want and add them to the original data frame:

```
scores_df_complete = spread(scores_tibble, question, score) %>%
left_join(summarise(group_by(scores_tibble, student), student_total = sum(score)), by = "student") %>%
bind_rows(spread(summarise(group_by(scores_tibble, question), question_mean = mean(score)), question, question_mean))
head(scores_df_complete)
```

```
## student X1 X2 X3 X4 X5 student_total
## 1 1 19 13 7 9 16 64
## 2 10 8 20 4 1 13 46
## 3 11 15 19 10 18 16 78
## 4 12 6 14 13 20 10 63
## 5 13 3 6 13 19 9 50
## 6 14 10 19 10 15 2 56
```

`tail(scores_df_complete)`

```
## student X1 X2 X3 X4 X5 student_total
## 46 50 12 20.00 17.0 19.00 1.00 69
## 47 6 8 12.00 6.0 15.00 15.00 56
## 48 7 2 9.00 18.0 15.00 9.00 53
## 49 8 1 16.00 11.0 3.00 11.00 42
## 50 9 17 8.00 9.0 11.00 18.00 63
## 51 <NA> 11 11.08 9.7 9.78 10.84 NA
```