Heatmap is a widely used method for the visualization of gene expression data. I make heatmap for visualization of log2FC data from multiple samples. This time I decided to write a notebook by breaking down every steps so that it become easier for me to modify this code to produce desired heatmap next time when I need. The code is taken from the ComplexHeatmap documentation and I added the explanation for each step.

First we need to load all the packages we are going to use.

library(ComplexHeatmap)
## Loading required package: grid
## ========================================
## ComplexHeatmap version 2.2.0
## Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
## Github page: https://github.com/jokergoo/ComplexHeatmap
## Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
## 
## If you use it in published research, please cite:
## Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
##   genomic data. Bioinformatics 2016.
## ========================================
library(circlize)
## ========================================
## circlize version 0.4.8
## CRAN page: https://cran.r-project.org/package=circlize
## Github page: https://github.com/jokergoo/circlize
## Documentation: http://jokergoo.github.io/circlize_book/book/
## 
## If you use it in published research, please cite:
## Gu, Z. circlize implements and enhances circular visualization 
##   in R. Bioinformatics 2014.
## ========================================

Loading and Pre-processing of data

Load gene expression data for clustering. If you have your own data you read that by using read.csv or read.table function and comvert to a matrix.

expr = readRDS(system.file(package = "ComplexHeatmap", "extdata", "gene_expression.rds"))

Select column with expression value and convert them to a matrix because data need to be in matrix format to use as input for Heatmap function.

mat = as.matrix(expr[, grep("cell", colnames(expr))])

In order to get a nice cluster we need to scale data using scale function of R. Scale function center the data and convert it to z-score. In code bellow we apply scale function along the column of the matrix and transpose it. If you have log2FC data you do not need to do this part.

mat_scaled = t(apply(mat, 1, scale))

We generated a vector of row means to annotate the heatmap with raw mean value of each genes.

base_mean = rowMeans(mat)

Here we are using regular expression to extract cell type from the column names.

type = gsub("s\\d+_", "", colnames(mat))

Generate heatmap

Now we are going to gradually build the complex heatmap. At first lets just took at the heatmap of our scalled expression data. Instead of scalled expression data you can use your log2FC data. You read your log2FC data in R using read.table function then convert it to a matrix.

Heatmap(mat_scaled, name = "expression")

Now we are going to change the color of the heatmap by adding values to col option, here we give two vectors as input to colorRamp2 function, one vector for break value of our gene expression data (Lower point, middle upper point) and another vector of color that correspond to the break value.

colors = colorRamp2(c(-2, 0, 2), c("green", "white", "red"))
Heatmap(mat_scaled, name = "expression", col = colors)

In the above heatmap we can not see the row names clearly, so I would like to remove the reowname by setting the option show_row_names to FALSE. There are a lot of option like this that take bolean values.

Heatmap(mat_scaled, name = "Expression", col = colors, row_title = NULL, show_row_dend = FALSE, show_row_names = FALSE)

Annotate samples

Now we would like to add some annotations to the heatmap, let add annotation for columns in the heatmap. In this heatmap columns are cells, there are three types of cells in this data. A vector object named ‘type’ that we defined earilier is a vector of cell type for each column. The length of vector is equal to the number of columns or samples. With the code below we will make a HeatmapAnnotation object ha then use it to annotate the column of heatmap. We decided to put column annotation on the top of the heatmap using top_annotation option. If you have different groups of samples, you can use group information of your samples to annotate your heatmap.

ha = HeatmapAnnotation(type = type, annotation_name_side = "left")
Heatmap(mat_scaled, name = "Expression", col = colors, row_title = NULL, show_row_dend = FALSE, show_row_names = FALSE, top_annotation = ha)

We can see that three color bar at the top of the heatmap represents three different cell types and ligend for the bar color is at the left side of the heatmap.

Breaking heatmap into clusters

Now lets break the heatmap into five different clusters, by setting us row_km to 5.

Heatmap(mat_scaled, name = "Expression", col = colors, row_title = NULL, show_row_dend = FALSE, show_row_names = FALSE, top_annotation = ha, row_km = 5)

Annotate genes or rows

We can add another heatmap that represent mean value of gene’s expression acrros the row of our data. Earlier we made a vector with the mean value of the row.

Heatmap(mat_scaled, name = "Expression", col = colors, row_title = NULL, show_row_dend = FALSE, show_row_names = FALSE, top_annotation = ha, row_km = 5) +
Heatmap(base_mean, name = "base mean", width = unit(15, "mm"))

The one column heatmap at the side represent the mean value of the gene expression value. Now if we would like to add boxplot for the mean value of all five clusters at the top of the annotation bar.

ha_mean = HeatmapAnnotation(summary = anno_summary(gp = gpar(fill = 2:6), height = unit(2, "cm")))
Heatmap(mat_scaled, name = "Expression", col = colors, row_title = NULL, show_row_dend = FALSE, show_row_names = FALSE, top_annotation = ha, row_km = 5) +
Heatmap(base_mean, name = "base mean", top_annotation = ha_mean, width = unit(15, "mm"))

We have type of gene in our gene expression matrix, which we can use to make another heatmap beside the base mean heatmap to add gene type information for every row.

Heatmap(mat_scaled, name = "Expression", col = colors, row_title = NULL, show_row_dend = FALSE, show_row_names = FALSE, top_annotation = ha, row_km = 5) +
Heatmap(base_mean, name = "base mean", top_annotation = ha_mean, width = unit(15, "mm")) +
Heatmap(expr$type, name = "gene type")