Data analysis result visualization for spatial transcriptomics, bulk RNA-Seq and scRNA-Seq. Credits: Inés Rivero García and Miren Urrutia Iturritza

Get your stats together!

Lately I have faced the (exciting) challenge of having to understand and apply more difficult concepts in statistics for the Molecular Techniques in Life Science (MTLS) second-year courses. Statistics can be difficult and incomprehensible at times, but that should never stop you from digging into it 😀 After many, many hours searching the endless Google universe for tips, tricks and understandable explanations, I have found some resources that really helped. Because sharing is caring, I have created this post with 5 very useful resources for when you need to get your stats together!


Images of (from left to right) spatial transcriptomics data analysis, RNA-Seq data normalization and single cell RNA-Seq clustering. Credits: Inés Rivero and Miren Urrutia.


1 – StatQuest

StatQuest is a YouTube channel created by Josh Starmer, assistant professor at UNC – Chapel Hill. His videos explain concepts in statistics and machine learning for biologists, and include both theoretical explanations and practical examples of how to use them in daily lab work.

You can check the videos here. My favourite is the one about PCA (super easy to understand!)


Screenshot from the StatQuest: Principal Component Analysis (PCA), Step by step. Credits: Josh Starmer.


2 – Statistical Inference for data science, by Brian Caffo

If you feel you need something closer to a “real-life” lecture, you can try this Coursera course: Statistical Inference. Taught by Brian Caffo, a biostatistics professor at Johns Hopkins University, the course is easy to follow and full of examples.

There is also a book based on this course. The book and other very useful materials can be found on Brian Caffo’s GitHub page.

And if you feel that taking the whole Coursera course is too much, you can always watch the specific lecture you’re interested in on his YouTube channel.

If Caffo’s course has whetted your statistical appetite, I have good news for you! There is a whole series of data science courses by Johns Hopkins University that you can take on Coursera. It is a very complete series, going from the very basics to reproducible research, statistical inference, regression models and even machine learning!


3 – Harvard X Biomedical Data Science Open Online Training

The Harvard X Biomedical Data Science Open Online Training is a series of three MOOCs created by Rafael Irizarry, JP Onnela, Vince Carey, Mike Love and Shirley Liu. It contains three courses:

  • Data Analysis for the Life Sciences
  • Genomics Data Analysis
  • Using Python for Research

I have tried the Data Analysis for the Life Sciences course, which is also available on EdX. It provides basic training in biostatistics, and what I love about it is that it lets students do real-world data analysis using the programming language R. It is very straightforward to follow, with lots of examples and many worked code solutions for when you are stuck on the programming part of statistics.

The three courses can be accessed through Irizarry’s GitHub webpage. Each one has a very detailed index of all the topics covered, which makes it very easy to find just what you are looking for without having to complete the whole course. There are book chapters which can be freely read online, YouTube videos and links to the EdX webpage for each of the course sections.


This is what the beginning of the Data Analysis for the Life Sciences series looks like. Source:

4 – “Points of Significance” from Nature Methods

Points of Significance is a monthly column in Nature Methods in which essential concepts in statistics and experimental design are explained for biologists. The articles are 1 or 2 pages long and give a high-level overview of a statistical method in the context of biological research.


Capture of the “Points of Significance” webpage by Nature Methods showing some of the topics covered in the articles. Source:

5 – Cross Validated

Cross Validated is an online forum in which people ask their questions about statistics and the online community replies to them. Although you always have to be critical when reading this type of information, most of the answers are high-quality and very reliable. The forum is great for solving very specific questions that are blocking you from finishing that Master’s Thesis data analysis. Also, the chances that other people have had the same questions you have, and dared to ask them online, are quite high! So it just takes a few minutes to get the answer to that question that is keeping you awake at night 😉


Do you know other useful resources to get your stats together? Let us know in the comments 🙂

For any questions about life at KI or the Molecular Techniques in Life Science programme, drop me an email at



LinkedIn: Inés Rivero García


