The CoVax Project
Covid-19 Vaccination Uptake in the Czech Republic
An end-to-end data analysis project that sources, cleans, and visualises Covid-19 vaccination rates across the Czech Republic, presented in a custom-built, interactive web report.
Presentation
Overview
The CoVax Project is a data-driven report that investigates the regional disparities in Covid-19 vaccination uptake across the Czech Republic. At the time of the project's inception, official government dashboards provided extensive data but lacked a clear, direct visualisation of the ratio of fully immunised residents to the total population per region. This project addresses that gap by answering a key question: Is vaccine uptake homogeneous across the country, or are there significant regional differences?
The project follows the complete data analysis pipeline: sourcing raw data from the Czech Open Data Repository, performing extensive data cleaning and wrangling in R, and merging healthcare data with geospatial coordinates. The final output is a choropleth map that clearly visualises the findings, presented on a custom-built, interactive website. The analysis revealed clear heterogeneity in vaccination rates, with a roughly 10% difference between the highest- and lowest-uptake regions, correlating with socio-economic and political factors discussed in the report.
Key Features
- Automated Data Sourcing: Acquires the latest immunisation data directly from the Czech Ministry of Health's public CSV endpoint.
- Data Wrangling & Transformation: Cleans, aggregates, and merges data from multiple sources (immunisation records, population statistics, and geospatial data) using R and the tidyverse.
- Geospatial Visualisation: Generates a high-quality choropleth map using ggplot2 and sf to display vaccination ratios by region (see the sketch after this list).
- Custom Interactive Front-End: Presents the entire report as a single-page web application with a clean UI, featuring a custom-built, multi-level tabbed navigation.
- Responsive Design: The web report is responsive and provides an optimal viewing experience on desktops, tablets, and mobile devices.
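
A minimal sketch of this pipeline is shown below. It assumes a placeholder CSV URL and hypothetical column names (`kraj_nazev`, `plne_ockovani`, `populace`); the project's real endpoint, field names, and population source may differ, and the region-name column returned by `RCzechia::kraje()` is assumed to be `NAZ_CZNUTS3`.

```r
library(dplyr)
library(readr)
library(ggplot2)
library(sf)
library(RCzechia)

# 1. Source the raw immunisation records and population statistics
vax_raw <- read_csv("https://example.cz/ockovani.csv")   # placeholder URL
pop_raw <- read_csv("data/population_by_region.csv")     # placeholder path

# 2. Aggregate fully immunised residents per region and compute the uptake ratio
vax_by_region <- vax_raw %>%
  group_by(kraj_nazev) %>%
  summarise(fully_vaccinated = sum(plne_ockovani, na.rm = TRUE)) %>%
  left_join(pop_raw, by = "kraj_nazev") %>%
  mutate(uptake = fully_vaccinated / populace)

# 3. Join the figures onto the regional polygons provided by RCzechia
regions <- kraje() %>%
  left_join(vax_by_region, by = c("NAZ_CZNUTS3" = "kraj_nazev"))

# 4. Draw the choropleth
ggplot(regions) +
  geom_sf(aes(fill = uptake)) +
  scale_fill_viridis_c(labels = scales::percent) +
  labs(title = "Fully immunised share of the population by region",
       fill = "Uptake") +
  theme_void()
```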
Skills Employed
- Data Sourcing and Cleaning
- Data Wrangling and Transformation
- R Programming
- Geospatial Data Matching and Visualisation
- Static and Interactive Data Visualisation
- Front-End Web Development
Project Area
This project is linked to Data Science, Public Health, and Information Visualisation, with elements of Social Science in the interpretation of the results.
Solution
- Languages: R, HTML5, CSS3, JavaScript
- Data Analysis & Visualisation (R / RStudio)
  - Data Wrangling: tidyverse (dplyr, readr)
  - Geospatial: sf, RCzechia
  - Plotting: ggplot2
- Web Design
  - Frameworks: Bootstrap 3
  - Libraries: jQuery, highlight.js
  - Deployment: GitHub
Learning Challenges
The primary challenge was transitioning the project from a static, auto-generated R Markdown document into a fully custom and interactive web application. This involved manually separating the HTML, CSS, and JavaScript, refactoring legacy code, and rebuilding the interactive tab functionality from scratch using Bootstrap's native components. This was a valuable learning experience in front-end development.
The Multiverse Project
Researcher Degrees of Freedom in Data Analysis
An end-to-end computational statistics project that uses multiverse analysis to explore how analytical choices impact research outcomes, presented in a custom-built, interactive web report.
Presentation
Overview
The Multiverse Project investigates the "garden of forking paths" in research—the idea that the countless small, justifiable decisions a researcher makes can lead to dramatically different conclusions. It addresses this by re-analyzing the famous "Many Analysts, One Dataset" study, which explored the link between football players' skin tone and their likelihood of receiving red cards.
Instead of choosing one "correct" analysis, this project uses a multiverse approach to run every possible combination of a set of 10 covariates (every non-empty subset, so 2^10 − 1 = 1,023 unique statistical models). The analysis systematically detects and corrects for overfitting and examines the results for consistency. The project's key finding is a crucial one for modern science: a result can be **consistently statistically significant** across many analyses but still have very **low practical explanatory power**, urging a more cautious interpretation of research that focuses only on p-values.
Key Features
- Multiverse Construction: Programmatically generates and executes 1,023 unique multilevel logistic regression models to map the entire space of possible outcomes (see the sketch after this list).
- Focus on Model Performance: Goes beyond traditional multiverse analyses by systematically evaluating overall model performance (marginal R²) to assess the practical meaning of the findings.
- Systematic Overfitting Detection: Identifies and removes covariates that artificially inflate model performance, leading to a more stable and reliable set of final results.
- Custom Interactive Front-End: Presents the entire analysis, methodology, and visualizations on a single-page website with a clean UI and custom-built, multi-level tabbed navigation.
- Accessible Explanations: Breaks down complex statistical concepts like multilevel modeling, optimizers, and error handling for a broad audience.
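
A minimal sketch of how such a multiverse can be constructed is shown below. It assumes a data frame `redcard` with hypothetical variable names (`redCard`, `skintone`, `playerId`, `refereeId`, and ten candidate covariates) and uses the `performance` package for marginal R²; the project's actual variable names, covariate set, and R² implementation may differ.

```r
library(lme4)
library(broom.mixed)
library(dplyr)
library(purrr)

# Hypothetical names for the ten candidate covariates
covariates <- c("position", "height", "weight", "age", "games",
                "goals", "victories", "yellowCards", "leagueCountry", "club")

# Every non-empty subset of the covariates: 2^10 - 1 = 1,023 specifications
specs <- unlist(
  lapply(seq_along(covariates), combn, x = covariates, simplify = FALSE),
  recursive = FALSE
)

# Fit one multilevel logistic regression for a given covariate subset
fit_one <- function(vars) {
  f <- reformulate(c("skintone", vars, "(1 | playerId)", "(1 | refereeId)"),
                   response = "redCard")
  m <- glmer(f, data = redcard, family = binomial,
             control = glmerControl(optimizer = "bobyqa"))
  tidy(m, effects = "fixed") %>%
    filter(term == "skintone") %>%
    transmute(
      spec        = paste(vars, collapse = " + "),
      estimate, p.value,
      # Marginal R^2: variance explained by the fixed effects alone
      r2_marginal = performance::r2_nakagawa(m)$R2_marginal
    )
}

multiverse <- map_dfr(specs, fit_one)
```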
Skills Employed
- Advanced Statistical Modeling (Multilevel Logistic Regression)
- Computational Statistics & Reproducible Research
- R Programming (lme4, tidyverse)
- Data Wrangling and Transformation
- Static and Interactive Data Visualisation
- Front-End Web Development
Project Area
This project is linked to Computational Psychology, Data Science, Statistics, and the Open Science movement.
Solution
- Languages: R, Python, HTML5, CSS3, JavaScript
- Data Analysis & Visualisation (R / RStudio)
  - Modeling: lme4, broom.mixed
  - Data Wrangling: tidyverse (dplyr, tidyr)
  - Plotting: ggplot2
- Web Design
  - Frameworks: Bootstrap 3
  - Libraries: jQuery, highlight.js
  - Deployment: GitHub Pages
Learning Challenges
The primary challenge was managing the computational and conceptual complexity of the analysis. Running over a thousand complex regression models required careful optimization of the code and robust error handling to ensure the process could run to completion. Conceptually, interpreting the landscape of results—rather than a single outcome—demanded a shift in thinking from a simple "significant/not significant" mindset to a more nuanced assessment of the stability, consistency, and practical meaning of the findings.
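
One way to make such a long-running loop fault tolerant is sketched below, reusing the hypothetical `fit_one()` and `specs` objects from the earlier sketch; `purrr::possibly()` turns a hard error in any single model into a NULL result so the remaining 1,000+ fits can continue.

```r
library(purrr)
library(dplyr)

# Wrap the model-fitting helper so an error returns NULL instead of stopping the run
safe_fit <- possibly(fit_one, otherwise = NULL)

results <- map(specs, safe_fit)
failed  <- sum(map_lgl(results, is.null))

# Keep only the specifications that fitted successfully
multiverse <- bind_rows(compact(results))
message(failed, " of ", length(specs), " specifications failed to fit")
```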
The Google Analytics Project
Coming soon!