The CoVax Project
Covid-19 Vaccination Uptake in the Czech Republic
An end-to-end data analysis project that sources, cleans, and visualises Covid-19 vaccination rates across the Czech Republic, presented in a custom-built, interactive web report.
Presentation
Overview
The CoVax Project is a data-driven report that investigates the regional disparities in Covid-19 vaccination uptake across the Czech Republic. At the time of the project's inception, official government dashboards provided extensive data but lacked a clear, direct visualisation of the ratio of fully immunised residents to the total population per region. This project addresses that gap by answering a key question: Is vaccine uptake homogeneous across the country, or are there significant regional differences?
The project follows the complete data analysis pipeline: sourcing raw data from the Czech Open Data Repository, performing extensive data cleaning and wrangling in R, and merging healthcare data with geospatial coordinates. The final output is a choropleth map that clearly visualises the findings, presented on a custom-built, interactive website. The analysis revealed clear heterogeneity in vaccination rates, with a roughly 10% difference between the highest- and lowest-uptake regions, correlating with socio-economic and political factors discussed in the report.
Key Features
- Automated Data Sourcing: Acquires the latest immunisation data directly from the Czech Ministry of Health's public CSV endpoint.
- Data Wrangling & Transformation: Cleans, aggregates, and merges data from multiple sources (immunisation records, population statistics, and geospatial data) using R and the tidyverse.
- Geospatial Visualisation: Generates a high-quality choropleth map using ggplot2 and sf to display vaccination ratios by region (see the sketch after this list).
- Custom Interactive Front-End: Presents the entire report as a single-page web application with a clean UI, featuring a custom-built, multi-level tabbed navigation.
- Responsive Design: The web report is responsive and provides an optimal viewing experience on desktops, tablets, and mobile devices.
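
A minimal sketch of this pipeline is shown below. It assumes a placeholder CSV URL and hypothetical column names (`kraj_nazev`, `plne_ockovani`, `populace`); the project's real endpoint, field names, and population source may differ, and the region-name column returned by `RCzechia::kraje()` is assumed to be `NAZ_CZNUTS3`.

```r
library(dplyr)
library(readr)
library(ggplot2)
library(sf)
library(RCzechia)

# 1. Source the raw immunisation records and population statistics
vax_raw <- read_csv("https://example.cz/ockovani.csv")   # placeholder URL
pop_raw <- read_csv("data/population_by_region.csv")     # placeholder path

# 2. Aggregate fully immunised residents per region and compute the uptake ratio
vax_by_region <- vax_raw %>%
  group_by(kraj_nazev) %>%
  summarise(fully_vaccinated = sum(plne_ockovani, na.rm = TRUE)) %>%
  left_join(pop_raw, by = "kraj_nazev") %>%
  mutate(uptake = fully_vaccinated / populace)

# 3. Join the figures onto the regional polygons provided by RCzechia
regions <- kraje() %>%
  left_join(vax_by_region, by = c("NAZ_CZNUTS3" = "kraj_nazev"))

# 4. Draw the choropleth
ggplot(regions) +
  geom_sf(aes(fill = uptake)) +
  scale_fill_viridis_c(labels = scales::percent) +
  labs(title = "Fully immunised share of the population by region",
       fill = "Uptake") +
  theme_void()
```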
Skills Employed
- Data Sourcing and Cleaning
- Data Wrangling and Transformation
- R Programming
- Geospatial Data Matching and Visualisation
- Static and Interactive Data Visualisation
- Front-End Web Development
Project Area
This project is linked to Data Science, Public Health, and Information Visualisation, with elements of Social Science in the interpretation of the results.
Solution
- Languages: R, HTML5, CSS3, JavaScript
- Data Analysis & Visualisation (R / RStudio)
  - Data Wrangling: tidyverse (dplyr, readr)
  - Geospatial: sf, RCzechia
  - Plotting: ggplot2
- Web Design
  - Frameworks: Bootstrap 3
  - Libraries: jQuery, highlight.js
  - Deployment: GitHub
Learning Challenges
The primary challenge was transitioning the project from a static, auto-generated R Markdown document into a fully custom and interactive web application. This involved manually separating the HTML, CSS, and JavaScript, refactoring legacy code, and rebuilding the interactive tab functionality from scratch using Bootstrap's native components. This was a valuable learning experience in front-end development.
The Multiverse Project
Researcher Degrees of Freedom in Data Analysis
An end-to-end computational statistics project that uses multiverse analysis to explore how analytical choices impact research outcomes, presented in a custom-built, interactive web report.
Presentation
Overview
The Multiverse Project investigates the "garden of forking paths" in research—the idea that the countless small, justifiable decisions a researcher makes can lead to dramatically different conclusions. It addresses this by re-analyzing the famous "Many Analysts, One Dataset" study, which explored the link between football players' skin tone and their likelihood of receiving red cards.
Instead of choosing one "correct" analysis, this project uses a multiverse approach to run every possible combination of a set of 10 covariates (every non-empty subset, so 2^10 − 1 = 1,023 unique statistical models). The analysis systematically detects and corrects for overfitting and examines the results for consistency. The project's key finding is a crucial one for modern science: a result can be **consistently statistically significant** across many analyses but still have very **low practical explanatory power**, urging a more cautious interpretation of research that focuses only on p-values.
Key Features
- Multiverse Construction: Programmatically generates and executes 1,023 unique multilevel logistic regression models to map the entire space of possible outcomes (see the sketch after this list).
- Focus on Model Performance: Goes beyond traditional multiverse analyses by systematically evaluating overall model performance (marginal R²) to assess the practical meaning of the findings.
- Systematic Overfitting Detection: Identifies and removes covariates that artificially inflate model performance, leading to a more stable and reliable set of final results.
- Custom Interactive Front-End: Presents the entire analysis, methodology, and visualizations on a single-page website with a clean UI and custom-built, multi-level tabbed navigation.
- Accessible Explanations: Breaks down complex statistical concepts like multilevel modeling, optimizers, and error handling for a broad audience.
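
A minimal sketch of how such a multiverse can be constructed is shown below. It assumes a data frame `redcard` with hypothetical variable names (`redCard`, `skintone`, `playerId`, `refereeId`, and ten candidate covariates) and uses the `performance` package for marginal R²; the project's actual variable names, covariate set, and R² implementation may differ.

```r
library(lme4)
library(broom.mixed)
library(dplyr)
library(purrr)

# Hypothetical names for the ten candidate covariates
covariates <- c("position", "height", "weight", "age", "games",
                "goals", "victories", "yellowCards", "leagueCountry", "club")

# Every non-empty subset of the covariates: 2^10 - 1 = 1,023 specifications
specs <- unlist(
  lapply(seq_along(covariates), combn, x = covariates, simplify = FALSE),
  recursive = FALSE
)

# Fit one multilevel logistic regression for a given covariate subset
fit_one <- function(vars) {
  f <- reformulate(c("skintone", vars, "(1 | playerId)", "(1 | refereeId)"),
                   response = "redCard")
  m <- glmer(f, data = redcard, family = binomial,
             control = glmerControl(optimizer = "bobyqa"))
  tidy(m, effects = "fixed") %>%
    filter(term == "skintone") %>%
    transmute(
      spec        = paste(vars, collapse = " + "),
      estimate, p.value,
      # Marginal R^2: variance explained by the fixed effects alone
      r2_marginal = performance::r2_nakagawa(m)$R2_marginal
    )
}

multiverse <- map_dfr(specs, fit_one)
```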
Skills Employed
- Advanced Statistical Modeling (Multilevel Logistic Regression)
- Computational Statistics & Reproducible Research
- R Programming (lme4, tidyverse)
- Data Wrangling and Transformation
- Static and Interactive Data Visualisation
- Front-End Web Development
Project Area
This project is linked to Computational Psychology, Data Science, Statistics, and the Open Science movement.
Solution
- Languages: R, Python, HTML5, CSS3, JavaScript
- Data Analysis & Visualisation (R / RStudio)
  - Modeling: lme4, broom.mixed
  - Data Wrangling: tidyverse (dplyr, tidyr)
  - Plotting: ggplot2
- Web Design
  - Frameworks: Bootstrap 3
  - Libraries: jQuery, highlight.js
  - Deployment: GitHub Pages
Learning Challenges
The primary challenge was managing the computational and conceptual complexity of the analysis. Running over a thousand complex regression models required careful optimization of the code and robust error handling to ensure the process could run to completion. Conceptually, interpreting the landscape of results—rather than a single outcome—demanded a shift in thinking from a simple "significant/not significant" mindset to a more nuanced assessment of the stability, consistency, and practical meaning of the findings.
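
One way to make such a long-running loop fault tolerant is sketched below, reusing the hypothetical `fit_one()` and `specs` objects from the earlier sketch; `purrr::possibly()` turns a hard error in any single model into a NULL result so the remaining 1,000+ fits can continue.

```r
library(purrr)
library(dplyr)

# Wrap the model-fitting helper so an error returns NULL instead of stopping the run
safe_fit <- possibly(fit_one, otherwise = NULL)

results <- map(specs, safe_fit)
failed  <- sum(map_lgl(results, is.null))

# Keep only the specifications that fitted successfully
multiverse <- bind_rows(compact(results))
message(failed, " of ", length(specs), " specifications failed to fit")
```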
The Google Analytics Project
Coming soon!