Teachers in Regional Education (2010–2024)
Project: A longitudinal study of teacher economics, gender distribution (including the gender pay gap), and professional qualification levels across 14 Czech regions during a period of significant systemic change.
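The source does not state how the pay gap was measured. One common definition is the unadjusted gender pay gap, as used by Eurostat. It expresses the difference between men's and women's average earnings as a percentage of men's average earnings.

Below is a minimal sketch. It assumes regional mean salaries by gender are already available. The function name, region labels and salary figures are hypothetical and are not results from the study.

```python
def pay_gap_pct(male_mean: float, female_mean: float) -> float:
    """Unadjusted gender pay gap in percent: (men's mean - women's mean) / men's mean * 100.

    A positive value means men earn more on average.
    """
    return (male_mean - female_mean) / male_mean * 100


# Hypothetical regional averages (CZK per month), for illustration only
regions = {"Region A": (43000, 40000), "Region B": (41000, 40590)}
gaps = {name: round(pay_gap_pct(m, f), 2) for name, (m, f) in regions.items()}
print(gaps)  # {'Region A': 6.98, 'Region B': 1.0}
```

A regional gap computed this way describes region-level averages only. It says nothing about individual teachers, which is the same caution the dashboard raises about the Ecological Fallacy.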
Implementation: Engineered an automated pipeline that uses JSON sitemaps and web scraping to collate fragmented public data. Used Power Query (M language) to unify disparate records and to reconcile the methodological changes introduced in 2011 and 2016.
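The pipeline itself is not published here. The sketch below is written in Python rather than the original Power Query (M) and illustrates two of the steps:

- reading a JSON sitemap to list one dataset URL per year;
- mapping category labels that changed at a methodological break onto one harmonised label set through a crosswalk.

The sitemap layout, the `example.org` URLs, the labels in `CROSSWALK` and the function names are all hypothetical. Downloading and scraping the pages is omitted.

```python
import json

# Hypothetical sitemap layout: the real portal's JSON schema is not given in the source.
SITEMAP = """
{"datasets": [
  {"year": 2010, "topic": "teachers", "url": "https://example.org/2010/teachers.csv"},
  {"year": 2011, "topic": "teachers", "url": "https://example.org/2011/teachers.csv"},
  {"year": 2011, "topic": "schools",  "url": "https://example.org/2011/schools.csv"},
  {"year": 2016, "topic": "teachers", "url": "https://example.org/2016/teachers.csv"}
]}
"""

# Hypothetical crosswalk: labels used before and after a methodological break
# are mapped onto one harmonised label set that every wave shares.
CROSSWALK = {
    "qualified": "qualified",
    "fully qualified": "qualified",          # assumed post-break wording
    "unqualified": "not qualified",
    "without qualification": "not qualified",
}


def dataset_urls(sitemap_text: str, topic: str = "teachers") -> dict[int, str]:
    """Map each year to the URL of its dataset for the requested topic."""
    entries = json.loads(sitemap_text)["datasets"]
    return {e["year"]: e["url"] for e in entries if e["topic"] == topic}


def harmonise(records: list[dict]) -> list[dict]:
    """Return copies of the records with the qualification label harmonised."""
    return [{**r, "qualification": CROSSWALK[r["qualification"].lower()]} for r in records]


urls = dataset_urls(SITEMAP)
print(sorted(urls))  # [2010, 2011, 2016]

rows = harmonise([
    {"year": 2010, "qualification": "Qualified"},
    {"year": 2016, "qualification": "Fully qualified"},
])
print([r["qualification"] for r in rows])  # ['qualified', 'qualified']
```

The crosswalk raises a `KeyError` when it meets an unseen label. That choice is deliberate in this sketch: a new category introduced by a later methodological change should stop the run rather than pass through unmapped.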
Outcomes:
Implementation: A unified, reproducible longitudinal dataset and a multi-wave ETL pipeline.
Communication: A self-service Tableau dashboard featuring interactive tooltips and pop-up context to mitigate interpretation biases like the Ecological Fallacy and Area Bias.
Skills & Tools
- Automated ETL, Web Scraping, Tableau Public, Power Query (M), JSON Mapping.
Multiverse Analysis of Big Data
Project: An investigation into the "reproducibility crisis" that evaluates how different data-processing decisions (thousands of possible analysis paths) affect the robustness of scientific findings.
Implementation: Scripted high-performance statistical simulations in R and Python that fit multilevel mixed-effects models across a large multiverse of data-processing choices.
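The core of a multiverse analysis is to enumerate every combination of defensible processing choices and fit the same model once per combination, or "universe".

Below is a minimal Python sketch of that enumeration step. To stay dependency-free, it swaps the multilevel mixed-effects model for a plain OLS slope. In a real run, each universe would fit a mixed model instead, for example with `lme4::lmer` in R or `statsmodels.formula.api.mixedlm` in Python.

The choice dimensions, the toy data and the function names are hypothetical.

```python
import itertools
import random
from statistics import mean, stdev

# Hypothetical toy data: (x, y, group) rows from 5 clusters; not data from the study.
random.seed(1)
DATA = [(x, 0.5 * x + random.gauss(0, 1), g) for g in range(5) for x in range(20)]

# Each key is one processing decision; each list holds its defensible options.
CHOICES = {
    "outlier_cut": [None, 2.5, 2.0],  # keep all rows, or drop rows with |y - mean(y)| > k * SD(y)
    "drop_group": [None, 0],          # optionally exclude one cluster
    "min_x": [0, 2],                  # optional range restriction on x
}


def preprocess(data, outlier_cut, drop_group, min_x):
    """Apply one combination of processing choices and return the retained rows."""
    rows = [r for r in data if r[2] != drop_group and r[0] >= min_x]
    if outlier_cut is not None:
        ys = [r[1] for r in rows]
        m, sd = mean(ys), stdev(ys)
        rows = [r for r in rows if abs(r[1] - m) <= outlier_cut * sd]
    return rows


def slope(rows):
    """OLS slope of y on x: a stand-in for the fixed effect of a multilevel model."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def run_multiverse(data, choices):
    """Fit the estimator once per combination of choices (one universe each)."""
    names = list(choices)
    results = []
    for combo in itertools.product(*(choices[n] for n in names)):
        spec = dict(zip(names, combo))
        results.append((spec, slope(preprocess(data, **spec))))
    return results


results = run_multiverse(DATA, CHOICES)
print(len(results))  # 12 universes (3 * 2 * 2)
```

The number of universes is the product of the option counts, which is why real multiverses quickly reach thousands of paths. That growth is also why the model-fitting step dominates the run time.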
Outcomes:
Implementation: Code that applies the multiverse analysis framework to multilevel mixed-effects models.
Communication: A detailed research report and visualisation suite (ggplot2) outlining the strengths and boundary limitations of specific analytical approaches.
This project was awarded the 2022 Prof Michael Siegal Prize by the Department of Psychology, University of Sheffield, for the best final project.
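The ggplot2 figures themselves are not reproduced here. A common way to present multiverse results is a specification curve: estimates are sorted from smallest to largest and summarised by how many share a sign or reach significance.

Below is a minimal Python sketch of that summary step. The function name, the universe labels and the toy estimates and p-values are hypothetical.

```python
from statistics import median


def summarise_multiverse(estimates):
    """Summarise (label, estimate, p_value) tuples from a multiverse run."""
    ranked = sorted(estimates, key=lambda e: e[1])  # order used on a specification-curve x-axis
    values = [e[1] for e in ranked]
    return {
        "n_universes": len(values),
        "median": median(values),
        "range": (values[0], values[-1]),
        "share_positive": sum(v > 0 for v in values) / len(values),
        "share_significant": sum(e[2] < 0.05 for e in ranked) / len(ranked),
        # (rank, label, estimate) triples ready for plotting
        "curve": [(rank, label, est) for rank, (label, est, _) in enumerate(ranked, start=1)],
    }


# Hypothetical estimates for illustration only
toy = [("A", 0.21, 0.01), ("B", -0.05, 0.40), ("C", 0.12, 0.03), ("D", 0.30, 0.001)]
summary = summarise_multiverse(toy)
print(summary["share_positive"], summary["share_significant"])  # 0.75 0.75
```

A result that holds its sign and significance across most universes is more robust than one that depends on a single processing path. A split curve like the toy one points to choices worth reporting as boundary conditions.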
Skills & Tools
- R, Python, Advanced Multivariate Statistics, Multilevel Modelling, Reproducible Research.
The CoVax Project
Project: A geospatial analysis exploring heterogeneity in COVID-19 vaccination uptake across Czech administrative regions relative to population density and socio-economic markers.
Implementation: Developed an automated R (Tidyverse) pipeline to clean public health records and integrated geospatial layers via the sf package for regional mapping.
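The original pipeline is written in R (Tidyverse and sf). In R, the attribute join corresponds roughly to a dplyr left join onto an sf object by region code.

Below is a minimal Python sketch of only the tidy-and-merge logic:

- normalising region codes;
- aggregating dose records per region;
- left-joining them onto a region table to compute uptake and population density.

Geometry handling is not shown. The field names, region codes and numbers are hypothetical.

```python
from collections import defaultdict


def regional_summary(records, regions):
    """Aggregate dose records per region and join them onto a region table."""
    doses = defaultdict(int)
    for rec in records:
        code = rec["region"].strip().upper()  # tidy inconsistent region codes
        doses[code] += rec["doses"]
    summary = {}
    for code, info in regions.items():        # left join: every region is kept
        summary[code] = {
            "uptake_per_100": 100 * doses.get(code, 0) / info["population"],
            "density_per_km2": info["population"] / info["area_km2"],
        }
    return summary


# Hypothetical inputs for illustration only
records = [
    {"region": " r1 ", "doses": 900},
    {"region": "R1", "doses": 100},
    {"region": "R2", "doses": 250},
]
regions = {
    "R1": {"population": 2000, "area_km2": 50.0},
    "R2": {"population": 1000, "area_km2": 100.0},
    "R3": {"population": 500, "area_km2": 25.0},
}
result = regional_summary(records, regions)
print(result["R1"])  # {'uptake_per_100': 50.0, 'density_per_km2': 40.0}
```

Keeping every region in the join, including those with no records (R3 here), makes data gaps visible on the map. An inner join would instead drop those regions silently.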
Outcomes:
Implementation: An automated tidying and merging pipeline for public health data.
Communication: A comprehensive visualisation of regional differences in vaccine uptake that highlights geographical inequities for policy stakeholders.
Skills & Tools
- Geospatial Analysis (sf), R (Tidyverse), ggplot2, Markdown, Data Wrangling.