Rebecca Youngerman
Data Scientist, Public Health Enthusiast, Looking to Change the World
Graduate School Work
​
COVID-19 Work
In Spring 2020 the world turned sharply to focus on an unprecedented challenge: COVID-19. As a graduate student in Health Data Science, I saw, as did my professors, the importance of analyzing this pandemic in real time. Final projects in three of my courses (Statistical Learning, Data Science II, and Computing Foundations) set out to answer questions about the coronavirus:
1) How does socio-economic status relate to the spread of COVID-19 in New York City?

2) Can we use lung CT scans to predict COVID-19 as an alternative to potentially inaccessible testing?

3) How can parallelization improve the efficiency of simulating the spread of a disease like COVID-19?
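The third question rests on one idea. Independent stochastic runs of an epidemic model can be spread across CPU cores. The sketch below illustrates this under stated assumptions. It uses a simple discrete-time stochastic SIR model, not the course project's actual simulation. The function names (`simulate_sir`, `run_ensemble`) and parameter values (population 1,000, beta = 0.3, gamma = 0.1) are hypothetical. Parallelism comes from Python's standard `concurrent.futures.ProcessPoolExecutor`.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def simulate_sir(seed, population=1000, initial_infected=5, beta=0.3, gamma=0.1, days=160):
    """One discrete-time stochastic SIR run (hypothetical parameters); returns how many people were ever infected."""
    rng = random.Random(seed)  # per-run generator, so a seed gives the same result in any process
    s, i = population - initial_infected, initial_infected
    for _ in range(days):
        if i == 0:
            break  # epidemic has died out
        # Reed-Frost-style chain binomial: each infectious person independently
        # infects each susceptible person with probability beta / population.
        p_inf = 1 - (1 - beta / population) ** i
        new_inf = sum(rng.random() < p_inf for _ in range(s))
        # Each currently infectious person recovers with probability gamma per day.
        new_rec = sum(rng.random() < gamma for _ in range(i))
        s -= new_inf
        i += new_inf - new_rec
    return population - s


def run_ensemble(n_runs, workers=None):
    """Run n_runs independent simulations (seeds 0..n_runs-1), in parallel unless workers == 1."""
    seeds = range(n_runs)
    if workers == 1:
        return [simulate_sir(seed) for seed in seeds]  # serial baseline for timing comparisons
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with seeds
        return list(pool.map(simulate_sir, seeds))
```

Each run seeds its own `random.Random`, and `pool.map` keeps results in seed order. As a result, the serial and parallel paths return identical results for the same seeds. That makes the speed-up easy to measure fairly. On platforms that start worker processes with `spawn`, call `run_ensemble(200)` from inside an `if __name__ == "__main__":` guard.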

Health Financing Analysis
Using data from the Gapminder Foundation, I created an interactive app with R Shiny. The app explores variation in health outcomes and health financing around the world through interactive maps and country-by-country comparisons. Have fun exploring!
​
See the full, collaborative project here: https://rebeccayoungerman.wixsite.com/healthandwealth
Academic Posters


Undergraduate Work
Honors Thesis
This paper details my self-designed project with AidData and GeoQuery. It was part of a broader effort to create specialized GeoQuery platforms addressing the United Nations' Sustainable Development Goals. The project analyzed and publicized sub-national data in Kenya focused on SDG 3.C (found here).
Policy Brief
​
As the culminating assignment of Kinesiology 460, Health Policy, taught by Professor Carrie Dolan, PhD, at the College of William and Mary, students were tasked with writing a 500- to 800-word policy story that incorporates key aspects of data journalism. The story had to take an evidence-based approach to critically analyze the following question: "What health issue should be addressed to reach sustainable development goal number 3?" (see goal description here).
Child Deprivation Index
​
This report addresses an issue posed by Save the Children. The organization wants to create a Child Deprivation Index that can be standardized across countries. The index would guide decisions about where to do direct programming, with a focus on the most deprived children and groups. Save the Children proposes a 3x3 index. It would be built on three indicators: the under-5 mortality rate, years of schooling completed (or a similar education metric), and the rate of stunting among children under 5. These would be disaggregated by location (urban/rural), gender, and household wealth.
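The sketch below shows one way to turn the proposed indicators into a comparable score. It is a hypothetical illustration, not Save the Children's specification or the report's method. It assumes each subgroup (a location × gender × wealth cell) has values for the three indicators. Each indicator is min-max scaled across subgroups so that 1 means most deprived, and the three scaled values are averaged with equal weights. The `GroupStats` type and both functions are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class GroupStats:
    """Indicator values for one hypothetical subgroup, e.g. rural girls in the poorest households."""
    location: str           # "urban" or "rural"
    gender: str             # e.g. "female" or "male"
    wealth: str             # hypothetical wealth label, e.g. "poorest"
    u5_mortality: float     # under-5 deaths per 1,000 live births
    schooling_years: float  # mean years of schooling completed
    stunting_rate: float    # share of under-5s who are stunted (0-1)


def _minmax(values, higher_is_worse=True):
    """Scale values to 0-1 so that 1 is the most deprived subgroup on this indicator."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no variation: nobody is relatively deprived
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_worse else [1 - x for x in scaled]


def deprivation_scores(groups):
    """Equal-weight average of the three scaled indicators, keyed by (location, gender, wealth)."""
    mort = _minmax([g.u5_mortality for g in groups])
    school = _minmax([g.schooling_years for g in groups], higher_is_worse=False)  # more school = less deprived
    stunt = _minmax([g.stunting_rate for g in groups])
    return {
        (g.location, g.gender, g.wealth): (m + e + s) / 3
        for g, m, e, s in zip(groups, mort, school, stunt)
    }
```

Min-max scaling keeps the score relative to the subgroups being compared. That suits ranking where to program, but it means scores are not comparable across separately scored countries unless fixed reference bounds replace the observed minimum and maximum.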
Dam Prioritization
​
This report examines the question: "Given expert practitioner preferences, and accounting for uncertainty in your source data, what 10 dams would you recommend for a priority investment by the US Army Corps of Engineers?"
To answer this question, a predictive model of failure risk was built from past dam failures, with the best model for prediction selected from a suite of candidates. Then, a mixed qualitative-quantitative model using the Analytic Hierarchy Process was created. It decides which of the dams most likely to fail should receive priority investment.
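The Analytic Hierarchy Process step can be sketched as follows. The criteria (for example, predicted failure probability, downstream population, and hazard potential) and the pairwise judgments are hypothetical. The report's actual criteria and expert preferences are not reproduced here. Criterion weights are the principal eigenvector of the pairwise comparison matrix, found by power iteration. Saaty's consistency ratio checks whether the judgments are coherent. Each dam's priority is the weighted sum of its normalized criterion scores.

```python
# Saaty's random consistency index for matrices of size n = 1..10
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(matrix, iterations=100):
    """Return (weights, consistency_ratio) for a reciprocal pairwise comparison matrix."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration: multiply by the matrix and renormalize to sum to 1
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate the principal eigenvalue lambda_max as the mean of (A w)_i / w_i
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lam - n) / (n - 1) if n > 1 else 0.0  # consistency index
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri else 0.0  # consistency ratio
    return w, cr


def rank_dams(dams, weights, top_k=10):
    """Rank dams (name -> criterion scores in 0-1) by weighted priority; return the top_k."""
    scored = [(name, sum(w * s for w, s in zip(weights, scores)))
              for name, scores in dams.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A consistency ratio above about 0.10 is conventionally taken as a sign that the pairwise judgments should be revisited before the weights are used.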
College Loan Default Rate
​
This report details a model that predicts the probability of student loan default for US academic institutions. Based on that model, it recommends the top five schools for a student who wants to minimize the likelihood of defaulting on student loans after graduation, given the student's stated preferences. Web scraping, data joining with a fuzzy matching procedure, and Monte Carlo cross-validation were used to produce a final recommendation of the five best and five worst universities.
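The fuzzy-matching join can be sketched with Python's standard-library `difflib`. This is an illustration under assumptions, not the report's procedure. The normalization rules, the 0.8 similarity cutoff, the `fuzzy_join` helper, and the school names and values in the example are all hypothetical.

```python
import difflib


def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop a leading 'the'."""
    cleaned = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " " for ch in name)
    words = cleaned.split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def fuzzy_join(left, right, cutoff=0.8):
    """Join two name-keyed dicts; each left key pairs with its most similar right key, or None."""
    norm_right = {normalize(k): k for k in right}
    candidates = list(norm_right)
    joined = {}
    for key, value in left.items():
        # get_close_matches ranks candidates by SequenceMatcher.ratio() and keeps those >= cutoff
        hits = difflib.get_close_matches(normalize(key), candidates, n=1, cutoff=cutoff)
        joined[key] = (value, right[norm_right[hits[0]]]) if hits else (value, None)
    return joined


# Example (hypothetical values):
# fuzzy_join({"Univ. of Virginia": 0.01}, {"University of Virginia": 2})
```

Matches just above the cutoff, such as an abbreviated "Univ." name, are worth spot-checking by hand. A cutoff set too low can silently pair different schools that share most of their name.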
Predicting Syrian Conflict
​
This report focuses on integrating disparate data sources to fit a model that predicts conflict in Syria. The analysis was done at two geographic scales, using different census units, to contrast results and obtain a more accurate model. Web-scraped data from Google Maps was combined with conflict data to build the models. Parallelizing the simulation code was necessary to find a solution in the time allotted.
Geospatial Analysis
Projects in ArcGIS
