Data Analysis
Skills
Data visualization, for descriptive statistics
ggplot in R
Shiny dashboards
Stata
plotly in Python
Tableau
Codifying raw survey data
Writing meta-data documentation
Extracting, scraping, transforming, wrangling, tidying, exporting or analysing data, using R
Statistics
aggregation, with grouping
distributional analysis; density
multivariate regression
imputation
ARIMA or VAR time-series modelling
Using household survey data
- Specializing in the South African labour market
Report writing
Automation of scripts (e.g. in a virtual machine)
Using APIs for data collection, or running a bot.
Updating graphs daily
Using a High Performance Computer (HPC)
Technologies
Languages
Stata —since Aug 2017.
R —since May 2020.
Python (APIs, pandas, plotly, SciPy, scikit-learn), since Sept 2021.
LaTeX —since 2015.
Git —since May 2022.
Bash (for using Linux CLI)
SQL —since April 2023.
Programs
Google Workspace (including Meet, Calendar, Sheets, Docs, Forms, and shared drives)
Dropbox
Office (including Word, Excel, PowerPoint, Teams, Outlook).
Excel: advanced functions, modelling, and basic VBA. I have an interest in BERT.
Texmaker + MiKTeX
Tableau
VS Code
RStudio
Stata
Notepad++
WinMerge
Portfolio
How does data availability compare across countries? The World Bank is a major source of country-level economic, financial and socioeconomic data. In this analysis, we summarise the number of data series available from the World Bank for different countries as a proxy for macroeconomic and social data availability.
Using the World Bank API, we calculate that 19 489 indicators are available. To give a sense of scale, the full dataset has around 236 million rows in long format.
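As a rough illustration only (this is not the code used for the analysis), the sketch below shows one way such counts could be produced in Python. It assumes the World Bank API v2 indicator listing, which returns a two-element JSON array whose first element is a metadata object with a `total` field. It also assumes that a series counts as "available" for a country when it has at least one non-missing value. The function names, the sample payload and the sample rows are hypothetical, and the live request is defined but not called.

```python
import json
from collections import defaultdict
from urllib.request import urlopen

# Indicator listing endpoint; per_page=1 keeps the response small because
# only the metadata element is needed for the count.
INDICATOR_URL = "https://api.worldbank.org/v2/indicator?format=json&per_page=1"


def count_indicators(payload):
    """Read the indicator count from a parsed [metadata, records] response."""
    metadata = payload[0]
    return int(metadata["total"])


def fetch_indicator_count(url=INDICATOR_URL):
    """Query the live API (needs network access) and return the count."""
    with urlopen(url, timeout=30) as response:
        return count_indicators(json.load(response))


def series_per_country(rows):
    """Count distinct indicators with at least one non-missing value per country.

    `rows` is long-format data: one (country, indicator, year, value) tuple
    per observation. Treating "available" as "has a non-missing value" is an
    assumption made for this sketch.
    """
    seen = defaultdict(set)
    for country, indicator, _year, value in rows:
        if value is not None:
            seen[country].add(indicator)
    return {country: len(indicators) for country, indicators in seen.items()}


# Hypothetical payload in the shape of the API response, using the
# indicator count quoted in the text.
sample_payload = [
    {"page": 1, "pages": 19489, "per_page": "1", "total": 19489},
    [{"id": "EXAMPLE.ID", "name": "Example indicator"}],
]

# Made-up long-format rows; indicator codes and values are placeholders.
sample_rows = [
    ("ZAF", "IND.A", 2020, 1.0),
    ("ZAF", "IND.A", 2021, 1.5),
    ("ZAF", "IND.B", 2020, None),
    ("KEN", "IND.A", 2020, 2.0),
    ("KEN", "IND.B", 2021, 3.0),
]

print(count_indicators(sample_payload))
print(series_per_country(sample_rows))
```

Keeping the parsing step separate from the network call means the counting logic can be checked against a saved response without querying the API.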
This work was used in Rand Merchant Bank's "Where to Invest in Africa 2024" report.
My work involved assistance with:
Codifying survey responses
Stata code for graphs
https://www.opensaldru.uct.ac.za/handle/11090/1005
I produced the statistical tables and graphs in this publication.
Since 2022-10-08, I have run a Python script daily.
See Previous Research and a summary of my Résumé.
Testimonials
Prof Andrew Donaldson: andrew.donaldson@uct.ac.za
Dr Daan Steenkamp: daan@codera.co.za
Mr Grant Smith: grant@gmtplus.co.za
Contact
aidan@econometrics.co.za
Example ggplot code
Here is an example of how to create a simple graph in R, using the ggplot2 package. The data would usually be read from a data file; here it is defined inline so that the example is self-contained. This also shows how I lay out my code. The output is shown on the right.
library(ggplot2)
library(dplyr)   # %>%, as_tibble(), mutate()
library(scales)  # pretty_breaks(), percent_format()

# Default settings that can be reused for all graphs in the project
base_graph <- list(
  theme(
    plot.title.position = "plot",
    axis.title.x = element_text(margin = margin(t = 5)),
    axis.title.y = element_text(margin = margin(r = 10)),
    axis.text.x = element_text(size = rel(1.1), margin = margin(b = 2)),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  ),
  scale_x_continuous(
    breaks = pretty_breaks(),
    expand = expansion(mult = c(0.015, 0.14)),
    labels = scales::percent_format(accuracy = 1)
  )
)

# Example data defined inline, with a formatted percentage label for each bar
employment_graph <- ggplot(
  data = data.frame(
    age = factor(
      seq(1, 5), seq(1, 5),
      c("18-29", "30-39", "40-49", "50-59", "60-65")
    ),
    prop = c(0.63, 0.78, 0.90, 0.82, 0.68)
  ) %>%
    as_tibble() %>%
    mutate(percentage = paste0(round(prop * 100), "%")),
  aes(x = prop, y = age)
) +
  theme_minimal() +
  base_graph +
  geom_bar(stat = "identity", fill = "darkblue") +
  coord_flip() +
  # Place the percentage labels inside the bars, near the top
  geom_text(
    aes(x = prop * 0.89 - 0.04, label = percentage, y = age),
    stat = "identity",
    size = rel(3.5),
    check_overlap = TRUE,
    color = "white"
  ) +
  labs(x = "Proportion", y = "Age", title = "Employment rate by age")

# Display the graph
employment_graph

# Save the graph as a PNG file
png(filename = "employment.png", width = 600, height = 500, res = 150)
print(employment_graph)  # explicit print() so the plot is also drawn when the script is sourced
dev.off()