How are jobs changing?

Data Analysis

Skills

Data visualization, for descriptive statistics
- ggplot in R
- Shiny dashboards
- Stata
- plotly in Python
- Tableau
Codifying raw survey data
Writing meta-data documentation
Extracting, scraping, transforming, wrangling, tidying, exporting or analysing data, using R
Statistics
- aggregation, with grouping
- distributional analysis; density
- multivariate regression
- imputation
- ARIMA or VAR time-series modelling
Using household survey data
- Specializing in the South African labour market
Report writing
Automation of scripts (e.g. in a virtual machine)
- Using APIs for data collection, or running a bot.
- Updating graphs daily
Using a High Performance Computer (HPC)

Technologies

Languages

Stata, since Aug 2017.
R, since May 2020.
Python (APIs, pandas, plotly, SciPy, scikit-learn), since Sept 2021.
LaTeX, since 2015.
Git, since May 2022.
Bash (for using the Linux CLI)
SQL, since April 2023.

Programs

Google Workspace (including Meet, Calendar, Sheets, Docs, Forms, and shared drives)
Dropbox
Office (including Word, Excel, PowerPoint, Teams, Outlook).
- Excel: advanced functions, modelling, and basic VBA. I am also interested in BERT (the Basic Excel R Toolkit).
Texmaker + MiKTeX
Tableau
VS Code
RStudio
Stata
Notepad++
WinMerge

Portfolio

Measuring data availability by country

How does data availability compare across countries? The World Bank is a major source of country-level economic, financial and socioeconomic data. In this analysis, we summarise the number of data series available from the World Bank for different countries as a proxy for macroeconomic and social data availability.

Using the World Bank API, we count 19 489 indicators available from the World Bank. For a sense of scale, the full dataset has around 236 million rows in long format.

This work was used in Rand Merchant Bank's "Where to Invest in Africa 2024" report.
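As an illustration of how an indicator count like the one above can be obtained, here is a minimal Python sketch. It is not the code used in the analysis. It assumes the public World Bank API v2 indicator listing, whose JSON response is a two-element list: paging metadata (including a "total" field) followed by the records. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: the public World Bank API v2 indicator listing.
# per_page=1 keeps the download small; only the paging metadata is needed.
INDICATOR_URL = "https://api.worldbank.org/v2/indicator?format=json&per_page=1"


def parse_indicator_total(payload):
    """Return the indicator count from a World Bank API v2 JSON response.

    The response is a list: [paging metadata, list of records].
    """
    metadata = payload[0]
    return int(metadata["total"])


def fetch_indicator_total(url=INDICATOR_URL, timeout=30):
    """Download one page of the indicator listing and read its total."""
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return parse_indicator_total(payload)
```

With network access, fetch_indicator_total() returns the live count, which changes as the World Bank adds or retires series. Counting availability by country would additionally require requesting each indicator's observations, which is where the row count grows large.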

Survey of businesses in a shopping precinct (GMT+)

https://covid19economicideas.org/2020/04/20/the-impact-of-lockdown-on-employment-at-the-va-waterfront/

My work involved assistance with:

Codifying survey responses
Stata code for graphs

Employment and earnings by industry before Covid-19

https://www.opensaldru.uct.ac.za/handle/11090/1005

I produced the statistical tables and graphs in this publication.

Scraping the Bing Wallpaper, daily

I have run a Python script daily since 2022-10-08.
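A daily scraper of this kind might look like the sketch below. It is not the actual script. It assumes the unofficial HPImageArchive.aspx JSON endpoint on bing.com, which returns an "images" list whose first entry has a relative "url". The folder name, file naming scheme and function names are hypothetical.

```python
import json
from datetime import date
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlopen

BING_HOST = "https://www.bing.com"
# Assumption: unofficial endpoint describing the current wallpaper as JSON.
ARCHIVE_URL = BING_HOST + "/HPImageArchive.aspx?format=js&idx=0&n=1&mkt=en-US"


def image_url_from_archive(payload):
    """Build the absolute image URL from the archive JSON."""
    relative_url = payload["images"][0]["url"]
    return urljoin(BING_HOST, relative_url)


def target_path(folder, day):
    """One file per calendar day, e.g. wallpapers/bing-2022-10-08.jpg."""
    return Path(folder) / f"bing-{day.isoformat()}.jpg"


def download_todays_wallpaper(folder="wallpapers"):
    """Save today's wallpaper unless it has already been downloaded."""
    path = target_path(folder, date.today())
    if path.exists():  # a rerun on the same day does nothing
        return path
    with urlopen(ARCHIVE_URL, timeout=30) as response:
        payload = json.load(response)
    path.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(image_url_from_archive(payload), timeout=30) as image:
        path.write_bytes(image.read())
    return path
```

Naming files by date makes the job safe to rerun, so a scheduler such as cron or Windows Task Scheduler can call download_todays_wallpaper() once a day without creating duplicates.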

See Previous Research and a summary of my Résumé.

Testimonials

Prof Andrew Donaldson: andrew.donaldson@uct.ac.za

Dr Daan Steenkamp: daan@codera.co.za

Mr Grant Smith: grant@gmtplus.co.za

Contact

aidan@econometrics.co.za

https://calendly.com/aidan-horn/

Example ggplot code

Here is an example of how to draw a simple graph in R, using the ggplot2 package. The data would usually be read from a data file, but here it is defined inline so that the example is self-contained. This also shows how I lay out my code. The output is shown on the right.

library('ggplot2')
library('scales')
library('tidyverse')

base_graph = list( # can be used as default settings for all graphs in the project
   theme(
      plot.title.position = "plot",
      axis.title.x = element_text(margin=margin(t=5)),
      axis.title.y = element_text(margin=margin(r=10)),
      axis.text.x = element_text(size=rel(1.1), margin=margin(b=2)),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank()
   ),
   scale_x_continuous(
      breaks = pretty_breaks(),
      expand = expansion(mult=c(0.015, 0.14)),
      labels = scales::percent_format(accuracy = 1)
   )
)

employment_graph <- ggplot(
   data = data.frame(
         age = factor(seq(1, 5),
            seq(1, 5),
            c(
               "18-29", "30-39", "40-49", "50-59", "60-65"
            )
         ),
         prop = c(0.63, 0.78, 0.90, 0.82, 0.68)
      ) %>% as_tibble() %>%
      mutate(
         percentage = paste0(round(prop*100), '%')
      )
   ,
   aes(
      x = prop,
      y = age
   )
) +
   theme_minimal() + base_graph +
   geom_bar(
      stat = "identity",
      fill = "darkblue"
   ) +
   coord_flip() +
   geom_text(
      aes(
         x = prop*0.89 - 0.04,
         label = percentage,
         y = age
      ),
      stat = "identity",
      size = rel(3.5),
      check_overlap=T,
      color = "white"
   ) +
   labs(
      x = "Proportion",
      y = "Age",
      title = "Employment rate by age"
   )

employment_graph

png(
   filename="employment.png",
   width = 600, height = 500,
   res = 150
)
employment_graph
dev.off()
