Data Analysis

Skills

    • Specializing in the South African labour market

Technologies

Languages

Programs

Portfolio

How does data availability compare across countries? The World Bank is a major source of country-level economic, financial and socioeconomic data. In this analysis, we summarise the number of data series available from the World Bank for different countries as a proxy for macroeconomic and social data availability.

Using the World Bank API, we count 19 489 available indicators. To give a sense of scale, the full dataset has around 236 million rows in long format.
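As a rough illustration of how such a count can be obtained, the sketch below reads the paging metadata that the World Bank API (v2) returns ahead of the records. It is not the script used for this analysis: the function name count_indicators and the miniature sample response are hypothetical, and the jsonlite package is an assumed dependency.

```r
library(jsonlite)

# A World Bank API v2 response is a two-element JSON array:
# paging metadata first, then the records themselves.
# `json_source` may be a JSON string, a file path or a URL.
count_indicators <- function(json_source) {
   response <- fromJSON(json_source, simplifyVector = FALSE)
   as.integer(response[[1]]$total)
}

# Hypothetical miniature response, so the function can be tried offline
sample_json <- paste0(
   '[{"page":1,"pages":3,"per_page":"1","total":3},',
   '[{"id":"SP.POP.TOTL","name":"Population, total"}]]'
)
count_indicators(sample_json)

# Live query (requires internet access); one record per page keeps it small
# count_indicators("https://api.worldbank.org/v2/indicator?format=json&per_page=1")
```

Reading only the metadata avoids downloading the full indicator list when all that is needed is the total.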


This work was used in Rand Merchant Bank's "Where to Invest in Africa 2024" report.

https://covid19economicideas.org/2020/04/20/the-impact-of-lockdown-on-employment-at-the-va-waterfront/

My work involved assistance with:


https://www.opensaldru.uct.ac.za/handle/11090/1005

I produced the statistical tables and graphs in this publication.


Since 2022-10-08, I have run a Python script daily.

See Previous Research and a summary of my Résumé.


Testimonials

Prof Andrew Donaldson:  andrew.donaldson@uct.ac.za

Dr Daan Steenkamp:  daan@codera.co.za

Mr Grant Smith:  grant@gmtplus.co.za 


Contact

aidan@econometrics.co.za

https://calendly.com/aidan-horn/

Example ggplot code

Here is an example of how to create a simple graph in R, using the ggplot2 package. The data would usually be read in from a data file, but here it is defined inline so that the example is self-contained. The code also shows how I lay out my code. The output is shown on the right.

library('ggplot2')
library('scales')
library('tidyverse')

base_graph = list(  # can be used as default settings for all graphs in the project
   theme(
      plot.title.position = "plot",
      axis.title.x = element_text(margin=margin(t=5)),
      axis.title.y = element_text(margin=margin(r=10)),
      axis.text.x = element_text(size=rel(1.1), margin=margin(b=2)),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank()
   ),
   scale_x_continuous(
      breaks = pretty_breaks(),
      expand = expansion(mult=c(0.015, 0.14)),
      labels = scales::percent_format(accuracy = 1)
   )
)

employment_graph <- ggplot(
   data = data.frame(
         age = factor(seq(1, 5),
            seq(1, 5),
            c(
               "18-29", "30-39", "40-49", "50-59", "60-65"
            )
         ),
         prop = c(0.63, 0.78, 0.90, 0.82, 0.68)
      ) %>% as_tibble() %>%
      mutate(
         percentage = paste0(round(prop*100), '%')
      )
   ,
   aes(
      x = prop,
      y = age
   )
) +
   theme_minimal() + base_graph +
   geom_bar(
      stat = "identity",
      fill = "darkblue"
   ) +
   coord_flip() +
   geom_text(
      aes(
         x = prop*0.89 - 0.04,
         label = percentage,
         y = age
      ),
      stat = "identity",
      size = rel(3.5),
      check_overlap=T,
      color = "white"
   ) +
   labs(
      x = "Proportion",
      y = "Age",
      title = "Employment rate by age"
   )
employment_graph

png(
   filename="employment.png",
   width = 600, height = 500,
   res = 150
)
employment_graph
dev.off()