Data Analysis

Skills

    • Specialising in the South African labour market

Technologies

Languages

Programs

Portfolio

How does data availability compare across countries? The World Bank is a major source of country-level economic, financial and socioeconomic data. In this analysis, we summarise the number of data series available from the World Bank for different countries as a proxy for macroeconomic and social data availability.

Using the World Bank API, we count 19 489 indicators. To give a sense of scale, the full dataset has around 236 million rows in long format.
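As a rough illustration of how an indicator count can be read from the World Bank API, the sketch below parses the metadata header that the v2 API places at the start of each JSON response. This is not the code used in the analysis: the helper name count_indicators and the sample JSON are hypothetical, and the jsonlite package is assumed to be installed.

```r
library('jsonlite')

# Hypothetical helper (not from the original analysis): returns the
# "total" field from the metadata header of a World Bank API response.
count_indicators <- function(json_text) {
   response <- fromJSON(json_text, simplifyVector = FALSE)
   # The v2 API returns a two-element array: [metadata, records].
   as.integer(response[[1]]$total)
}

# Hand-written sample in the same shape as an API response,
# so that the helper can be tried without network access.
sample_json <- '[
   {"page": 1, "pages": 3, "per_page": "1", "total": 3},
   [{"id": "EXAMPLE.ID", "name": "Example indicator"}]
]'
sample_total <- count_indicators(sample_json)

# Live call (needs internet access; the count changes over time):
# url <- "https://api.worldbank.org/v2/indicator?format=json&per_page=1"
# count_indicators(paste(readLines(url, warn = FALSE), collapse = ""))
```

Requesting a single record per page keeps the download small, since only the metadata header is needed for the count.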


https://covid19economicideas.org/2020/04/20/the-impact-of-lockdown-on-employment-at-the-va-waterfront/

My work involved assistance with:


https://www.opensaldru.uct.ac.za/handle/11090/1005

I produced the statistical tables and graphs in this publication.


I have run a Python script daily since 2022-10-08.

See Previous Research and a summary of my Résumé.


Testimonials

Prof Andrew Donaldson:  andrew.donaldson@uct.ac.za

Dr Daan Steenkamp:  daan@codera.co.za

Mr Grant Smith:  grant@gmtplus.co.za 


Contact

aidan@econometrics.co.za

https://calendly.com/aidan-horn/

Example ggplot code

Here is an example of how to create a simple graph in R using the ggplot2 package. The data would usually be read in from a data file, but here they are defined inline so that the example is self-contained. This also shows how I lay out my code. The output is shown on the right.

library('ggplot2')
library('scales')
library('tidyverse')

base_graph = list(  # can be used as default settings for all graphs in the project
   theme(
      plot.title.position = "plot",
      axis.title.x = element_text(margin=margin(t=5)),
      axis.title.y = element_text(margin=margin(r=10)),
      axis.text.x = element_text(size=rel(1.1), margin=margin(b=2)),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank()
   ),
   scale_x_continuous(
      breaks = pretty_breaks(),
      expand = expansion(mult=c(0.015, 0.14)),
      labels = scales::percent_format(accuracy = 1)
   )
)

employment_graph <- ggplot(
   data = data.frame(
         age = factor(seq(1, 5),
            seq(1, 5),
            c(
               "18-29", "30-39", "40-49", "50-59", "60-65"
            )
         ),
         prop = c(0.63, 0.78, 0.90, 0.82, 0.68)
      ) %>% as_tibble() %>%
      mutate(
         percentage = paste0(round(prop*100), '%')
      )
   ,
   aes(
      x = prop,
      y = age
   )
) +
   theme_minimal() + base_graph +
   geom_bar(
      stat = "identity",
      fill = "darkblue"
   ) +
   coord_flip() +
   geom_text(
      aes(
         x = prop*0.89 - 0.04,
         label = percentage,
         y = age
      ),
      stat = "identity",
      size = rel(3.5),
      check_overlap = TRUE,
      color = "white"
   ) +
   labs(
      x = "Proportion",
      y = "Age",
      title = "Employment rate by age"
   )

employment_graph

png(
   filename="employment.png",
   width = 600, height = 500,
   res = 150
)
employment_graph
dev.off()