CO2 Emissions Comparing and Modeling for Global Warming

Recently, I watched the new Predator movie, Prey, and I loved it. It got me hooked on the Predator series again, especially the first two movies. In the movies, the Predators choose warm planets to hunt on because their vision is based on thermal infrared, which lets them see heat signatures.

Carbon dioxide emissions are the primary driver of global warming on Earth. So, I decided to check what is responsible for them, to keep the Predators from coming to Earth :)

We will compare countries by their annual carbon emissions per capita versus their GDP per capita.

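#Building a dataset
library(tidyverse)
library(janitor)      #clean_names()
library(countrycode)  #countrycode()
library(scales)       #label_dollar(), cut_si()
library(recipes)      #recipe(), step_*(), prep(), bake()
library(randomForest)
library(vip)

df_co <- read_csv("")
df_gdp <- read_csv("")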

df <- 
  df_gdp %>% 
  left_join(df_co) %>% 
  group_by(Entity) %>% 
  #the last value of each group
  slice_max(order_by= Year, n=1) %>%
  clean_names() %>% 
  #continent names
  mutate(region = countrycode(sourcevar = entity,
                              origin = "",
                              destination = "")) %>% 
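  ungroup() %>%
  rename(
    co2= annual_co2_emissions_per_capita,
    gdp= gdp_per_capita_ppp_constant_2017_international) %>% 
  #assumed final step: drops incomplete rows, which the random forest below cannot handle
  drop_na()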
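
To make the two key steps of the pipeline above more concrete, here is a minimal base R sketch of a left join followed by keeping the latest year per entity. The data frames `gdp` and `co2` and all of their values are made up for illustration; they are not the datasets used in this post. In dplyr terms, `merge(..., all.x = TRUE)` plays the role of `left_join()`.

```r
# Toy data; entity names and values are made up for illustration
gdp <- data.frame(
  Entity = c("A", "A", "B", "B"),
  Year   = c(2019, 2020, 2019, 2020),
  gdp    = c(100, 110, 200, 210)
)
co2 <- data.frame(
  Entity = c("A", "A", "B"),
  Year   = c(2019, 2020, 2019),
  co2    = c(1.0, 1.2, 5.0)
)

# Left join on the shared columns: every GDP row is kept,
# and co2 is NA where no matching emissions row exists
joined <- merge(gdp, co2, by = c("Entity", "Year"), all.x = TRUE)

# Keep the row with the latest year within each entity
latest <- joined[joined$Year == ave(joined$Year, joined$Entity, FUN = max), ]
print(latest)
```

In this toy case, entity B has no emissions value for its latest year, so its `co2` is `NA`. Rows like that have to be dropped or otherwise handled before modeling.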

We will compare the top 20 countries ranked by carbon emissions.

#Comparing the top 20 countries ranked by carbon emissions
df %>% 
  slice_max(order_by= co2, n=21) %>% 
  ggplot(aes(x= gdp, y= co2, color= region))+
  geom_text(aes(label= entity),
            hjust= 0, 
            vjust= -0.5,
            check_overlap = TRUE,#removes one of the overlapped texts
            #legend key type
            key_glyph= "rect")+
  #Using scale_*_log10 to zoom in data on the plot
  scale_x_log10(labels = scales::label_dollar(accuracy = 2))+
  scale_y_continuous(labels = scales::label_number(scale_cut = cut_si("tonnes")))+
  labs(title= bquote(''* ~CO[2]~'emission per capita(2020) vs. GDP per capita' *''))+
  coord_fixed(ratio = 0.02, clip = "off")+#fits the text labels to the panel
  theme(
    legend.position = "bottom",
    legend.text = element_text(size=12),
    plot.title = element_text(hjust=0.5)#centers the plot title
  )

According to the above graphic, Asian countries seem to dominate the list, even though Bahrain was removed from the chart because its label overlaps Kuwait's. It is rather interesting that China is not on the list. This is probably because of its very large population, which pulls its per capita figure down.
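
The effect of population on a per capita ranking can be shown with a small arithmetic sketch. The two countries and their figures below are hypothetical, not real statistics. They only show that the largest total emitter does not have to be the largest per capita emitter.

```r
# Hypothetical figures, not real statistics
emissions <- data.frame(
  country    = c("Large", "Small"),
  total_mt   = c(10000, 100),  # total CO2 in million tonnes
  population = c(1400, 5)      # population in millions
)

# million tonnes / million people = tonnes per person
emissions$per_capita <- emissions$total_mt / emissions$population

# Rank by per capita emissions, highest first
print(emissions[order(-emissions$per_capita), ])
```

Here "Large" emits a hundred times more in total, yet "Small" comes first once emissions are divided by population.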

Now, we will try to find some variables that explain the change in carbon emissions. To do that, we will apply a permutation-based variable importance method.

#Preprocessing the data
df_rec <- 
  recipe(formula = co2 ~ region + gdp, data = df) %>%
  step_dummy(all_nominal()) %>% 
  step_log(gdp, base = 10)

#Creating a tibble of the preprocessed data for modeling
imp_df <- 
  df_rec %>%
  prep() %>%
  bake(new_data = NULL) 
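
For readers who want to see what these two preprocessing steps do to the data, here is a rough base R sketch on made-up rows. It is not the recipes implementation: `model.matrix()` stands in for `step_dummy()` and `log10()` for `step_log(base = 10)`, and the column names it produces differ from those of the recipe.

```r
# Made-up rows; region labels and values are illustrative only
toy <- data.frame(
  region = factor(c("Asia", "Europe", "Asia", "Africa")),
  gdp    = c(1000, 10000, 100000, 100)
)

# Dummy variables: one 0/1 column per level except the reference level
# (the first level alphabetically, here "Africa")
dummies <- model.matrix(~ region, data = toy)[, -1, drop = FALSE]

# Base-10 log compresses the wide range of GDP values
toy$gdp_log10 <- log10(toy$gdp)

print(cbind(dummies, gdp_log10 = toy$gdp_log10))
```

After the transformation, a tenfold difference in GDP becomes a difference of one unit, and each region is represented by its own indicator column.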

#Building a random forest model
#df_rf is a placeholder name for the fitted model
df_rf <- 
  randomForest(co2 ~ .,
               ntree = 500, 
               data = imp_df)

#Permutation-based variable importance plot
df_rf %>% 
  vip(
    method = "permute", 
    target = "co2", 
    metric = "rsquared", 
    nsim = 100,
    pred_wrapper = predict, 
    geom = "boxplot",
    mapping = aes_string(fill = "Variable"), 
    aesthetics = list(color = "grey35"))+
  theme(legend.position = "none")

Looking at the random forest model, we see that it explains 36% of the variance in carbon emissions. GDP is the most important variable according to the above graph. Being in Asia seems to be the second most important one, which is consistent with the first chart we made.
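
The idea behind permutation-based importance can be sketched in a few lines of base R. The sketch below is a simplified illustration, not what vip does internally: it swaps the random forest for a linear model and uses simulated data, and the helper names `rsq` and `perm_importance` are made up for this example. The principle is the same, though: shuffle one variable, measure how much R-squared drops, and repeat.

```r
set.seed(42)

# Simulated data: y depends strongly on x1 and only weakly on x2
n   <- 200
x1  <- rnorm(n)
x2  <- rnorm(n)
y   <- 3 * x1 + 0.3 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2)

fit <- lm(y ~ x1 + x2, data = dat)

# R-squared of the model's predictions against the observed response
rsq <- function(data) {
  pred <- predict(fit, newdata = data)
  1 - sum((data$y - pred)^2) / sum((data$y - mean(data$y))^2)
}

baseline <- rsq(dat)

# Importance = average drop in R-squared after shuffling one column
perm_importance <- function(var, nsim = 50) {
  drops <- replicate(nsim, {
    shuffled <- dat
    shuffled[[var]] <- sample(shuffled[[var]])
    baseline - rsq(shuffled)
  })
  mean(drops)
}

imp <- sapply(c("x1", "x2"), perm_importance)
print(round(imp, 3))
```

Shuffling `x1` breaks its link to `y` and hurts the fit far more than shuffling `x2`, so `x1` gets the higher importance score. The boxplots in the chart above show the spread of such drops across the 100 simulations for each variable.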
