<<Land doesn’t vote, people do.>> - An Application on 2018 Brazil’s Presidential Election

This post builds maps of the results of Brazil’s 2018 presidential election.

Bruno Vidigal
02-18-2021

Motivation

You may think this post comes too late: we are already in 2021 and still working on data from Brazil’s 2018 presidential election. Nevertheless, it’s never too late for nice data visualizations, and I was inspired to write this post after seeing an outstanding map made by David Zumbach. He made the code available on this Github repository, and I will apply roughly the same idea (mine is a bit simpler) to the Brazilian election.

Presidential Elections

The presidential election takes place every four years in Brazil. The last one occurred in 2018 and was won by the candidate Bolsonaro.

However, I am not here to talk about how Bolsonaro is (mis)leading Brazil. I am interested in visualizing the share of votes he got in the second round of the election. For more information regarding the Brazilian elections, please visit the Tribunal Superior Eleitoral (TSE) webpage.

Data

In order to produce a map with the proportion of votes Bolsonaro got, we will need to pull:

Shape files

The shape files can be downloaded from here. Then you select Downloads \(\Rightarrow\) municipio_2019 \(\Rightarrow\) Brasil \(\Rightarrow\) BR \(\Rightarrow\) br_municipios zip file.

Election results

The election results are available here. The data are divided by state, so you will need to download 28 zip files (26 states + Distrito Federal + votes from abroad). If you prefer a smarter solution, please have a look at this post that I have written, which walks you through how to download these data automatically from R.

Mapping table IBGE - TSE

This table is quite useful as it maps the municipality codes used by IBGE to those used by TSE. It is stored on this Github repository.

Reading Input Data

Let’s start by reading the mapping table from the Github repository.

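# Packages used throughout this post
library(tidyverse)   # read_csv(), dplyr verbs, stringr, purrr, ggplot2
library(sf)          # st_read() and the other st_*() functions
library(data.table)  # fread()
library(scales)      # percent labels for the legend

download.file(url = 'https://raw.githubusercontent.com/betafcc/Municipios-Brasileiros-TSE/master/municipios_brasileiros_tse.csv', 
              destfile = 'data/mapping_tse_ibge.csv')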

mapping_tse_ibge <- read_csv('data/mapping_tse_ibge.csv')
mapping_tse_ibge <- mapping_tse_ibge %>% 
  select(codigo_tse, codigo_ibge) %>% 
  mutate(codigo_tse = as.character(str_pad(codigo_tse, 5, pad = '0')),
         codigo_ibge = as.character(codigo_ibge))

mapping_tse_ibge
# A tibble: 5,570 x 2
   codigo_tse codigo_ibge
   <chr>      <chr>      
 1 01120      1200013    
 2 01570      1200054    
 3 01058      1200104    
 4 01007      1200138    
 5 01015      1200179    
 6 01074      1200203    
 7 01112      1200252    
 8 01139      1200302    
 9 01104      1200328    
10 01554      1200344    
# … with 5,560 more rows
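The zero-padding step matters because the TSE code is later used as a character join key. Here is a minimal sketch of what str_pad() does, assuming the codes lose their leading zeros when they are parsed as numbers. The first two values come from the table above and the third is the Rio Pomba code shown later in this post.

```r
library(stringr)

# TSE codes as they might look after being parsed as numbers
# (assumption: leading zeros are lost on import)
codigo_tse_raw <- c(1120, 1570, 51152)

# Left-pad with zeros up to five characters
codigo_tse <- str_pad(as.character(codigo_tse_raw), width = 5, pad = "0")
codigo_tse

# Without padding, a key such as "01120" would not find its match
"01120" %in% as.character(codigo_tse_raw)
"01120" %in% codigo_tse
```

Keeping both code columns as character avoids silent mismatches when the tables are joined.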

The shape file from IBGE is below.

br_shp <- st_read("data/br_municipios_20200807/BR_Municipios_2019.shp")
Reading layer `BR_Municipios_2019' from data source `/home/vidigal/Documents/projects/github/brunovidigal_distill/_posts/2021-02-18-land-doesnt-vote-people-do-an-application-on-2018-brazils-presidential-election/data/br_municipios_20200807/BR_Municipios_2019.shp' using driver `ESRI Shapefile'
Simple feature collection with 5572 features and 4 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -73.99045 ymin: -33.75118 xmax: -28.84764 ymax: 5.271841
proj4string:    +proj=longlat +ellps=GRS80 +no_defs 
class(br_shp)
[1] "sf"         "data.frame"

If we plot the shape file above we get:

br_shp %>% 
  ggplot() + 
  geom_sf() +
  theme_void() +
  ggtitle('Brazil - Municipalities')

The last piece of information we need is the votes. As we saw in my previous post, the election results are split by state (SG_UF), and within each file/state the data are disaggregated by municipality (NM_MUNICIPIO) and electoral section (NR_SECAO). Let’s first get the names of all the CSV files we need to read. Instead of typing them one by one, we can use the function list.files().

csvs_to_read = list.files(
  path = "data/elections_2018",  
  pattern = ".*(bweb_2t).*csv$", 
  recursive = TRUE,          
  full.names = TRUE         
)

head(csvs_to_read)
[1] "data/elections_2018/BWEB_2t_AC_301020181744/bweb_2t_AC_301020181744.csv"
[2] "data/elections_2018/BWEB_2t_AL_301020181744/bweb_2t_AL_301020181745.csv"
[3] "data/elections_2018/BWEB_2t_AM_301020181744/bweb_2t_AM_301020181745.csv"
[4] "data/elections_2018/BWEB_2t_AP_301020181744/bweb_2t_AP_301020181746.csv"
[5] "data/elections_2018/BWEB_2t_BA_301020181744/bweb_2t_BA_301020181746.csv"
[6] "data/elections_2018/BWEB_2t_CE_301020181744/bweb_2t_CE_301020181747.csv"
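To see what the pattern argument keeps and drops, the same regular expression can be tried on a few file names with grepl(). This is only a sketch: the first name comes from the listing above, while the other three are hypothetical names made up for contrast.

```r
# File names in the style of the listing above
candidates <- c(
  "bweb_2t_AC_301020181744.csv",   # taken from the listing above
  "bweb_2t_MG_301020181744.csv",   # hypothetical second-round file
  "bweb_1t_AC_081020181744.csv",   # hypothetical first-round file
  "bweb_2t_AC_301020181744.zip"    # hypothetical archive
)

# Same pattern as in the call to list.files()
pattern <- ".*(bweb_2t).*csv$"

matched <- grepl(pattern, candidates)
candidates[matched]
```

Only second-round files ending in csv survive, which is what we want before reading them.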

Now we’re ready to go ahead and read all those files. Another amazing function that we’re going to use is map_df() from purrr. Here I am also using fread() from the data.table package. Combining the tidyverse and data.table solutions keeps the time needed to read and combine the files into one single object of class tbl_df as short as possible.

elections_2nd_round <- 
  csvs_to_read %>% 
  map_df(~fread(., colClasses = 'character', showProgress = TRUE)) %>% 
  as_tibble()

The final object has 2904174 rows and 42 columns.
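The read-and-stack pattern can be tried on a small scale before running it on the real files. The sketch below writes two tiny made-up CSV files to a temporary folder and combines them the same way. The folder name, file names and column names are hypothetical.

```r
library(purrr)
library(data.table)
library(dplyr)

# Write two tiny CSV files with made-up data to a temporary folder
tmp_dir <- file.path(tempdir(), "bweb_demo")
dir.create(tmp_dir, showWarnings = FALSE)
fwrite(data.table(uf = "AC", votes = c("010", "7")),
       file.path(tmp_dir, "demo_2t_AC.csv"))
fwrite(data.table(uf = "MG", votes = c("91", "109")),
       file.path(tmp_dir, "demo_2t_MG.csv"))

demo_files <- list.files(tmp_dir, pattern = "^demo_2t_.*csv$", full.names = TRUE)

# Read every file as character columns and stack the results row-wise
demo <- demo_files %>%
  map_df(~ fread(.x, colClasses = "character")) %>%
  as_tibble()
demo
```

Reading every column as character keeps values such as "010" intact, so conversions can be done explicitly later on.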

Let’s see what our data look like. As I am biased, I will filter for my hometown, Rio Pomba. My city has about 18,000 inhabitants and is located in the countryside of the state of Minas Gerais. We’re famous for our cheese and doce de leite (milk jam, or dulce de leche in Spanish). But here we go: we select Rio Pomba and a random electoral section. Mind that the data contain not only the number of votes for each candidate, but also the number of null and blank votes. In this analysis, we’re interested only in the valid votes.

str(elections_2nd_round %>% 
  filter(NM_MUNICIPIO == 'RIO POMBA' & NR_SECAO == 20))
tibble [8 × 42] (S3: tbl_df/tbl/data.frame)
 $ DT_GERACAO                : chr [1:8] "30/10/2018" "30/10/2018" "30/10/2018" "30/10/2018" ...
 $ HH_GERACAO                : chr [1:8] "17:49:29" "17:49:29" "17:49:29" "17:49:29" ...
 $ ANO_ELEICAO               : chr [1:8] "2018" "2018" "2018" "2018" ...
 $ CD_PLEITO                 : chr [1:8] "229" "229" "229" "229" ...
 $ DT_PLEITO                 : chr [1:8] "28/10/2018" "28/10/2018" "28/10/2018" "28/10/2018" ...
 $ NR_TURNO                  : chr [1:8] "2" "2" "2" "2" ...
 $ CD_ELEICAO                : chr [1:8] "296" "296" "296" "296" ...
 $ DS_ELEICAO                : chr [1:8] "Elei\xe7\xe3o Geral Federal 2018" "Elei\xe7\xe3o Geral Federal 2018" "Elei\xe7\xe3o Geral Federal 2018" "Elei\xe7\xe3o Geral Federal 2018" ...
 $ DT_ELEICAO                : chr [1:8] "28/10/2018" "28/10/2018" "28/10/2018" "28/10/2018" ...
 $ SG_ UF                    : chr [1:8] "MG" "MG" "MG" "MG" ...
 $ CD_MUNICIPIO              : chr [1:8] "51152" "51152" "51152" "51152" ...
 $ NM_MUNICIPIO              : chr [1:8] "RIO POMBA" "RIO POMBA" "RIO POMBA" "RIO POMBA" ...
 $ NR_ZONA                   : chr [1:8] "239" "239" "239" "239" ...
 $ NR_SECAO                  : chr [1:8] "20" "20" "20" "20" ...
 $ NR_LOCAL_VOTACAO          : chr [1:8] "1015" "1015" "1015" "1015" ...
 $ CD_CARGO_PERGUNTA         : chr [1:8] "1" "1" "1" "1" ...
 $ DS_CARGO_PERGUNTA         : chr [1:8] "Presidente" "Presidente" "Presidente" "Presidente" ...
 $ NR_PARTIDO                : chr [1:8] "#NULO#" "#NULO#" "13" "17" ...
 $ SG_PARTIDO                : chr [1:8] "#NULO#" "#NULO#" "PT" "PSL" ...
 $ NM_PARTIDO                : chr [1:8] "#NULO#" "#NULO#" "Partido dos Trabalhadores" "Partido Social Liberal" ...
 $ QT_APTOS                  : chr [1:8] "304" "304" "304" "304" ...
 $ QT_COMPARECIMENTO         : chr [1:8] "226" "226" "226" "226" ...
 $ QT_ABSTENCOES             : chr [1:8] "78" "78" "78" "78" ...
 $ CD_TIPO_URNA              : chr [1:8] "1" "1" "1" "1" ...
 $ DS_TIPO_URNA              : chr [1:8] "Apurada" "Apurada" "Apurada" "Apurada" ...
 $ CD_TIPO_VOTAVEL           : chr [1:8] "2" "3" "1" "1" ...
 $ DS_TIPO_VOTAVEL           : chr [1:8] "Branco" "Nulo" "Nominal" "Nominal" ...
 $ NR_VOTAVEL                : chr [1:8] "95" "96" "13" "17" ...
 $ NM_VOTAVEL                : chr [1:8] "Branco" "Nulo" "FERNANDO HADDAD" "JAIR BOLSONARO" ...
 $ QT_VOTOS                  : chr [1:8] "6" "20" "91" "109" ...
 $ NR_URNA_EFETIVADA         : chr [1:8] "1144507" "1144507" "1144507" "1144507" ...
 $ CD_CARGA_1_URNA_EFETIVADA : chr [1:8] "711.509.856.566.808.007." "711.509.856.566.808.007." "711.509.856.566.808.007." "711.509.856.566.808.007." ...
 $ CD_CARGA_2_URNA_EFETIVADA : chr [1:8] "218.832" "218.832" "218.832" "218.832" ...
 $ CD_FLASCARD_URNA_EFETIVADA: chr [1:8] "2685D0FE" "2685D0FE" "2685D0FE" "2685D0FE" ...
 $ DT_CARGA_URNA_EFETIVADA   : chr [1:8] "23/09/2018 08:19:00" "23/09/2018 08:19:00" "23/09/2018 08:19:00" "23/09/2018 08:19:00" ...
 $ DS_CARGO_PERGUNTA_SECAO   : chr [1:8] "1 - 20" "1 - 20" "1 - 20" "1 - 20" ...
 $ DS_AGREGADAS              : chr [1:8] "#NULO#" "#NULO#" "#NULO#" "#NULO#" ...
 $ DT_ABERTURA               : chr [1:8] "28/10/2018 08:00:00" "28/10/2018 08:00:00" "28/10/2018 08:00:00" "28/10/2018 08:00:00" ...
 $ DT_ENCERRAMENTO           : chr [1:8] "28/10/2018 17:00:58" "28/10/2018 17:00:58" "28/10/2018 17:00:58" "28/10/2018 17:00:58" ...
 $ QT_ELEITORES_BIOMETRIA_NH : chr [1:8] "3" "3" "3" "3" ...
 $ NR_JUNTA_APURADORA        : chr [1:8] "#NULO#" "#NULO#" "#NULO#" "#NULO#" ...
 $ NR_TURMA_APURADORA        : chr [1:8] "#NULO#" "#NULO#" "#NULO#" "#NULO#" ...
 - attr(*, ".internal.selfref")=<externalptr> 

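As you may be thinking, we need to sum the votes by municipality and candidate, keep only the presidential race, drop the votes from abroad, compute the valid votes and Bolsonaro’s share of them, and attach the IBGE code of each municipality.

The piece of code below does this for us.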

votes_by_city <- elections_2nd_round %>% 
  filter(!is.na(as.numeric(QT_VOTOS))) %>% 
  group_by(NM_MUNICIPIO, CD_MUNICIPIO, `SG_ UF`, NM_VOTAVEL, DS_CARGO_PERGUNTA) %>% 
  summarise(votes_total = sum(as.numeric(QT_VOTOS))) %>% 
  filter(DS_CARGO_PERGUNTA == 'Presidente' & `SG_ UF` != 'ZZ') %>% ## ZZ means votes abroad 
  spread(NM_VOTAVEL, votes_total) %>% 
  mutate(valid_votes = `FERNANDO HADDAD` + `JAIR BOLSONARO`,
         perc_bolsonaro = `JAIR BOLSONARO`/valid_votes) %>% 
  left_join(mapping_tse_ibge, by = c("CD_MUNICIPIO" = "codigo_tse")) %>% 
  ungroup() %>% 
  select(codigo_ibge, valid_votes, perc_bolsonaro)

votes_by_city
# A tibble: 5,570 x 3
   codigo_ibge valid_votes perc_bolsonaro
   <chr>             <dbl>          <dbl>
 1 2100055           53238          0.515
 2 5200050            6213          0.690
 3 3100104            3938          0.538
 4 5200100            6985          0.745
 5 3100203           11870          0.639
 6 1500107           81542          0.244
 7 2300101            5346          0.127
 8 2900207            9530          0.218
 9 2900108            4436          0.393
10 4100103            4513          0.608
# … with 5,560 more rows
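The aggregation is easier to follow on a toy example. The sketch below uses made-up counts for one hypothetical municipality with two electoral sections. It uses pivot_wider() from tidyr where the code above uses spread(), and the column names follow the TSE data.

```r
library(dplyr)
library(tidyr)

# Made-up section-level counts for one hypothetical municipality
sections <- tibble(
  CD_MUNICIPIO = "00001",
  NR_SECAO     = c("1", "1", "1", "2", "2", "2"),
  NM_VOTAVEL   = c("FERNANDO HADDAD", "JAIR BOLSONARO", "Nulo",
                   "FERNANDO HADDAD", "JAIR BOLSONARO", "Branco"),
  QT_VOTOS     = c("90", "110", "20", "60", "140", "5")
)

shares <- sections %>%
  # Sum the votes of each candidate over all sections of the municipality
  group_by(CD_MUNICIPIO, NM_VOTAVEL) %>%
  summarise(votes_total = sum(as.numeric(QT_VOTOS)), .groups = "drop") %>%
  # One column per candidate (null and blank votes get their own columns)
  pivot_wider(names_from = NM_VOTAVEL, values_from = votes_total) %>%
  # Valid votes exclude null and blank votes
  mutate(valid_votes    = `FERNANDO HADDAD` + `JAIR BOLSONARO`,
         perc_bolsonaro = `JAIR BOLSONARO` / valid_votes)
shares
```

Here Haddad gets 150 votes and Bolsonaro 250, so the valid votes add up to 400 and Bolsonaro's share is 0.625.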

Categorizing our variable …

# Recoding Bolsonaro's votes
votes_by_city <- votes_by_city %>% 
  mutate(
    perc_bolsonaro_cat = factor(case_when(
      perc_bolsonaro < 0.35 ~ "",
      perc_bolsonaro >= 0.35 & perc_bolsonaro < 0.40 ~ "35", 
      perc_bolsonaro >= 0.40 & perc_bolsonaro < 0.45 ~ "40",
      perc_bolsonaro >= 0.45 & perc_bolsonaro < 0.50 ~ "45",
      perc_bolsonaro >= 0.50 & perc_bolsonaro < 0.55 ~ "50",
      perc_bolsonaro >= 0.55 & perc_bolsonaro < 0.60 ~ "55",
      perc_bolsonaro >= 0.60 & perc_bolsonaro < 0.65 ~ "60",
      perc_bolsonaro >= 0.65 ~ "65"
    ), levels = c("", "35", "40", "45", "50", "55", "60", "65")
    )
  )
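For readers who prefer base R, the same binning can be sketched with findInterval(). This is an alternative to the case_when() call above, not the code used for the maps. The helper name bin_share and the example shares are hypothetical.

```r
# Lower bounds of the classes and their labels (same as in the case_when() call)
breaks <- c(0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65)
labels <- c("", "35", "40", "45", "50", "55", "60", "65")

# findInterval() returns 0 below the first break and 7 from the last break on,
# so adding 1 gives an index into the eight labels
bin_share <- function(p) {
  factor(labels[findInterval(p, breaks) + 1], levels = labels)
}

example_shares <- c(0.244, 0.35, 0.515, 0.69)
binned <- bin_share(example_shares)
binned
```

Each class is closed on the left and open on the right, which matches the conditions written with >= and < above.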

Data Merging

votes_by_city_shp <- br_shp %>% 
  left_join(votes_by_city, by = c("CD_MUN" = "codigo_ibge")) %>% 
  filter(!is.na(valid_votes))
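Since the join is followed by a filter that drops shapes without votes, it can be worth checking which codes fail to match. A minimal sketch with made-up codes, using anti_join() from dplyr:

```r
library(dplyr)

# Made-up codes: three shapes, only two of which have votes
shapes <- tibble(CD_MUN = c("1000001", "1000002", "1000003"))
votes  <- tibble(codigo_ibge = c("1000001", "1000003"),
                 valid_votes = c(400, 250))

# Rows of `shapes` that have no partner in `votes`
unmatched <- anti_join(shapes, votes, by = c("CD_MUN" = "codigo_ibge"))
unmatched
```

Applied to the real tables, the same call would list the features that the filter on valid_votes removes.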

Map

Here we go, the map of votes at municipality level.

ggplot(data = votes_by_city_shp, aes(geometry = geometry)) +
  geom_sf(aes(fill = perc_bolsonaro)) +
  scale_fill_binned(low = "#c91022", high = "#1a7bc5", labels = percent) +
  # scale_fill_viridis_c(option = "plasma", trans = "sqrt", labels = percent) +
  theme_void() +
  labs(title = "Percentage Votes for Bolsonaro",
       subtitle = "2nd round - Presidential Elections 2018",
       caption = "@vidigal_br") +
  theme(legend.title = element_blank()) +
  # theme(legend.key.size = unit(0.2, "cm")) +
  theme(plot.title = element_text(size = 30, face = "bold", family = 'calibri')) +
  theme(plot.subtitle = element_text(size = 20, face = "bold", family = 'calibri')) +
  theme(plot.caption = element_text(size = 15, face = "bold", family = 'calibri'))

The map of the categorical variable looks like this …

ggplot(votes_by_city_shp$geometry) +
  geom_sf(aes(fill = votes_by_city_shp$perc_bolsonaro_cat), color = NA) +
  coord_sf() +
  scale_fill_manual(
    values = c(
      "#8d0613", "#c91022", "#f1434a", "#ff9193",
      "#91cdff", "#42a2f1", "#1a7bc5", "#105182"
    ),
    drop = FALSE,
    guide = guide_legend(
      direction = "horizontal",
      keyheight = unit(2, units = "mm"),
      keywidth = unit(c(25, rep(7, 6), 25), units = "mm"),
      title.position = "top",
      title.hjust = 0.5,
      label.hjust = 1,
      nrow = 1,
      byrow = TRUE,
      reverse = TRUE,
      label.position = "bottom"
    )
  ) +
  theme_void() +
  theme(legend.position = "bottom") +
  theme(plot.title = element_text(size = 30, face = "bold", family = 'calibri')) +
  theme(plot.subtitle = element_text(size = 20, face = "bold", family = 'calibri')) +
  theme(plot.caption = element_text(size = 15, face = "bold", family = 'calibri')) +
  theme(legend.text = element_text(size = 15, face = "bold", family = 'calibri')) +
    labs(title = "Percentage Votes for Bolsonaro",
       subtitle = "2nd round - Presidential Elections 2018",
       caption = "@vidigal_br") +
  theme(legend.title = element_blank())

However, as land doesn’t vote … people do, we will make another map that takes into account the number of valid votes in each municipality. This piece of code comes from David Zumbach; I have just adapted it to my own data/variables.

# Prep function inputs (specify radius factor)
# Rows are kept in their original order so that the radii line up with
# `ids` and with the centroids computed below
radii <- votes_by_city_shp %>% 
  select(CD_MUN, valid_votes) %>% 
  mutate(radius = sqrt(3000*valid_votes / pi)) %>% 
  pull(radius) 

ids <- votes_by_city_shp$CD_MUN
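The radius formula makes the area of each circle, not its radius, proportional to the number of valid votes: since the area is pi times the radius squared, each vote gets 3000 square units of the projected coordinate system. A short check with made-up vote counts:

```r
area_per_vote <- 3000                  # scaling factor used in the code above
valid_votes   <- c(1000, 4000, 100000) # made-up vote counts

radius <- sqrt(area_per_vote * valid_votes / pi)
area   <- pi * radius^2

# Area per vote is constant across municipalities
area / valid_votes

# Four times the votes gives twice the radius
radius[2] / radius[1]
```

Scaling the area rather than the radius keeps large cities from looking even bigger than their vote counts justify.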

# Transformation from Polygons to circles

# Function to draw circles 
draw_circle <- function(id, centre_x = 0, centre_y = 0, radius = 1000, detail = 360, st = TRUE) {
  
  i <- seq(0, 2 * pi, length.out = detail + 1)[-detail - 1]
  x <- centre_x + (radius * sin(i))
  y <- centre_y + (radius * cos(i))
  
  if (st) {
    
    cir <- st_polygon(list(cbind(x, y)[c(seq_len(detail), 1), , drop = FALSE]))
    d <- st_sf(data.frame(id = id, geom = st_sfc(cir)))
    
  } else {
    
    d <- tibble(id = id, x = x, y = y)
    
  }
  
  return(d)
  
}
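The geometry inside draw_circle() can be checked without sf. The sketch below repeats only the vertex computation in a hypothetical helper called circle_xy and confirms that every vertex lies at the requested distance from the centre.

```r
# Vertices of a regular polygon approximating a circle
# (same sine/cosine construction as in draw_circle())
circle_xy <- function(centre_x = 0, centre_y = 0, radius = 1000, detail = 360) {
  # `detail` equally spaced angles in [0, 2 * pi), dropping the repeated end point
  angle <- seq(0, 2 * pi, length.out = detail + 1)[-(detail + 1)]
  cbind(x = centre_x + radius * sin(angle),
        y = centre_y + radius * cos(angle))
}

xy <- circle_xy(centre_x = 10, centre_y = -5, radius = 2, detail = 8)

# Distance of every vertex to the centre
dist_to_centre <- sqrt((xy[, "x"] - 10)^2 + (xy[, "y"] + 5)^2)
nrow(xy)
range(dist_to_centre)
```

In draw_circle() the first vertex is appended again at the end because st_polygon() expects a closed ring.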

# Draw circles
# Project to a planar CRS (EPSG:29101) before taking the centroids,
# then extract their X/Y coordinates as a tibble
centroids <- st_transform(votes_by_city_shp$geometry, 29101) %>% 
  st_centroid() %>% 
  st_coordinates() %>% 
  as_tibble()

class(centroids)
[1] "tbl_df"     "tbl"        "data.frame"
perc_bolsonaro <- votes_by_city_shp$perc_bolsonaro

end <- pmap_dfr(list(ids, centroids$X, centroids$Y, radii), draw_circle)

## merge

end_votes <- end %>% 
  left_join(votes_by_city, by = c("id" = "codigo_ibge")) 

Now we’re ready to plot the map with circles weighted by the number of valid votes. Big cities like São Paulo and Rio de Janeiro will be better represented.

ggplot(end_votes) +
  geom_sf(aes(fill = perc_bolsonaro_cat), color = NA) +
  coord_sf() +
  scale_fill_manual(
    values = c(
      "#8d0613", "#c91022", "#f1434a", "#ff9193",
      "#91cdff", "#42a2f1", "#1a7bc5", "#105182"
    ),
    drop = FALSE,
    guide = guide_legend(
      direction = "horizontal",
      keyheight = unit(2, units = "mm"),
      keywidth = unit(c(25, rep(7, 6), 25), units = "mm"),
      title.position = "top",
      title.hjust = 0.5,
      label.hjust = 1,
      nrow = 1,
      byrow = TRUE,
      reverse = TRUE,
      label.position = "bottom"
    )
  ) +
  theme_void() +
  theme(legend.position = "bottom") +
    labs(title = "Percentage Votes for Bolsonaro",
       subtitle = "2nd round - Presidential Elections 2018",
       caption = "@vidigal_br") +
  theme(legend.title = element_blank()) +
  theme(plot.title = element_text(size = 30, face = "bold", family = 'calibri')) +
  theme(plot.subtitle = element_text(size = 20, face = "bold", family = 'calibri')) +
  theme(plot.caption = element_text(size = 15, face = "bold", family = 'calibri')) +
  theme(legend.text = element_text(size = 15, face = "bold", family = 'calibri')) 

Citation

For attribution, please cite this work as

Vidigal (2021, Feb. 18). Bruno Vidigal: <<Land doesn't vote, people do.>> - An Application on 2018 Brazil's Presidential Election. Retrieved from https://www.brunovidigal.com/posts/2021-02-18-land-doesnt-vote-people-do-an-application-on-2018-brazils-presidential-election/

BibTeX citation

@misc{vidigal2021<<land,
  author = {Vidigal, Bruno},
  title = {Bruno Vidigal: <<Land doesn't vote, people do.>> - An Application on 2018 Brazil's Presidential Election},
  url = {https://www.brunovidigal.com/posts/2021-02-18-land-doesnt-vote-people-do-an-application-on-2018-brazils-presidential-election/},
  year = {2021}
}