The inspiration for this post comes from R. Duncan McIntosh’s post on Florida’s 2016 primaries, which I came across on R-Bloggers.
EDIT: GitHub repo: https://github.com/SeaSmith1018/TexasPrimary2016
Let’s take a look at some of the 2016 Texas Primary results. I will be following McIntosh’s example, except that I will be working with Texas data (of course), using one extra library (rvest, to gather the data), and one fewer library (reshape2). As a first step, I will load the libraries and read in the Republican primary results.
library(ggplot2)
library(dplyr)
library(rvest)
library(choroplethr)
library(choroplethrMaps)
library(gridExtra)
library(knitr)

#Texas election data: http://elections.sos.state.tx.us/index.htm
#download
tex.rep <- "http://elections.sos.state.tx.us/elchist273_race62.htm" %>%
  read_html() %>%
  html_nodes("table") %>%
  html_table()
#html_table() returns a list of tables; keep the results table
#(assumed here to be the first table on the page)
tex.rep <- tex.rep[[1]]
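Calling `html_table()` on a set of table nodes returns a list with one element per table, so a single table has to be pulled out of that list before it can be treated as a data frame. The sketch below illustrates this on a tiny inline page rather than the live results site; the markup and vote counts are invented for illustration.

```r
library(rvest)

# A tiny inline page standing in for the results page; the markup is invented.
page <- read_html("<html><body>
  <table>
    <tr><th>County</th><th>Votes</th></tr>
    <tr><td>ANDERSON</td><td>3,500</td></tr>
    <tr><td>TRAVIS</td><td>61,000</td></tr>
  </table>
</body></html>")

# html_nodes() collects every <table>; html_table() parses each one.
tables <- html_table(html_nodes(page, "table"))
length(tables)        # one list element per table found on the page

# Pull the single table out of the list to work with it as a data frame.
first <- tables[[1]]
dim(first)
```

If a page held several tables, the index would need to point at whichever one carries the results.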
Next, I am going to set up the column names that I want. Then I am going to eliminate the first three rows, which consist of the fractured candidate names and a totals row. I will also set the appropriate data type for each column (CountyName = character, everything else = numeric) and convert CountyName to lowercase (for joining with the geographic data).
#set names
tr.first <- names(tex.rep)
tr.last <- tex.rep[1,]
names(tex.rep) <- c("CountyName", tr.last[2:14], "Uncommitted", "TotalVotes", "TotalVoters", "TurnOut")

#tidy up
tex.rep <- tex.rep[-(1:3),]
tex.rep$CountyName <- tolower(as.character(tex.rep$CountyName))
tex.rep[,2:17] <- sapply(tex.rep[,2:17], function(x) as.numeric(gsub(",", "", x)))
tex.rep[,18] <- sapply(tex.rep[,18], function(x) as.numeric(gsub("%", "", x)))
tex.rep$CountyName <- gsub("lasalle", "la salle", tex.rep$CountyName)
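To see what the tidy-up step does to individual values, here is a minimal sketch on a made-up two-row data frame. The rows and numbers are invented for illustration and are not taken from the results page. It applies the same `gsub()` and `as.numeric()` pattern in base R.

```r
# Toy data standing in for the scraped table; all values are invented.
toy <- data.frame(
  CountyName = c("ANDERSON", "LASALLE"),
  Cruz       = c("3,500", "120"),
  TurnOut    = c("21.50%", "9.75%"),
  stringsAsFactors = FALSE
)

# Strip thousands separators and percent signs before converting to numeric;
# as.numeric("3,500") on its own would return NA with a warning.
toy$Cruz    <- as.numeric(gsub(",", "", toy$Cruz))
toy$TurnOut <- as.numeric(gsub("%", "", toy$TurnOut))

# Lowercase the county names, then patch the one spelling that differs
# from the map data ("lasalle" vs. "la salle").
toy$CountyName <- gsub("lasalle", "la salle", tolower(toy$CountyName))

str(toy)
```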
We’re only concerned with winners here, defined as the candidates who were still actively (and successfully) campaigning at the time of the 2016 Texas primary: John Kasich, Marco Rubio, Donald Trump, and Ted Cruz. Time to calculate each candidate’s percentage of the total vote.
#add percent columns for top 4
tex.rep <- mutate(tex.rep,
  jk = (Kasich/TotalVotes)*100,
  mr = (Rubio/TotalVotes)*100,
  dt = (Trump/TotalVotes)*100,
  tc = (Cruz/TotalVotes)*100
)
tex.rep[,19:22] <- round(tex.rep[,19:22], digits = 1)
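As a quick arithmetic check of the percent columns, the sketch below works through two invented counties in base R. The vote counts are made up; `TotalVotes` is deliberately larger than the sum of the four candidates, since the real totals also include the other candidates and uncommitted votes.

```r
# Invented vote counts for two hypothetical counties.
toy <- data.frame(
  Kasich     = c(50, 10),
  Rubio      = c(200, 40),
  Trump      = c(300, 120),
  Cruz       = c(450, 80),
  TotalVotes = c(1100, 260)
)

top4 <- c("Kasich", "Rubio", "Trump", "Cruz")

# Share of the total vote, in percent, rounded to one decimal place.
# Dividing a data frame by a vector of length nrow() works row by row.
shares <- round(toy[, top4] / toy$TotalVotes * 100, digits = 1)
names(shares) <- c("jk", "mr", "dt", "tc")

toy <- cbind(toy, shares)
toy
```

For the first row, Cruz's share is 450 / 1100 * 100 = 40.909..., which rounds to 40.9.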
Tabling the Data
In McIntosh’s example, the data is tabled in knitr tables, which provide a clean look using “|”, “-”, and “:” for spacing. I also used library(htmlTable) to create the HTML table that you see just below the code.
dt.counties <- filter(tex.rep, Trump > Cruz & Trump > Rubio & Trump > Kasich) %>%
  select(1,19:22)
kable(dt.counties, caption = "Counties won by Trump")
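The filter above picks out one candidate's counties by comparing his votes against each of the other three. A more general variant, sketched below in base R on invented data, labels every county with its winner in one pass using `max.col()`. The `Winner` column and the toy counties are assumptions for illustration. Note one difference: the strict `>` comparisons drop tied counties, whereas `ties.method = "first"` assigns a tie to the first-listed candidate.

```r
# Invented vote counts for three hypothetical counties.
toy <- data.frame(
  CountyName = c("county a", "county b", "county c"),
  Kasich     = c(10, 5, 40),
  Rubio      = c(30, 20, 35),
  Trump      = c(60, 25, 15),
  Cruz       = c(45, 70, 10),
  stringsAsFactors = FALSE
)

top4 <- c("Kasich", "Rubio", "Trump", "Cruz")

# For each row, find which of the four columns holds the largest count.
# ties.method = "first" keeps the result deterministic (the default is random).
toy$Winner <- top4[max.col(as.matrix(toy[, top4]), ties.method = "first")]

# Counties won by Trump.
subset(toy, Winner == "Trump")

# Number of counties won per candidate.
table(toy$Winner)
```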
The output data…
Now it’s time to gather the geographic data to which I will be mapping the above data.
#get state geo data
data("county.regions")
tx.regions <- filter(county.regions, state.name == "texas") %>%
  select(region, "CountyName" = county.name)
tx.r.results <- left_join(tex.rep, tx.regions)
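A left join keeps every results row and fills `region` with `NA` wherever the county name has no match in the map data, which is how a spelling mismatch such as "lasalle" vs. "la salle" would show up. The sketch below checks for that on hand-typed stand-ins for the two tables. It uses base R's `merge()` in place of `left_join()` so it runs without extra packages, and the vote shares are invented.

```r
# Hand-typed stand-ins for the results and for county.regions.
results <- data.frame(
  CountyName = c("anderson", "lasalle", "travis"),
  tc         = c(45.1, 38.2, 40.3),
  stringsAsFactors = FALSE
)
regions <- data.frame(
  region     = c(48001, 48283, 48453),
  CountyName = c("anderson", "la salle", "travis"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every results row, like a left join.
joined <- merge(results, regions, by = "CountyName", all.x = TRUE)

# Any county without a region code failed to match and would be blank on the map.
unmatched <- joined$CountyName[is.na(joined$region)]
unmatched
```

With dplyr loaded, `anti_join(results, regions, by = "CountyName")` would perform the same check.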
Now that the data show the winner among the four candidates, I can assign each candidate’s percentage of the vote to the variable “value” (a required column for the choroplethr mapping functions). Together with “region”, which came from the join, this gives me what I need to pass to choroplethr’s county_choropleth() function.
###---Maps---###
#Ted Cruz
tx.r.tc <- tx.r.results
tx.r.tc$value <- tx.r.results$tc
choro_tc <- county_choropleth(tx.r.tc, state_zoom="texas", legend = "%", num_colors=1) +
  ggtitle("Ted Cruz") +
  coord_map() # Adds a Mercator projection
choro_tc
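The same "region" plus "value" preparation applies to the other three candidates. As a minimal sketch, the hypothetical helper `candidate_values()` below (my own name, not part of choroplethr) builds that two-column data frame for any of the percent columns. The toy regions and shares are invented, and the plotting call is left out so the block runs without the mapping packages.

```r
# Hypothetical helper (not part of choroplethr): returns the two columns
# that county_choropleth() requires, "region" and "value".
candidate_values <- function(results, share_col) {
  stopifnot("region" %in% names(results), share_col %in% names(results))
  data.frame(region = results$region, value = results[[share_col]])
}

# Invented stand-in for the joined results table.
toy <- data.frame(
  region = c(48001, 48453),
  jk     = c(4.5, 9.1),
  mr     = c(18.2, 25.0),
  dt     = c(27.3, 22.4),
  tc     = c(40.9, 33.0)
)

# Map each candidate's display name to his percent column.
share_cols <- c(Kasich = "jk", Rubio = "mr", Trump = "dt", Cruz = "tc")

# One data frame per candidate, named by candidate.
per_candidate <- lapply(share_cols, function(col) candidate_values(toy, col))
per_candidate$Cruz
```

Each element could then be passed to `county_choropleth()` with `state_zoom = "texas"`, as in the Ted Cruz example, with the candidate's name supplied to `ggtitle()`.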