File:NZ opinion polls 2002-2005-parties.png
From Wikimedia Commons, the free media repository
NZ_opinion_polls_2002-2005-parties.png (778 × 487 pixels, file size: 6 KB, MIME type: image/png)
Summary
Description | English: Graph showing support for political parties in New Zealand between the 2002 and 2005 general elections, according to various opinion polls. Data are taken from the Wikipedia page Opinion_polling_for_the_New_Zealand_general_election,_2005 |
Date | |
Source | Own work |
Author | Trevva |
The figure was produced with the R statistical package, using the code below. The script reads the HTML directly from the Wikipedia page, parses the poll data out of the first table, and saves the graph as a PNG in the current working directory. It should run as-is for anyone with R installed.
rm(list=ls())
#Parameters
party.names <- c("Labour","National","NZ.First","ACT","Greens","United Future","Maori","Destiny")
parties.to.plot <- c("Greens","Labour","NZ.First","National")
party.cols <- c("darkgreen","red","black","darkblue")
#Load the complete HTML file into memory
html <- readLines(url("http://en.wikipedia.org/wiki/Opinion_polling_for_the_New_Zealand_general_election,_2005",encoding="UTF-8"))
closeAllConnections()
#Extract the opinion poll data table
tbl.no <- 1
tbl <- html[(grep("<table.*",html)[tbl.no]):(grep("</table.*",html)[tbl.no])]
#Now split it into the rows, based on the <tr> tag
tbl.rows <- list()
open.tr <- grep("<tr",tbl)
close.tr <- grep("</tr",tbl)
for(i in 1:length(open.tr)) tbl.rows[[i]] <- tbl[open.tr[i]:close.tr[i]]
#Throw out rows that are headers or extra info (fewer than two <td> cells)
tbl.rows <- tbl.rows[sapply(tbl.rows,function(x) length(grep("<td",x)))>1]
#Now extract the data
survey.dat <- lapply(tbl.rows,function(x) {
#Start by only considering where we have <td> tags
td.tags <- x[grep("<td",x)]
#Polling data appears in columns 3-10
dat <- td.tags[3:10]
#Now strip the markup and convert to numeric format
dat <- gsub("<td>|</td>","",dat)
dat <- gsub("%","",dat)
dat <- gsub("-","0",dat)
dat <- gsub("<","",dat)
dat <- as.numeric(dat)
dat <- ifelse(is.na(dat),0,dat)
names(dat) <- party.names
#Getting the date strings is a little harder. Start by tidying up the dates
date.str <- td.tags[2] #Dates are in the second column
date.str <- gsub("<sup.*</sup>","",date.str) #Throw out anything between superscript tags, as it is a reference to the source
date.str <- gsub("<td>|</td>","",date.str) #Throw out any tags
#Get numeric parts of string
digits.str <- gsub("[^0123456789]"," ",date.str)
digits.str <- gsub("^ +","",digits.str) #Drop leading whitespace
digits <- strsplit(digits.str," +")[[1]]
yrs <- grep("[0-9]{4}",digits,value=TRUE)
days <- digits[!digits%in%yrs]
if(length(days)==0) {days <- 15}
#Get months
month.str <- gsub("[^A-Za-z]"," ",date.str) #Keep letters only
month.str <- gsub("^ +","",month.str) #Drop leading whitespace
mnths <- strsplit(month.str," +")[[1]]
#Now paste together to make standardised date strings
days <- rep(days,length.out=2)
mnths <- rep(mnths,length.out=2)
yrs <- rep(yrs,length.out=2)
dates.std <- paste(days,mnths,yrs)
# cat(sprintf("%s\t -> \t %s, %s\n",date.str,dates.std[1],dates.std[2]))
#And finally the survey time
survey.time <- mean(as.POSIXct(strptime(dates.std,format="%d %B %Y")))
#Get the name of the survey company too
survey.comp <- td.tags[1]
survey.comp <- gsub("<sup.*</sup>","",survey.comp)
survey.comp <- gsub("<td>|</td>","",survey.comp)
survey.comp <- gsub("<U+2013>","-",survey.comp,fixed=TRUE)
survey.comp <- gsub("(?U)<.*>","",survey.comp,perl=TRUE)
#And now return results
return(data.frame(Company=survey.comp,Date=survey.time,date.str,t(dat)))
})
#Combine results
surveys <- do.call(rbind,survey.dat)
#Restrict the plot (manually) to parties that have polled over 5%
polls <- surveys[,c("Company","Date",parties.to.plot)]
polls <- subset(polls,!is.na(surveys$Date))
polls <- polls[order(polls$Date),]
polls$date.num <- as.double(polls$Date)
#Setup plot
ticks <- ISOdate(c(2002,rep(2003,2),rep(2004,2),rep(2005,2),2006),c(7,rep(c(1,7),3),1),1)
xlims <- range(ticks)
png("NZ_opinion_polls_2002-2005-parties.png",width=778,height=487,pointsize=16)
par(mar=c(5,4,1,1))
matplot(polls$date.num,polls[,parties.to.plot],pch=NA,xlim=xlims,ylab="Party support (%)",
        xlab="",col=party.cols,xaxt="n",ylim=c(0,65),yaxs="i")
abline(h=seq(0,95,by=5),col="lightgrey",lty=3)
abline(v=as.double(ticks),col="lightgrey",lty=3)
box()
axis(1,at=as.double(ticks),labels=format(ticks,format="1 %b\n%Y"),cex.axis=0.8)
axis(4,at=axTicks(4),labels=rep("",length(axTicks(4))))
#Now calculate the loess smoothers
#Exclude 2002 election result from smoother
#Draw direct line from 2002 to first survey
#Exclude confidence interval
smoothed <- list()
smooth.dat <- polls[-1,]
predict.x <- seq(min(smooth.dat$date.num),max(smooth.dat$date.num),length.out=100)
for(i in 1:length(parties.to.plot)) {
smoother <- loess(smooth.dat[,parties.to.plot[i]] ~ smooth.dat[,"date.num"],span=0.75)
smoothed[[i]] <- predict(smoother,newdata=predict.x,se=TRUE)
#polygon(c(predict.x,rev(predict.x)),
# c(smoothed[[i]]$fit+smoothed[[i]]$se.fit*1.96,rev(smoothed[[i]]$fit-smoothed[[i]]$se.fit*1.96)),
# col=rgb(0.5,0.5,0.5,0.5),border=NA)
}
names(smoothed) <- parties.to.plot
#Then add the data points
matpoints(polls$date.num,polls[,parties.to.plot],pch=20,col=party.cols)
#And finally the smoothers themselves
for(i in 1:length(parties.to.plot)) {
segments(polls$date.num[1],polls[1,parties.to.plot[i]],min(predict.x),smoothed[[i]]$fit[1],col=party.cols[i],lwd=2)
lines(predict.x,smoothed[[i]]$fit,col=party.cols[i],lwd=2)
}
legend("bottom",legend=gsub("\\."," ",parties.to.plot),col=party.cols,pch=20,bg="white",lwd=2,horiz=TRUE,inset=-0.225,xpd=NA)
#Add best estimates
#for(i in 1:length(smoothed)) {
# lbl <- sprintf("%2.0f±%1.0f %%",round(rev(smoothed[[i]]$fit)[1],0),round(1.96*rev(smoothed[[i]]$se.fit)[1],0))
# text(rev(polls$date.num)[1],rev(smoothed[[i]]$fit)[1],labels=lbl,pos=4,col=party.cols[i])
#}
dev.off()
cat("Complete.\n")
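The date handling is the least obvious step in the script: each poll's free-text period (a single month, a day range, or a range spanning two months) is reduced to a single midpoint date. The sketch below isolates that idea in a standalone function so it can be tried on sample strings. The function name `poll.midpoint` and the sample strings are hypothetical and not part of the original script; it uses `as.Date` rather than `POSIXct`, and it assumes an English locale so that `%B` matches month names such as "June".

```r
# Hypothetical helper (not in the original script): midpoint of a
# free-text polling period such as "3-7 June 2005" or "June 2005".
# Assumes an English locale so that %B recognises the month names.
poll.midpoint <- function(date.str) {
  # Numeric tokens: four-digit numbers are years, the rest are days
  digits <- regmatches(date.str, gregexpr("[0-9]+", date.str))[[1]]
  yrs  <- grep("^[0-9]{4}$", digits, value = TRUE)
  days <- digits[!digits %in% yrs]
  if (length(days) == 0) days <- "15"  # no day given: assume mid-month

  # Alphabetic tokens are taken to be month names
  mnths <- regmatches(date.str, gregexpr("[A-Za-z]+", date.str))[[1]]

  # Recycle each part to a (start, end) pair and build "day month year"
  start.end <- paste(rep(days,  length.out = 2),
                     rep(mnths, length.out = 2),
                     rep(yrs,   length.out = 2))
  parsed <- as.Date(start.end, format = "%d %B %Y")

  # Average the start and end dates to get the midpoint
  as.Date(mean(as.numeric(parsed)), origin = "1970-01-01")
}

format(poll.midpoint("3-7 June 2005"))          # "2005-06-05"
format(poll.midpoint("June 2005"))              # "2005-06-15"
format(poll.midpoint("28 May - 3 June 2005"))   # "2005-05-31"
```

Recycling with `rep(..., length.out=2)` is what lets one code path cover all three formats: a lone month or year is simply reused for both the start and end of the period.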
Licensing
I, the copyright holder of this work, hereby publish it under the following license:
This file is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.
- You are free:
  - to share – to copy, distribute and transmit the work
  - to remix – to adapt the work
- Under the following conditions:
  - attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  - share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.
File history
Date/Time | Dimensions | User | Comment |
---|---|---|---|
10:47, 3 August 2011 (current) | 778 × 487 (6 KB) | Trevva | |