Socio-Economic Impact Analysis of Severe Weather Events (1997-2016)

This is an analysis of the socio-economic impact of severe weather events in the San Diego county. This analysis is carried out for the last 20 years starting 1997 and ending in 2016. The analysis aims to answer questions specified under Data Questions below. Results from the analysis can be found under Insights.

Data Source

All of the data used in this analysis is sourced from the NOAA (National Oceanic and Atmospheric Administration). This data can be downloaded by either of the methods below. Each years data was downloaded as a seperate CSV (gzipped) file.

  • Direct Link
  • Follow links from the NOAA main page as specified below

    NOAA -> Frequently Accessed Tools & Resources -> Climate Data & Reports -> Find Climate Data Online -> Storm Events (found under Additional Data Access on the same page) -> Bulk Data Download -> HTTP

The code file for the dataset explaining the fields can be found here.

Data Questions

Socio-Economic impact refers to the impact in terms of the population, such as deaths and injuries, and in terms of damage to material assets such as property and crops. With this analysis we are attempting to quantify this impact. Specifically, we are looking to answer the following questions:

  1. Which events had the most social impact?
  2. Which events had the most economic impact?
  3. How has the socio-economic impact of severe weather events changed over the years?

Data Analysis

Below are R scripts and commetaries detailing the analysis carried out on the dataset.

In [2]:
#suppress warnings related to R versions among others
options(warn=-1,scipen=999)
#do this to turn warnings back on
#options(warn=0)
Dependencies

The raw data files are expected to be in a directory named data under the current workspace.

In [3]:
# load the necessary packages
suppressPackageStartupMessages(require(R.utils,quietly = TRUE))
suppressPackageStartupMessages(require(car,quietly = TRUE))
suppressPackageStartupMessages(require(tools,quietly = TRUE))
suppressPackageStartupMessages(require(ggplot2,quietly = TRUE))
suppressPackageStartupMessages(require(matrixStats,quietly = TRUE))
suppressPackageStartupMessages(require(repr,quietly = TRUE))
suppressPackageStartupMessages(require(reshape2,quietly = TRUE))
suppressPackageStartupMessages(require(scales,quietly = TRUE))
suppressPackageStartupMessages(require(R2HTML,quietly = TRUE))
In [4]:
CWD <- getwd()

DATADIR <- paste(CWD,"data",sep="/")
# create a temp directory to store figures generated here
path.figures <- paste(basename(tempdir()),"figure","",sep = "/")
Pre-Processing

We read in data for each year, filter it out for the county of interest (in this case, San Diego) and collate the data from each of the files for the 20 year period from 1997 to 2016 into a single data frame.

In [5]:
BASE_FILE_PREFIX <- "StormEvents_details-ftp_v1.0"

TARGET_YR_START <- 1997
TARGET_YR_END <- 2016
target_years <- TARGET_YR_START:TARGET_YR_END

# these are the fields we care about and their datatypes; also specified in the documentation
target_cols <- c("STATE","STATE_FIPS","YEAR","EVENT_TYPE","CZ_TYPE","CZ_FIPS",
                 "CZ_NAME","INJURIES_DIRECT","INJURIES_INDIRECT","DEATHS_DIRECT",
                 "DEATHS_INDIRECT","DAMAGE_PROPERTY","DAMAGE_CROPS")
                 
target_cols_class <- c("character","numeric","numeric","character","character","numeric",
                       "character","numeric","numeric","numeric",
                       "numeric","character","character")

# FIPS code for San Diego
SD_COUNTY_FIPS <- 73
SD_COUNTY_NAME <- "SAN DIEGO"
In [6]:
# this function takes in the year (e.g.: 2016) and reads data from the appropriate datafile
# returning the CSV data, filtered out for the San Diego county, as a data frame
ExtractCSV <- function(yr)
{
    datafile_prefix <- paste(BASE_FILE_PREFIX,yr,sep="_d") 
    datafile <- grep(datafile_prefix,dir(DATADIR),value=TRUE)
    datafile <- paste(DATADIR,datafile,sep="/")
    
    # check if file is not already uncompressed
    if(file_ext(datafile)=="gz")
    {
        gunzip(datafile)
        csvfile <- strsplit(datafile,".gz")[[1]]
    }
    else
        csvfile = datafile
    
    # create a connection with the positions of the cols we care about
    col_pos <- match(target_cols,scan(csvfile,sep=',',what=character(),nlines=1)) 
    con <- pipe(paste("cut -d, -f",paste(col_pos,collapse=',')," ",csvfile,sep='')) 
    
    # read the data
    csvdata <- read.csv(con,stringsAsFactors = FALSE,allowEscapes = TRUE,strip.white = TRUE)
    
    # filter by county
    df <- csvdata[with(csvdata,grepl("san diego",CZ_NAME,perl=T,ignore.case = T)),]
    #df <- csvdata[(csvdata$CZ_NAME == SD_COUNTY_NAME),]
    return (df)                    
}
In [7]:
df <- as.data.frame(do.call(rbind,sapply(target_years,ExtractCSV,simplify=FALSE)))
In [8]:
head(df)
STATESTATE_FIPSYEAREVENT_TYPECZ_TYPECZ_FIPSCZ_NAMEINJURIES_DIRECTINJURIES_INDIRECTDEATHS_DIRECTDEATHS_INDIRECTDAMAGE_PROPERTYDAMAGE_CROPS
4433CALIFORNIA 6 1997 Heavy Rain C 73 SAN DIEGO 0 0 0 0 0
4514CALIFORNIA 6 1997 Heat Z 62 SAN DIEGO COUNTY DESERTS0 0 0 0
7791CALIFORNIA 6 1997 Wildfire C 73 SAN DIEGO 0 0 0 0 1.75M
8603CALIFORNIA 6 1997 Funnel Cloud C 73 SAN DIEGO 0 0 0 0
9648CALIFORNIA 6 1997 Heat Z 50 SAN DIEGO COUNTY VALLEYS0 0 1 0
9649CALIFORNIA 6 1997 Heavy Rain C 73 SAN DIEGO 0 0 0 0

We inspect the data for missing values and discard any samples that do not have useful information for this analysis.

In [9]:
cat(paste("# Samples:", nrow(df), "\n"))
cat(paste("# Events: ",length(unique(df$EVENT_TYPE)),"\n"))
unique(df$EVENT_TYPE)
# Samples: 1103 
# Events:  35 
  1. 'Heavy Rain'
  2. 'Heat'
  3. 'Wildfire'
  4. 'Funnel Cloud'
  5. 'High Wind'
  6. 'Waterspout'
  7. 'Winter Storm'
  8. 'Winter Weather'
  9. 'Lightning'
  10. 'Tornado'
  11. 'Heavy Snow'
  12. 'Thunderstorm Wind'
  13. 'Strong Wind'
  14. 'Storm Surge/Tide'
  15. 'High Surf'
  16. 'Cold/Wind Chill'
  17. 'Hail'
  18. 'Flood'
  19. 'Flash Flood'
  20. 'Rip Current'
  21. 'Dense Fog'
  22. 'Dust Storm'
  23. 'Extreme Cold/Wind Chill'
  24. 'Debris Flow'
  25. 'Frost/Freeze'
  26. 'Avalanche'
  27. 'Dust Devil'
  28. 'Drought'
  29. 'Coastal Flood'
  30. 'Excessive Heat'
  31. 'Landslide'
  32. 'Tsunami'
  33. 'Marine Thunderstorm Wind'
  34. 'Marine High Wind'
  35. 'Marine Tropical Storm'
In [10]:
# check for missing values
ifelse (sum(colSums(is.na(df)))==0,cat("No missing data\n"),print(colSums(is.na(df))))
            STATE        STATE_FIPS              YEAR        EVENT_TYPE 
                0                 0                 0                 0 
          CZ_TYPE           CZ_FIPS           CZ_NAME   INJURIES_DIRECT 
                0                 0                 0                 0 
INJURIES_INDIRECT     DEATHS_DIRECT   DEATHS_INDIRECT   DAMAGE_PROPERTY 
                2                 2                 2                 0 
     DAMAGE_CROPS 
                0 
0
In [11]:
# use only samples where not all properties are missing
good <- !(is.na(df$DEATHS_DIRECT) & is.na(df$DEATHS_INDIRECT) & is.na(df$INJURIES_DIRECT) & is.na(df$INJURIES_INDIRECT) &                is.na(df$DAMAGE_PROPERTY) & is.na(df$DAMAGE_CROPS))
df <- df[good,]
cat(paste("# Samples Remaining:",nrow(df),"\n"))
# Samples Remaining: 1103 
Transformations

We convert values in the numeric fields to their appropriate data type since they are all read in as character strings.

In [12]:
# convert variables to the appropriate types
for(i in 1:ncol(df))
{
    if(target_cols_class[i] == "numeric")
        df[,i] <- as.numeric(df[,i])
}

Inspecting the DAMAGE_PROPERTY and DAMAGE_CROPS fields and from the code file it is clear that these need to be transformed in order to arrive at the actual damage in terms of US Dollars (USD). We essentially multiply the numerical part of each of these fields by the exponent determined by the character (K is a thousand, M is a million and so on) following the numerical part.

In [13]:
cat(paste("Property Damage (Raw):",sep=" "))
cat(unique(df$DAMAGE_PROPERTY)[1:10])
cat(paste("\n","   Crop Damage (Raw): ",sep=" "))
cat(unique(df$DAMAGE_CROPS)[1:10])
Property Damage (Raw): 1.75M 0 5K 2.5K 10.3M 150K 10K 1K 20K
    Crop Damage (Raw): 0  300K 6.9M 20K 800K 25K 750K 2M 7.8M
In [14]:
# convert property and crop damage values to numeric by multiplying the exponents (K, M, B) with the 
# appropriate multiple of 10; 
ProcessExponent <- function(x)
{
    val <- 0
    exp <- sapply(strsplit(x,""),tail,1)
    num <- as.numeric(gsub(".$","",x))
    if (!is.na(num))
    {
        fac <- ifelse((exp=="K"),10**3,ifelse((exp=="M"),
                                              10**6,ifelse((exp=="B"),10**9,10**0)))
        val <- (num * fac)
    }
    
    return (val)
}
In [15]:
df$DAMAGE_PROPERTY_USD <- sapply(df$DAMAGE_PROPERTY,ProcessExponent)
df$DAMAGE_CROPS_USD <- sapply(df$DAMAGE_CROPS,ProcessExponent)

This is the result of the above transformation.

In [16]:
cat(paste("Property Damage (USD): ",sep=" "))
cat(unique(df$DAMAGE_PROPERTY_USD)[1:10])
cat(paste("\n","   Crop Damage (USD): ",sep=" "))
cat(unique(df$DAMAGE_CROPS_USD)[1:10])
Property Damage (USD): 0 1750000 5000 2500 10300000 150000 10000 1000 20000 200000
    Crop Damage (USD): 0 300000 6900000 20000 800000 25000 750000 2000000 7800000 22000000
Analysis: Q1

Since we are attempting to understand the impact by event we first group the data by event type. Social impact pertains to the deaths and injuries in the data. While it is good that the data seperates out the deaths and injuries by whether they were caused directly or indirectly by the event, we are more interested in the overall impact in this analysis.

The output is a list of events that had the greatest impact by deaths, followed by injuries.

In [17]:
# group by event type
df_evt <- split(df,as.factor(df$EVENT_TYPE))
In [18]:
# function to collect and compute information related to population damage
# sum up the deaths and injuries, collapsing direct and indirect counts, the specified input (x)
# also count the number of instances within x
SocialImpact <- function(x)
{
    c((sum(x$DEATHS_DIRECT,na.rm=TRUE) + sum(x$DEATHS_INDIRECT,na.rm=TRUE)),
      (sum(x$INJURIES_DIRECT,na.rm = TRUE) + sum(x$INJURIES_INDIRECT,na.rm = TRUE)),
      nrow(x))
}

df_social <- as.data.frame(do.call(rbind, 
                                sapply(df_evt,SocialImpact,simplify=FALSE)))
names(df_social) <- c("DEATHS","INJURIES","INCIDENTS")

# compute % of total impact
df_social$DEATHS_PCT <- round((df_social$DEATHS/sum(df_social$DEATHS))*100,2)
df_social$INJURIES_PCT <- round((df_social$INJURIES/sum(df_social$INJURIES))*100,2)

# order the weather events by number of deaths first, then by number of 
# injuries in the descending order
order <- order(df_social$DEATHS,df_social$INJURIES,decreasing = TRUE)
df_social <- df_social[order,]
Social Impact: Top Ten Events

In the table below, the number of deaths and injuries resulting from each event across the entire span is given by DEATHS and INJURIES respectively. Note that these measures are aggregates of both direct and indirect occurences for each impact type. INCIDENTS represents the number of incidents of the event across the entire span (20 years). The percentage of total impact (in deaths and injuries) accounted for by each event is also represented.

In [19]:
head(df_social,n=10)
DEATHSINJURIESINCIDENTSDEATHS_PCTINJURIES_PCT
Wildfire30 487 151 28.3049.74
Rip Current14 71 22 13.21 7.25
Winter Weather14 68 12 13.21 6.95
Heat10 69 29 9.43 7.05
Heavy Snow 8 45 15 7.55 4.60
Flood 5 8 59 4.72 0.82
Winter Storm 4 67 14 3.77 6.84
High Surf 4 56 43 3.77 5.72
Cold/Wind Chill 4 15 6 3.77 1.53
Dense Fog 4 8 61 3.77 0.82
In [20]:
plot_options <- list(theme(panel.background = element_rect(fill = "transparent", colour = NA),
                          legend.background = element_rect(fill = "transparent", colour = NA),
                          legend.key = element_rect(fill = "white"),
                          axis.line.x = element_line(color="gray", size = 0.5),
                          axis.line.y = element_line(color="gray", size = 0.5),
                          plot.title=element_text(family="Times", face="bold", size=8,color="lightyellow4"),
                          #legend.title = element_text(family="Times", face="bold", size=8),
                          legend.title = element_blank(), 
                          legend.text = element_text(size=7, face="bold"),
                          legend.position = c(0.9,0.9),
                          axis.title.x = element_text(size=8,face="bold",color='brown3'),
                          axis.title.y = element_text(size=8,face="bold",color='brown3'), 
                          axis.ticks = element_line(color='gray'),
                          axis.text.x = element_text(size=8,angle=45, hjust=1,face="bold"),
                          axis.text.y = element_text(size=7,face="bold")))
In [21]:
options(repr.plot.width = 5, repr.plot.height = 4)

# we're only plotting the top 10 events
df_social_topten <- df_social[1:10,1:2] 
# reshape the data frame to suit plotting needs
df_social_topten$event <- rownames(df_social_topten)
df_melt <- melt(df_social_topten,id.vars='event')

# this is needed since ggplot (by default) does not stick to the order 
# in the original data frame
ordered_plot <- function(lvl) 
{
    df_melt$event <- factor(df_melt$event, levels = df_melt$event[order(-df_melt$value[df_melt$variable == lvl])])
    
    ggplot(data=df_melt) + 
           geom_bar(aes(x=event, y=value, fill = variable), width = 0.8, 
                         position = 'dodge', stat="identity") +
           plot_options 
}

plot <- ordered_plot(levels(df_melt$variable)[1]) +
        xlab("Weather Event") + ylab("") +
        ggtitle("Severe Weather Events: Social Impact by Event (1997-2016), San Diego County") +
        scale_fill_manual(name="",values=c('DEATHS'='lightblue','INJURIES'='lightsalmon'), 
                             labels = c("Deaths","Injuries"))
        
plot
Analysis: Q2

We carry out similar analysis to output events by their greatest economic impact. Here the list is ordered by damage to property, followed by crops since the San Diego county is a predominantly urban county with property being the dominant asset owned by the population as opposed to crops.

In [22]:
# function to collect and compute information related to crop damage
# sum up the damage collapsing direct and indirect instances for each group
EconomicImpact <- function(x)
{
    c(sum(x$DAMAGE_PROPERTY_USD,na.rm=TRUE), sum(x$DAMAGE_CROPS_USD,na.rm=TRUE),nrow(x))      
}

df_econ <- as.data.frame(do.call(rbind, 
                                sapply(df_evt,EconomicImpact,simplify=FALSE)))
names(df_econ) <- c("DAMAGE_PROPERTY","DAMAGE_CROPS","INCIDENTS")

# order the weather events by number of deaths first, then by number of 
# injuries in the descending order
order <- order(df_econ$DAMAGE_PROPERTY,df_econ$DAMAGE_CROPS,decreasing = TRUE)
df_econ <- df_econ[order,]

# compute % of total impact
df_econ$DAMAGE_PROPERTY_PCT <- round((df_econ$DAMAGE_PROPERTY/sum(df_econ$DAMAGE_PROPERTY))*100,2)
df_econ$DAMAGE_CROPS_PCT <- round((df_econ$DAMAGE_CROPS/sum(df_econ$DAMAGE_CROPS))*100,2)
Economic Impact: Top Ten Events

In the table below, damage to property and crops in USD is given by DAMAGE_PROPERTY andDAMAGE_CROPS respectively. INCIDENTS represents the number of incidents of the event across the entire span (20 years). The percentage of total damage (in property and crops) accounted for by each event is also represented.

In [23]:
head(df_econ,n=10)
DAMAGE_PROPERTYDAMAGE_CROPSINCIDENTSDAMAGE_PROPERTY_PCTDAMAGE_CROPS_PCT
Wildfire184092000073000000 151 91.41 30.94
Landslide 48000000 0 1 2.38 0.00
Heavy Rain 46485000 0 29 2.31 0.00
Flood 24091000 0 59 1.20 0.00
High Wind 2330700036620000 121 1.16 15.52
Flash Flood 18939500 7510000 144 0.94 3.18
Thunderstorm Wind 5769000 25000 42 0.29 0.01
High Surf 1484000 0 43 0.07 0.00
Debris Flow 1115000 0 12 0.06 0.00
Strong Wind 932000 0 27 0.05 0.00
In [24]:
options(repr.plot.width = 6, repr.plot.height = 5)

# we're only plotting the top 10 events
df_econ_topten <- df_econ[1:10,1:2]

# reshape the data frame to suit plotting needs
df_econ_topten$event <- rownames(df_econ_topten)
df_melt <- melt(df_econ_topten,id.vars='event')

# do this since we are using log scale (this way we 
# prevent log(0) from being plotted)
df_melt[df_melt == 0] <- NA

breaks = 10**c(0:10)
labs = c("1","10","100","1K","10K","100K","1M","10M","100M","1B","10B")
plot <- ordered_plot(levels(df_melt$variable)[1]) +
        scale_y_log10(breaks = breaks, labels = labs) +
        theme(axis.text.x = element_text(size=8,angle=0, hjust=0.5,face="bold")) +
        coord_flip() +
        xlab("Weather Event") + ylab("Amount (USD)") +
        ggtitle("Severe Weather Events: Economic Impact by Event (1997-2016), San Diego County") +
        scale_fill_manual(name="",values=c('DAMAGE_PROPERTY'='lightblue','DAMAGE_CROPS'='lightsalmon'), 
                             labels = c("Property Damage","Crop Damage"))
        
plot
Analysis: Q3

We start here by grouping the data by the year of occurence and subsequently summing them by group.

In [25]:
# group by year
df_yr <- split(df,as.factor(df$YEAR))

df_yr_social <- as.data.frame(do.call(rbind, 
                              sapply(df_yr,SocialImpact,simplify=FALSE)))
names(df_yr_social) <- c("DEATHS","INJURIES","INCIDENTS")
In [26]:
options(repr.plot.width = 6, repr.plot.height = 5)

x_ticks <- as.numeric(rownames(df_yr_social))
x_ticks_min <- min(x_ticks)
x_ticks_max <- max(x_ticks)
y_ticks_min <- min(colMins(data.matrix(df_yr_social)))
y_ticks_max <- max(colMaxs(data.matrix(df_yr_social)))

plot <- ggplot(data=df_yr_social) + 
          geom_col(aes(x=x_ticks,y=INCIDENTS),width=0.8,fill='black',alpha='0.1') +  
          geom_line(aes(x=x_ticks,y=DEATHS,group=1,color='DEATHS'),size=0.8) +
          geom_point(aes(x=x_ticks,y=DEATHS,group=1,color='DEATHS'),size=0.9) +
          geom_line(aes(x=x_ticks,y=INJURIES,group=1,color='INJURIES'),size=0.8) +
          geom_point(aes(x=x_ticks,y=INJURIES,group=1,color='INJURIES'),size=0.9) +          
          scale_x_continuous(breaks=seq(x_ticks_min,x_ticks_max,1)) +
          scale_y_continuous(breaks=seq(y_ticks_min,y_ticks_max,20)) +             
          xlab('Year') + ylab('') +
          ggtitle("Severe Weather Events: Social Impact by Year (1997-2016), San Diego County") +    
          scale_color_manual(name = "", values=c('DEATHS'='lightblue','INJURIES'='lightsalmon'), 
                             labels = c("Deaths","Injuries"))
Social Impact by Year (1997-2016)

In the plot below, the incidents are represented by bars for each year. Note that here the number of incidents is the aggregate across all event types.

In [27]:
t(df_yr_social)
plot + plot_options
19971998199920002001200220032004200520062007200820092010201120122013201420152016
DEATHS16 5 14 12 5 4 21 0 4 5 13 0 0 1 0 0 0 2 0 4
INJURIES 1 20 77 67 14512226429 1523 14116 10 0 19 3 12 10 0 5
INCIDENTS22 46 63 59 88 90 7882 10232 7862 66 23 15 37 33 55 46 26
In [28]:
df_yr_econ <- as.data.frame(do.call(rbind, 
                              sapply(df_yr,EconomicImpact,simplify=FALSE)))
names(df_yr_econ) <- c("DAMAGE_PROPERTY","DAMAGE_CROPS","INCIDENTS")
In [29]:
options(repr.plot.width = 6, repr.plot.height = 5)

df_yr_econ[df_yr_econ == 0] <- 0

x_ticks <- as.numeric(rownames(df_yr_econ))
x_ticks_min <- min(x_ticks)
x_ticks_max <- max(x_ticks)
y_ticks_min <- min(colMins(data.matrix(df_yr_econ)))
y_ticks_max <- max(colMaxs(data.matrix(df_yr_econ)))

breaks = 10**c(0:10)
labs = c("1","10","100","1K","10K","100K","1M","10M","100M","1B","10B")

plot <- ggplot(data=df_yr_econ) + 
          geom_line(aes(x=x_ticks,y=INCIDENTS,group=1,linetype='INCIDENTS'),color="black") +
          geom_point(aes(x=x_ticks,y=INCIDENTS,group=1),color="black",size=0.8) +            
          geom_line(aes(x=x_ticks,y=DAMAGE_PROPERTY,group=1,color='DAMAGE_PROPERTY')) +
          geom_point(aes(x=x_ticks,y=DAMAGE_PROPERTY,group=1,color='DAMAGE_PROPERTY'),size=0.8) +
          geom_line(aes(x=x_ticks,y=DAMAGE_CROPS,group=1,color='DAMAGE_CROPS')) +
          geom_point(aes(x=x_ticks,y=DAMAGE_CROPS,group=1,color='DAMAGE_CROPS'),size=0.8) +
          scale_x_continuous(breaks=seq(x_ticks_min,x_ticks_max,1)) +          
          scale_y_log10(breaks = breaks, labels = labs) +          
          xlab('Year') + ylab('Amount (USD)') +
          ggtitle("Severe Weather Events: Economic Impact by Year (1997-2016), San Diego County") +    
          scale_color_manual(name = "", values=c('DAMAGE_PROPERTY'='lightsalmon','DAMAGE_CROPS'='lightblue'), 
                             labels = c("Crop Damage","Property Damage")) +
          scale_linetype_manual(name="",values=c('INCIDENTS'='dashed'),labels=c("Incidents"))
Economic Impact by Year (1997-2016)
In [30]:
t(df_yr_econ)
plot + plot_options
19971998199920002001200220032004200520062007200820092010201120122013201420152016
DAMAGE_PROPERTY1750000 10478500 2172000 1902500 13577500 32027000 11368650007986100 75831200 343500 706345600 1557000 215000 2770000 98000 431000 7357000 7788500 323000 4070000
DAMAGE_CROPS 0 7200000 0 845000 0 10550000 56510000 0 0 0 160200000 610000 0 0 0 0 0 0 0 0
INCIDENTS 22 46 63 59 88 90 78 82 102 32 78 62 66 23 15 37 33 55 46 26
In [44]:
# filter out data specific to wildfires
df_wildfire <- df[df$EVENT_TYPE == 'Wildfire',]

# group by year
df_wf_yr <- split(df_wildfire,as.factor(df_wildfire$YEAR))

df_wf_yr_social <- as.data.frame(do.call(rbind,sapply(df_wf_yr,SocialImpact,simplify=FALSE)))
names(df_wf_yr_social) <- c("DEATHS","INJURIES","INCIDENTS")
df_wf_yr_social$YEAR <- rownames(df_wf_yr_social)

df_wf_yr_econ <- as.data.frame(do.call(rbind,sapply(df_wf_yr,EconomicImpact,simplify=FALSE)))
names(df_wf_yr_econ) <- c("DAMAGE_PROPERTY","DAMAGE_CROPS","INCIDENTS")
df_wf_yr_econ$YEAR <- rownames(df_wf_yr_econ)

# merge the data
df_wf_yr <- merge(df_wf_yr_social, df_wf_yr_econ)

# order the data by year (do this before computng percentages)
df_wf_yr <- df_wf_yr[order(df_wf_yr$YEAR),]

# compute the percentages of key indicators that wildfires accounted for, each year
df_wf_yr$INCIDENTS_PCT <- round((df_wf_yr$INCIDENTS/df_yr_econ$INCIDENTS)*100,2)
df_wf_yr$DEATHS_PCT <- round((df_wf_yr$DEATHS/df_yr_social$DEATHS)*100,2)
df_wf_yr$INJURIES_PCT <- round((df_wf_yr$INJURIES/df_yr_social$INJURIES)*100,2)
df_wf_yr$DAMAGE_PROPERTY_PCT <- round((df_wf_yr$DAMAGE_PROPERTY/df_yr_econ$DAMAGE_PROPERTY)*100,2)
df_wf_yr$DAMAGE_CROPS_PCT <- round((df_wf_yr$DAMAGE_CROPS/df_yr_econ$DAMAGE_CROPS)*100,2)

# re-order the columns
df_wf_yr <- df_wf_yr[,c(c(2:6),1,c(7:11))]

# remove row indices 
rownames(df_wf_yr) <- NULL
In [45]:
options(repr.plot.width = 6, repr.plot.height = 5)

df_wf_yr[df_wf_yr == 0] <- 0

x_ticks <- df_wf_yr$YEAR
x_ticks_min <- min(x_ticks)
x_ticks_max <- max(x_ticks)
y_ticks_min <- min(colMins(data.matrix(df_wf_yr)))
y_ticks_max <- max(colMaxs(data.matrix(df_wf_yr)))

breaks = 10**c(0:10)
labs = c("1","10","100","1K","10K","100K","1M","10M","100M","1B","10B")

plot <- ggplot(data=df_wf_yr) + 
          geom_col(aes(x=x_ticks,y=INCIDENTS),width=0.8,fill='black',alpha='0.1') +   
          geom_line(aes(x=x_ticks,y=DEATHS,group=1,color='DEATHS'),size=0.8) +
          geom_point(aes(x=x_ticks,y=DEATHS,group=1,color='DEATHS'),size=0.9) +
          geom_line(aes(x=x_ticks,y=INJURIES,group=1,color='INJURIES'),size=0.8) +
          geom_point(aes(x=x_ticks,y=INJURIES,group=1,color='INJURIES'),size=0.9) +          
          geom_line(aes(x=x_ticks,y=DAMAGE_PROPERTY,group=1,color='DAMAGE_PROPERTY'),size=0.8) +
          geom_point(aes(x=x_ticks,y=DAMAGE_PROPERTY,group=1,color='DAMAGE_PROPERTY'),size=0.9) +
          geom_line(aes(x=x_ticks,y=DAMAGE_CROPS,group=1,color='DAMAGE_CROPS'),size=0.8) +
          geom_point(aes(x=x_ticks,y=DAMAGE_CROPS,group=1,color='DAMAGE_CROPS'),size=0.9) +
          scale_x_discrete(breaks=seq(x_ticks_min,x_ticks_max,1)) +          
          scale_y_log10(breaks = breaks, labels = labs) +          
          xlab('Year') + ylab('') +
          ggtitle("Wildfires: Socio-Economic Impact by Year (1997-2016), San Diego County") +    
          scale_color_manual(name = "", 
                             values=c('DEATHS'='firebrick','INJURIES'='lightsalmon','DAMAGE_PROPERTY'='darkseagreen',
                                      'DAMAGE_CROPS'='deepskyblue3'), 
                             labels = c("Crop Damage", "Property Damage","Deaths","Injuries")) +
          scale_linetype_manual(name="",values=c('INCIDENTS'='dashed'),labels=c("Incidents"))
Impact of Wildfires by Year (1997-2016)

Since wildfires have a disproportionately large impact on the region it is useful to take a closer look at it. This table indicates high impact nature of wildfires in that a few incidents can result in widespread death and destruction.

As before, the bars represent the number of wildfire incidents in each year.

In [46]:
t(df_wf_yr[,c(1,6:11)])
plot + plot_options
YEAR1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
INCIDENTS 1 1 19 9 20 34 18 9 4 1 15 2 1 1 1 2 2 8 1 2
INCIDENTS_PCT 4.55 2.17 30.16 15.25 22.73 37.78 23.08 10.98 3.92 3.12 19.23 3.23 1.52 4.35 6.67 5.41 6.06 14.55 2.17 7.69
DEATHS_PCT 0.00 0.00 7.14 8.33 0.00 0.00 76.19 NA 0.00 0.00 76.92 NA NA 0.00 NA NA NA 0.00 NA 50.00
INJURIES_PCT 0.00 0.00 27.27 43.28 7.59 54.92 59.09 31.03 40.00 21.74 97.16 18.75 0.00NA 94.74100.00100.00100.00NA 0.00
DAMAGE_PROPERTY_PCT100.00 0.00 44.20 0.00 88.75 90.81 99.85 6.89 5.80 0.00 90.25 6.42 0.00 0.00 25.51 92.81 98.55 99.12 0.00 98.28
DAMAGE_CROPS_PCTNA 0.00 NA 0.00 NA 18.96 50.43 NA NA NA 26.53 0.00 NA NA NA NA NA NA NA NA

Insights

These are a few of the insights gathered from the analysis above:

  • Wildfires are the most impactful event both in terms of their social and economic impact. They are also dis-proportionately higher in impact compared to every other severe weather event experienced by the region accouting for ~90% of all damage to property, ~50% of all injuries, ~30% of all deaths and damage to crops over the last 20 years spanning 1997-2016.
  • Only 5 events (in the last 20 years) accounted for over 70% of all deaths and injuries and over 90% of all damage to property
  • High impact events like the wildfire are capable of causing widespread death and destruction even with a few incidents. Years 2003 and 2007 highlight this quite clearly. In both cases, wildfires accounted for over 70% of all deaths, over 90% of all property damage, between 60-97% of injuries and 25-50% of crop damage even as the they account for less than 25% of all severe weather events experienced by the region.
  • Stated another way, incidents such as the Flash Flood while occuring as frequently as Wildfires inflict anywhere from 1-10% of the death and destruction of the latter.
  • Some events such as floods predominantly result in economic impact while others mostly affect the population (e.g.: Rip Currents)

Outro

This is by no means an exhaustive analysis. Many possibilities yet remain - investigating the cyclicality of event occurences, as well as that of the event impact (e.g.: an event may have a high impact one year followed by several years with low impact), assessing the effect of one event on another (e.g.: high impact wildfires are often followed by landslides, this is especially true in Southern California where the wet season follows right after the Santa Ana Winds, the time when fires are most likey to spread) and so on.