Socio-Economic Impact Analysis of Severe Weather Events (1997-2016)¶

This is an analysis of the socio-economic impact of severe weather events in the San Diego county. This analysis is carried out for the last 20 years starting 1997 and ending in 2016. The analysis aims to answer questions specified under Data Questions below. Results from the analysis can be found under Insights.

Data Source¶

All of the data used in this analysis is sourced from the NOAA (National Oceanic and Atmospheric Administration). This data can be downloaded by either of the methods below. Each years data was downloaded as a seperate CSV (gzipped) file.

Direct Link
Follow links from the NOAA main page as specified below

NOAA -> Frequently Accessed Tools & Resources -> Climate Data & Reports -> Find Climate Data Online -> Storm Events (found under Additional Data Access on the same page) -> Bulk Data Download -> HTTP

The code file for the dataset explaining the fields can be found here.

Data Questions¶

Socio-Economic impact refers to the impact in terms of the population, such as deaths and injuries, and in terms of damage to material assets such as property and crops. With this analysis we are attempting to quantify this impact. Specifically, we are looking to answer the following questions:

Which events had the most social impact?
Which events had the most economic impact?
How has the socio-economic impact of severe weather events changed over the years?

Data Analysis¶

Below are R scripts and commetaries detailing the analysis carried out on the dataset.

#suppress warnings related to R versions among others
options(warn=-1,scipen=999)
#do this to turn warnings back on
#options(warn=0)

Dependencies¶

The raw data files are expected to be in a directory named data under the current workspace.

# load the necessary packages
suppressPackageStartupMessages(require(R.utils,quietly = TRUE))
suppressPackageStartupMessages(require(car,quietly = TRUE))
suppressPackageStartupMessages(require(tools,quietly = TRUE))
suppressPackageStartupMessages(require(ggplot2,quietly = TRUE))
suppressPackageStartupMessages(require(matrixStats,quietly = TRUE))
suppressPackageStartupMessages(require(repr,quietly = TRUE))
suppressPackageStartupMessages(require(reshape2,quietly = TRUE))
suppressPackageStartupMessages(require(scales,quietly = TRUE))
suppressPackageStartupMessages(require(R2HTML,quietly = TRUE))

CWD <- getwd()

DATADIR <- paste(CWD,"data",sep="/")
# create a temp directory to store figures generated here
path.figures <- paste(basename(tempdir()),"figure","",sep = "/")

Pre-Processing¶

We read in data for each year, filter it out for the county of interest (in this case, San Diego) and collate the data from each of the files for the 20 year period from 1997 to 2016 into a single data frame.

BASE_FILE_PREFIX <- "StormEvents_details-ftp_v1.0"

TARGET_YR_START <- 1997
TARGET_YR_END <- 2016
target_years <- TARGET_YR_START:TARGET_YR_END

# these are the fields we care about and their datatypes; also specified in the documentation
target_cols <- c("STATE","STATE_FIPS","YEAR","EVENT_TYPE","CZ_TYPE","CZ_FIPS",
                 "CZ_NAME","INJURIES_DIRECT","INJURIES_INDIRECT","DEATHS_DIRECT",
                 "DEATHS_INDIRECT","DAMAGE_PROPERTY","DAMAGE_CROPS")
                 
target_cols_class <- c("character","numeric","numeric","character","character","numeric",
                       "character","numeric","numeric","numeric",
                       "numeric","character","character")

# FIPS code for San Diego
SD_COUNTY_FIPS <- 73
SD_COUNTY_NAME <- "SAN DIEGO"

# this function takes in the year (e.g.: 2016) and reads data from the appropriate datafile
# returning the CSV data, filtered out for the San Diego county, as a data frame
ExtractCSV <- function(yr)
{
    datafile_prefix <- paste(BASE_FILE_PREFIX,yr,sep="_d") 
    datafile <- grep(datafile_prefix,dir(DATADIR),value=TRUE)
    datafile <- paste(DATADIR,datafile,sep="/")
    
    # check if file is not already uncompressed
    if(file_ext(datafile)=="gz")
    {
        gunzip(datafile)
        csvfile <- strsplit(datafile,".gz")[[1]]
    }
    else
        csvfile = datafile
    
    # create a connection with the positions of the cols we care about
    col_pos <- match(target_cols,scan(csvfile,sep=',',what=character(),nlines=1)) 
    con <- pipe(paste("cut -d, -f",paste(col_pos,collapse=',')," ",csvfile,sep='')) 
    
    # read the data
    csvdata <- read.csv(con,stringsAsFactors = FALSE,allowEscapes = TRUE,strip.white = TRUE)
    
    # filter by county
    df <- csvdata[with(csvdata,grepl("san diego",CZ_NAME,perl=T,ignore.case = T)),]
    #df <- csvdata[(csvdata$CZ_NAME == SD_COUNTY_NAME),]
    return (df)                    
}

df <- as.data.frame(do.call(rbind,sapply(target_years,ExtractCSV,simplify=FALSE)))

head(df)

We inspect the data for missing values and discard any samples that do not have useful information for this analysis.

cat(paste("# Samples:", nrow(df), "\n"))
cat(paste("# Events: ",length(unique(df$EVENT_TYPE)),"\n"))
unique(df$EVENT_TYPE)

# Samples: 1103 
# Events:  35

# check for missing values
ifelse (sum(colSums(is.na(df)))==0,cat("No missing data\n"),print(colSums(is.na(df))))

            STATE        STATE_FIPS              YEAR        EVENT_TYPE 
                0                 0                 0                 0 
          CZ_TYPE           CZ_FIPS           CZ_NAME   INJURIES_DIRECT 
                0                 0                 0                 0 
INJURIES_INDIRECT     DEATHS_DIRECT   DEATHS_INDIRECT   DAMAGE_PROPERTY 
                2                 2                 2                 0 
     DAMAGE_CROPS 
                0

# use only samples where not all properties are missing
good <- !(is.na(df$DEATHS_DIRECT) & is.na(df$DEATHS_INDIRECT) & is.na(df$INJURIES_DIRECT) & is.na(df$INJURIES_INDIRECT) &                is.na(df$DAMAGE_PROPERTY) & is.na(df$DAMAGE_CROPS))
df <- df[good,]
cat(paste("# Samples Remaining:",nrow(df),"\n"))

# Samples Remaining: 1103

Transformations¶

We convert values in the numeric fields to their appropriate data type since they are all read in as character strings.

# convert variables to the appropriate types
for(i in 1:ncol(df))
{
    if(target_cols_class[i] == "numeric")
        df[,i] <- as.numeric(df[,i])
}

Inspecting the DAMAGE_PROPERTY and DAMAGE_CROPS fields and from the code file it is clear that these need to be transformed in order to arrive at the actual damage in terms of US Dollars (USD). We essentially multiply the numerical part of each of these fields by the exponent determined by the character (K is a thousand, M is a million and so on) following the numerical part.

cat(paste("Property Damage (Raw):",sep=" "))
cat(unique(df$DAMAGE_PROPERTY)[1:10])
cat(paste("\n","   Crop Damage (Raw): ",sep=" "))
cat(unique(df$DAMAGE_CROPS)[1:10])

Property Damage (Raw): 1.75M 0 5K 2.5K 10.3M 150K 10K 1K 20K
    Crop Damage (Raw): 0  300K 6.9M 20K 800K 25K 750K 2M 7.8M

# convert property and crop damage values to numeric by multiplying the exponents (K, M, B) with the 
# appropriate multiple of 10; 
ProcessExponent <- function(x)
{
    val <- 0
    exp <- sapply(strsplit(x,""),tail,1)
    num <- as.numeric(gsub(".$","",x))
    if (!is.na(num))
    {
        fac <- ifelse((exp=="K"),10**3,ifelse((exp=="M"),
                                              10**6,ifelse((exp=="B"),10**9,10**0)))
        val <- (num * fac)
    }
    
    return (val)
}

df$DAMAGE_PROPERTY_USD <- sapply(df$DAMAGE_PROPERTY,ProcessExponent)
df$DAMAGE_CROPS_USD <- sapply(df$DAMAGE_CROPS,ProcessExponent)

This is the result of the above transformation.

cat(paste("Property Damage (USD): ",sep=" "))
cat(unique(df$DAMAGE_PROPERTY_USD)[1:10])
cat(paste("\n","   Crop Damage (USD): ",sep=" "))
cat(unique(df$DAMAGE_CROPS_USD)[1:10])

Property Damage (USD): 0 1750000 5000 2500 10300000 150000 10000 1000 20000 200000
    Crop Damage (USD): 0 300000 6900000 20000 800000 25000 750000 2000000 7800000 22000000

Analysis: Q1¶

Since we are attempting to understand the impact by event we first group the data by event type. Social impact pertains to the deaths and injuries in the data. While it is good that the data seperates out the deaths and injuries by whether they were caused directly or indirectly by the event, we are more interested in the overall impact in this analysis.

The output is a list of events that had the greatest impact by deaths, followed by injuries.

# group by event type
df_evt <- split(df,as.factor(df$EVENT_TYPE))

# function to collect and compute information related to population damage
# sum up the deaths and injuries, collapsing direct and indirect counts, the specified input (x)
# also count the number of instances within x
SocialImpact <- function(x)
{
    c((sum(x$DEATHS_DIRECT,na.rm=TRUE) + sum(x$DEATHS_INDIRECT,na.rm=TRUE)),
      (sum(x$INJURIES_DIRECT,na.rm = TRUE) + sum(x$INJURIES_INDIRECT,na.rm = TRUE)),
      nrow(x))
}

df_social <- as.data.frame(do.call(rbind, 
                                sapply(df_evt,SocialImpact,simplify=FALSE)))
names(df_social) <- c("DEATHS","INJURIES","INCIDENTS")

# compute % of total impact
df_social$DEATHS_PCT <- round((df_social$DEATHS/sum(df_social$DEATHS))*100,2)
df_social$INJURIES_PCT <- round((df_social$INJURIES/sum(df_social$INJURIES))*100,2)

# order the weather events by number of deaths first, then by number of 
# injuries in the descending order
order <- order(df_social$DEATHS,df_social$INJURIES,decreasing = TRUE)
df_social <- df_social[order,]

In the table below, the number of deaths and injuries resulting from each event across the entire span is given by DEATHS and INJURIES respectively. Note that these measures are aggregates of both direct and indirect occurences for each impact type. INCIDENTS represents the number of incidents of the event across the entire span (20 years). The percentage of total impact (in deaths and injuries) accounted for by each event is also represented.

head(df_social,n=10)

plot_options <- list(theme(panel.background = element_rect(fill = "transparent", colour = NA),
                          legend.background = element_rect(fill = "transparent", colour = NA),
                          legend.key = element_rect(fill = "white"),
                          axis.line.x = element_line(color="gray", size = 0.5),
                          axis.line.y = element_line(color="gray", size = 0.5),
                          plot.title=element_text(family="Times", face="bold", size=8,color="lightyellow4"),
                          #legend.title = element_text(family="Times", face="bold", size=8),
                          legend.title = element_blank(), 
                          legend.text = element_text(size=7, face="bold"),
                          legend.position = c(0.9,0.9),
                          axis.title.x = element_text(size=8,face="bold",color='brown3'),
                          axis.title.y = element_text(size=8,face="bold",color='brown3'), 
                          axis.ticks = element_line(color='gray'),
                          axis.text.x = element_text(size=8,angle=45, hjust=1,face="bold"),
                          axis.text.y = element_text(size=7,face="bold")))

options(repr.plot.width = 5, repr.plot.height = 4)

# we're only plotting the top 10 events
df_social_topten <- df_social[1:10,1:2] 
# reshape the data frame to suit plotting needs
df_social_topten$event <- rownames(df_social_topten)
df_melt <- melt(df_social_topten,id.vars='event')

# this is needed since ggplot (by default) does not stick to the order 
# in the original data frame
ordered_plot <- function(lvl) 
{
    df_melt$event <- factor(df_melt$event, levels = df_melt$event[order(-df_melt$value[df_melt$variable == lvl])])
    
    ggplot(data=df_melt) + 
           geom_bar(aes(x=event, y=value, fill = variable), width = 0.8, 
                         position = 'dodge', stat="identity") +
           plot_options 
}

plot <- ordered_plot(levels(df_melt$variable)[1]) +
        xlab("Weather Event") + ylab("") +
        ggtitle("Severe Weather Events: Social Impact by Event (1997-2016), San Diego County") +
        scale_fill_manual(name="",values=c('DEATHS'='lightblue','INJURIES'='lightsalmon'), 
                             labels = c("Deaths","Injuries"))
        
plot

Analysis: Q2¶

We carry out similar analysis to output events by their greatest economic impact. Here the list is ordered by damage to property, followed by crops since the San Diego county is a predominantly urban county with property being the dominant asset owned by the population as opposed to crops.

# function to collect and compute information related to crop damage
# sum up the damage collapsing direct and indirect instances for each group
EconomicImpact <- function(x)
{
    c(sum(x$DAMAGE_PROPERTY_USD,na.rm=TRUE), sum(x$DAMAGE_CROPS_USD,na.rm=TRUE),nrow(x))      
}

df_econ <- as.data.frame(do.call(rbind, 
                                sapply(df_evt,EconomicImpact,simplify=FALSE)))
names(df_econ) <- c("DAMAGE_PROPERTY","DAMAGE_CROPS","INCIDENTS")

# order the weather events by number of deaths first, then by number of 
# injuries in the descending order
order <- order(df_econ$DAMAGE_PROPERTY,df_econ$DAMAGE_CROPS,decreasing = TRUE)
df_econ <- df_econ[order,]

# compute % of total impact
df_econ$DAMAGE_PROPERTY_PCT <- round((df_econ$DAMAGE_PROPERTY/sum(df_econ$DAMAGE_PROPERTY))*100,2)
df_econ$DAMAGE_CROPS_PCT <- round((df_econ$DAMAGE_CROPS/sum(df_econ$DAMAGE_CROPS))*100,2)

Economic Impact: Top Ten Events¶

In the table below, damage to property and crops in USD is given by DAMAGE_PROPERTY andDAMAGE_CROPS respectively. INCIDENTS represents the number of incidents of the event across the entire span (20 years). The percentage of total damage (in property and crops) accounted for by each event is also represented.

head(df_econ,n=10)

options(repr.plot.width = 6, repr.plot.height = 5)

# we're only plotting the top 10 events
df_econ_topten <- df_econ[1:10,1:2]

# reshape the data frame to suit plotting needs
df_econ_topten$event <- rownames(df_econ_topten)
df_melt <- melt(df_econ_topten,id.vars='event')

# do this since we are using log scale (this way we 
# prevent log(0) from being plotted)
df_melt[df_melt == 0] <- NA

breaks = 10**c(0:10)
labs = c("1","10","100","1K","10K","100K","1M","10M","100M","1B","10B")
plot <- ordered_plot(levels(df_melt$variable)[1]) +
        scale_y_log10(breaks = breaks, labels = labs) +
        theme(axis.text.x = element_text(size=8,angle=0, hjust=0.5,face="bold")) +
        coord_flip() +
        xlab("Weather Event") + ylab("Amount (USD)") +
        ggtitle("Severe Weather Events: Economic Impact by Event (1997-2016), San Diego County") +
        scale_fill_manual(name="",values=c('DAMAGE_PROPERTY'='lightblue','DAMAGE_CROPS'='lightsalmon'), 
                             labels = c("Property Damage","Crop Damage"))
        
plot

Analysis: Q3¶

We start here by grouping the data by the year of occurence and subsequently summing them by group.

# group by year
df_yr <- split(df,as.factor(df$YEAR))

df_yr_social <- as.data.frame(do.call(rbind, 
                              sapply(df_yr,SocialImpact,simplify=FALSE)))
names(df_yr_social) <- c("DEATHS","INJURIES","INCIDENTS")

options(repr.plot.width = 6, repr.plot.height = 5)

x_ticks <- as.numeric(rownames(df_yr_social))
x_ticks_min <- min(x_ticks)
x_ticks_max <- max(x_ticks)
y_ticks_min <- min(colMins(data.matrix(df_yr_social)))
y_ticks_max <- max(colMaxs(data.matrix(df_yr_social)))

plot <- ggplot(data=df_yr_social) + 
          geom_col(aes(x=x_ticks,y=INCIDENTS),width=0.8,fill='black',alpha='0.1') +  
          geom_line(aes(x=x_ticks,y=DEATHS,group=1,color='DEATHS'),size=0.8) +
          geom_point(aes(x=x_ticks,y=DEATHS,group=1,color='DEATHS'),size=0.9) +
          geom_line(aes(x=x_ticks,y=INJURIES,group=1,color='INJURIES'),size=0.8) +
          geom_point(aes(x=x_ticks,y=INJURIES,group=1,color='INJURIES'),size=0.9) +          
          scale_x_continuous(breaks=seq(x_ticks_min,x_ticks_max,1)) +
          scale_y_continuous(breaks=seq(y_ticks_min,y_ticks_max,20)) +             
          xlab('Year') + ylab('') +
          ggtitle("Severe Weather Events: Social Impact by Year (1997-2016), San Diego County") +    
          scale_color_manual(name = "", values=c('DEATHS'='lightblue','INJURIES'='lightsalmon'), 
                             labels = c("Deaths","Injuries"))

In the plot below, the incidents are represented by bars for each year. Note that here the number of incidents is the aggregate across all event types.

t(df_yr_social)
plot + plot_options

df_yr_econ <- as.data.frame(do.call(rbind, 
                              sapply(df_yr,EconomicImpact,simplify=FALSE)))
names(df_yr_econ) <- c("DAMAGE_PROPERTY","DAMAGE_CROPS","INCIDENTS")

options(repr.plot.width = 6, repr.plot.height = 5)

df_yr_econ[df_yr_econ == 0] <- 0

x_ticks <- as.numeric(rownames(df_yr_econ))
x_ticks_min <- min(x_ticks)
x_ticks_max <- max(x_ticks)
y_ticks_min <- min(colMins(data.matrix(df_yr_econ)))
y_ticks_max <- max(colMaxs(data.matrix(df_yr_econ)))

breaks = 10**c(0:10)
labs = c("1","10","100","1K","10K","100K","1M","10M","100M","1B","10B")

plot <- ggplot(data=df_yr_econ) + 
          geom_line(aes(x=x_ticks,y=INCIDENTS,group=1,linetype='INCIDENTS'),color="black") +
          geom_point(aes(x=x_ticks,y=INCIDENTS,group=1),color="black",size=0.8) +            
          geom_line(aes(x=x_ticks,y=DAMAGE_PROPERTY,group=1,color='DAMAGE_PROPERTY')) +
          geom_point(aes(x=x_ticks,y=DAMAGE_PROPERTY,group=1,color='DAMAGE_PROPERTY'),size=0.8) +
          geom_line(aes(x=x_ticks,y=DAMAGE_CROPS,group=1,color='DAMAGE_CROPS')) +
          geom_point(aes(x=x_ticks,y=DAMAGE_CROPS,group=1,color='DAMAGE_CROPS'),size=0.8) +
          scale_x_continuous(breaks=seq(x_ticks_min,x_ticks_max,1)) +          
          scale_y_log10(breaks = breaks, labels = labs) +          
          xlab('Year') + ylab('Amount (USD)') +
          ggtitle("Severe Weather Events: Economic Impact by Year (1997-2016), San Diego County") +    
          scale_color_manual(name = "", values=c('DAMAGE_PROPERTY'='lightsalmon','DAMAGE_CROPS'='lightblue'), 
                             labels = c("Crop Damage","Property Damage")) +
          scale_linetype_manual(name="",values=c('INCIDENTS'='dashed'),labels=c("Incidents"))

Economic Impact by Year (1997-2016)¶

t(df_yr_econ)
plot + plot_options

# filter out data specific to wildfires
df_wildfire <- df[df$EVENT_TYPE == 'Wildfire',]

# group by year
df_wf_yr <- split(df_wildfire,as.factor(df_wildfire$YEAR))

df_wf_yr_social <- as.data.frame(do.call(rbind,sapply(df_wf_yr,SocialImpact,simplify=FALSE)))
names(df_wf_yr_social) <- c("DEATHS","INJURIES","INCIDENTS")
df_wf_yr_social$YEAR <- rownames(df_wf_yr_social)

df_wf_yr_econ <- as.data.frame(do.call(rbind,sapply(df_wf_yr,EconomicImpact,simplify=FALSE)))
names(df_wf_yr_econ) <- c("DAMAGE_PROPERTY","DAMAGE_CROPS","INCIDENTS")
df_wf_yr_econ$YEAR <- rownames(df_wf_yr_econ)

# merge the data
df_wf_yr <- merge(df_wf_yr_social, df_wf_yr_econ)

# order the data by year (do this before computng percentages)
df_wf_yr <- df_wf_yr[order(df_wf_yr$YEAR),]

# compute the percentages of key indicators that wildfires accounted for, each year
df_wf_yr$INCIDENTS_PCT <- round((df_wf_yr$INCIDENTS/df_yr_econ$INCIDENTS)*100,2)
df_wf_yr$DEATHS_PCT <- round((df_wf_yr$DEATHS/df_yr_social$DEATHS)*100,2)
df_wf_yr$INJURIES_PCT <- round((df_wf_yr$INJURIES/df_yr_social$INJURIES)*100,2)
df_wf_yr$DAMAGE_PROPERTY_PCT <- round((df_wf_yr$DAMAGE_PROPERTY/df_yr_econ$DAMAGE_PROPERTY)*100,2)
df_wf_yr$DAMAGE_CROPS_PCT <- round((df_wf_yr$DAMAGE_CROPS/df_yr_econ$DAMAGE_CROPS)*100,2)

# re-order the columns
df_wf_yr <- df_wf_yr[,c(c(2:6),1,c(7:11))]

# remove row indices 
rownames(df_wf_yr) <- NULL

options(repr.plot.width = 6, repr.plot.height = 5)

df_wf_yr[df_wf_yr == 0] <- 0

x_ticks <- df_wf_yr$YEAR
x_ticks_min <- min(x_ticks)
x_ticks_max <- max(x_ticks)
y_ticks_min <- min(colMins(data.matrix(df_wf_yr)))
y_ticks_max <- max(colMaxs(data.matrix(df_wf_yr)))

breaks = 10**c(0:10)
labs = c("1","10","100","1K","10K","100K","1M","10M","100M","1B","10B")

plot <- ggplot(data=df_wf_yr) + 
          geom_col(aes(x=x_ticks,y=INCIDENTS),width=0.8,fill='black',alpha='0.1') +   
          geom_line(aes(x=x_ticks,y=DEATHS,group=1,color='DEATHS'),size=0.8) +
          geom_point(aes(x=x_ticks,y=DEATHS,group=1,color='DEATHS'),size=0.9) +
          geom_line(aes(x=x_ticks,y=INJURIES,group=1,color='INJURIES'),size=0.8) +
          geom_point(aes(x=x_ticks,y=INJURIES,group=1,color='INJURIES'),size=0.9) +          
          geom_line(aes(x=x_ticks,y=DAMAGE_PROPERTY,group=1,color='DAMAGE_PROPERTY'),size=0.8) +
          geom_point(aes(x=x_ticks,y=DAMAGE_PROPERTY,group=1,color='DAMAGE_PROPERTY'),size=0.9) +
          geom_line(aes(x=x_ticks,y=DAMAGE_CROPS,group=1,color='DAMAGE_CROPS'),size=0.8) +
          geom_point(aes(x=x_ticks,y=DAMAGE_CROPS,group=1,color='DAMAGE_CROPS'),size=0.9) +
          scale_x_discrete(breaks=seq(x_ticks_min,x_ticks_max,1)) +          
          scale_y_log10(breaks = breaks, labels = labs) +          
          xlab('Year') + ylab('') +
          ggtitle("Wildfires: Socio-Economic Impact by Year (1997-2016), San Diego County") +    
          scale_color_manual(name = "", 
                             values=c('DEATHS'='firebrick','INJURIES'='lightsalmon','DAMAGE_PROPERTY'='darkseagreen',
                                      'DAMAGE_CROPS'='deepskyblue3'), 
                             labels = c("Crop Damage", "Property Damage","Deaths","Injuries")) +
          scale_linetype_manual(name="",values=c('INCIDENTS'='dashed'),labels=c("Incidents"))

Impact of Wildfires by Year (1997-2016)¶

Since wildfires have a disproportionately large impact on the region it is useful to take a closer look at it. This table indicates high impact nature of wildfires in that a few incidents can result in widespread death and destruction.

As before, the bars represent the number of wildfire incidents in each year.

t(df_wf_yr[,c(1,6:11)])
plot + plot_options

Insights¶

These are a few of the insights gathered from the analysis above:

Wildfires are the most impactful event both in terms of their social and economic impact. They are also dis-proportionately higher in impact compared to every other severe weather event experienced by the region accouting for ~90% of all damage to property, ~50% of all injuries, ~30% of all deaths and damage to crops over the last 20 years spanning 1997-2016.
Only 5 events (in the last 20 years) accounted for over 70% of all deaths and injuries and over 90% of all damage to property
High impact events like the wildfire are capable of causing widespread death and destruction even with a few incidents. Years 2003 and 2007 highlight this quite clearly. In both cases, wildfires accounted for over 70% of all deaths, over 90% of all property damage, between 60-97% of injuries and 25-50% of crop damage even as the they account for less than 25% of all severe weather events experienced by the region.
Stated another way, incidents such as the Flash Flood while occuring as frequently as Wildfires inflict anywhere from 1-10% of the death and destruction of the latter.
Some events such as floods predominantly result in economic impact while others mostly affect the population (e.g.: Rip Currents)

Outro¶

This is by no means an exhaustive analysis. Many possibilities yet remain - investigating the cyclicality of event occurences, as well as that of the event impact (e.g.: an event may have a high impact one year followed by several years with low impact), assessing the effect of one event on another (e.g.: high impact wildfires are often followed by landslides, this is especially true in Southern California where the wet season follows right after the Santa Ana Winds, the time when fires are most likey to spread) and so on.

	DEATHS	INJURIES	INCIDENTS	DEATHS_PCT	INJURIES_PCT
Wildfire	30	487	151	28.30	49.74
Rip Current	14	71	22	13.21	7.25
Winter Weather	14	68	12	13.21	6.95
Heat	10	69	29	9.43	7.05
Heavy Snow	8	45	15	7.55	4.60
Flood	5	8	59	4.72	0.82
Winter Storm	4	67	14	3.77	6.84
High Surf	4	56	43	3.77	5.72
Cold/Wind Chill	4	15	6	3.77	1.53
Dense Fog	4	8	61	3.77	0.82

	DAMAGE_PROPERTY	DAMAGE_CROPS	INCIDENTS	DAMAGE_PROPERTY_PCT	DAMAGE_CROPS_PCT
Wildfire	1840920000	73000000	151	91.41	30.94
Landslide	48000000	0	1	2.38	0.00
Heavy Rain	46485000	0	29	2.31	0.00
Flood	24091000	0	59	1.20	0.00
High Wind	23307000	36620000	121	1.16	15.52
Flash Flood	18939500	7510000	144	0.94	3.18
Thunderstorm Wind	5769000	25000	42	0.29	0.01
High Surf	1484000	0	43	0.07	0.00
Debris Flow	1115000	0	12	0.06	0.00
Strong Wind	932000	0	27	0.05	0.00

	1997	1998	1999	2000	2001	2002	2003	2004	2005	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015	2016
DEATHS	16	5	14	12	5	4	21	0	4	5	13	0	0	1	0	0	0	2	0	4
INJURIES	1	20	77	67	145	122	264	29	15	23	141	16	10	0	19	3	12	10	0	5
INCIDENTS	22	46	63	59	88	90	78	82	102	32	78	62	66	23	15	37	33	55	46	26

	1997	1998	1999	2000	2001	2002	2003	2004	2005	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015	2016
DAMAGE_PROPERTY	1750000	10478500	2172000	1902500	13577500	32027000	1136865000	7986100	75831200	343500	706345600	1557000	215000	2770000	98000	431000	7357000	7788500	323000	4070000
DAMAGE_CROPS	0	7200000	0	845000	0	10550000	56510000	0	0	0	160200000	610000	0	0	0	0	0	0	0	0
INCIDENTS	22	46	63	59	88	90	78	82	102	32	78	62	66	23	15	37	33	55	46	26

YEAR	1997	1998	1999	2000	2001	2002	2003	2004	2005	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015	2016
INCIDENTS	1	1	19	9	20	34	18	9	4	1	15	2	1	1	1	2	2	8	1	2
INCIDENTS_PCT	4.55	2.17	30.16	15.25	22.73	37.78	23.08	10.98	3.92	3.12	19.23	3.23	1.52	4.35	6.67	5.41	6.06	14.55	2.17	7.69
DEATHS_PCT	0.00	0.00	7.14	8.33	0.00	0.00	76.19	NA	0.00	0.00	76.92	NA	NA	0.00	NA	NA	NA	0.00	NA	50.00
INJURIES_PCT	0.00	0.00	27.27	43.28	7.59	54.92	59.09	31.03	40.00	21.74	97.16	18.75	0.00	NA	94.74	100.00	100.00	100.00	NA	0.00
DAMAGE_PROPERTY_PCT	100.00	0.00	44.20	0.00	88.75	90.81	99.85	6.89	5.80	0.00	90.25	6.42	0.00	0.00	25.51	92.81	98.55	99.12	0.00	98.28
DAMAGE_CROPS_PCT	NA	0.00	NA	0.00	NA	18.96	50.43	NA	NA	NA	26.53	0.00	NA	NA	NA	NA	NA	NA	NA	NA

	STATE	STATE_FIPS	YEAR	EVENT_TYPE	CZ_TYPE	CZ_FIPS	CZ_NAME	DEATHS_DIRECT	DAMAGE_PROPERTY	DAMAGE_CROPS
4433	CALIFORNIA	6	1997	Heavy Rain	C	73	SAN DIEGO	0		0
4514	CALIFORNIA	6	1997	Heat	Z	62	SAN DIEGO COUNTY DESERTS	0
7791	CALIFORNIA	6	1997	Wildfire	C	73	SAN DIEGO	0	1.75M
8603	CALIFORNIA	6	1997	Funnel Cloud	C	73	SAN DIEGO	0
9648	CALIFORNIA	6	1997	Heat	Z	50	SAN DIEGO COUNTY VALLEYS	1
9649	CALIFORNIA	6	1997	Heavy Rain	C	73	SAN DIEGO	0