50K: The Magic Data Loss Number
If you haven't already seen it, the ITRC (Identity Theft Resource
Center) has put out its year-end press release, which notes that 2007
was a record year for breaches. ITRC counted 443 breaches with 127MM
lost records ("losses") in 2007, vs. 315 breaches and 20MM losses in
2006. That is roughly 40% growth in the number of breaches between 2006
and 2007.
After reviewing the ITRC data, I started digging through public
records. I wanted to understand if there was some pattern here and what
it meant for security practitioners. The more I dug, the more
interesting it got. I am still digging and analyzing, but here are some
preliminary findings. As always, I'd like to get your take.
I used the Attrition database, which reports slightly different numbers
than ITRC. My analysis spreadsheet is posted here; feel free to download
it and double-check my math. To start with, the raw numbers look
alarming. Based on the Attrition data, raw data losses are 163MM (2007)
vs. 46MM (2006) vs. 55MM (2005). The number of incidents is 330 (2007)
vs. 346 (2006) vs. 137 (2005).
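For anyone who wants to double-check the year-over-year math without opening the spreadsheet, here is a minimal sketch. It uses only the Attrition totals quoted above; the names `records_mm`, `incidents`, `pct_change` and `yearly_changes` are mine, not from the spreadsheet.

```python
# Attrition totals quoted in the post (records in millions, incident counts).
records_mm = {2005: 55, 2006: 46, 2007: 163}
incidents = {2005: 137, 2006: 346, 2007: 330}


def pct_change(old, new):
    """Percentage change going from old to new."""
    return (new - old) / old * 100.0


def yearly_changes(series):
    """Map each year (after the first) to its change vs. the prior year."""
    years = sorted(series)
    return {
        year: pct_change(series[prev], series[year])
        for prev, year in zip(years, years[1:])
    }


if __name__ == "__main__":
    for label, series in (("records", records_mm), ("incidents", incidents)):
        for year, change in yearly_changes(series).items():
            print(f"{label} {year}: {change:+.1f}% vs. prior year")
```

Run as a script, it prints the percentage change for each year against the one before, for both records lost and incident counts.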
This data is interesting, but I was curious as to how the number of
losses was distributed. As we know, the large loss events cause more
notification problems & expense, as well as embarrassment and brand
risk. When I plotted the loss distributions across the three years,
what jumped out was that a handful of very large incidents completely
skew the statistics. Take a look at Figure 1 vs. Figure 2.
Figure 1 shows loss distribution across all incidents. It is clear that
events with more than 10MM losses dominate the distribution in Figure
1. Next, I decided to filter the picture and take out all events with
more than 1MM losses. Why 1MM? The cutoff is arbitrary, but it seems
reasonable that any incident with more than a million losses carries
enough punitive and visibility consequences to have a structural
impact on the enterprise that suffers it. Let's call such events "large
loss incidents", versus incidents with fewer than one million losses
("moderate or damped loss incidents"). If we filter Figure 1 and
remove all large loss events, we end up with Figure 2, the moderate
loss distribution. I ran some stats on both these distributions and
here is what I found.
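The filtering step is simple enough to sketch in a few lines. This is an illustration only, not the spreadsheet's actual formulas: the sample numbers are made up, and putting an incident of exactly 1MM in the moderate bucket is my assumption, since the cutoff above only says "more than".

```python
LARGE_LOSS_THRESHOLD = 1_000_000  # the arbitrary 1MM cutoff from the post


def split_by_threshold(losses, threshold=LARGE_LOSS_THRESHOLD):
    """Split per-incident record counts into (moderate, large).

    An incident is "large" when it lost more than `threshold` records.
    An incident at exactly the threshold is kept as moderate (assumption).
    """
    moderate = [n for n in losses if n <= threshold]
    large = [n for n in losses if n > threshold]
    return moderate, large


# Hypothetical per-incident record counts, for illustration only.
sample_losses = [1_200, 45_000, 300_000, 2_500_000, 94_000_000, 800]

moderate, large = split_by_threshold(sample_losses)
print(f"{len(moderate)} moderate incidents, {len(large)} large incidents")
```

In this toy sample the two large incidents account for nearly all of the records lost, which is the same skew that Figure 1 shows.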


Moderate loss incidents vs. large loss incidents: The vast
majority of incidents fall into the moderate loss category in Figure
2. Over 97% of incidents are moderate in size! In fact, of the 813
incidents over the past three years, only 21 are large loss events,
with over 1MM losses each. Further, only 4 incidents have crossed the
10MM mark, perhaps the threshold for a catastrophic loss. Why is this
important? Because if the vast majority of events are moderate in size,
this is going to influence the security spend in the field. Large
events can cause media hype, but if they are so rare, they may have
little operational impact on security. By the way, the number of large
events has gone up from 5 (2006) to 10 (2007), but I don't think the
counts are big enough to change the perception that large events are
rare.
Moderate losses have not increased much year to year: The
total volume of moderate losses was 12MM (2007) vs. 14MM (2006) vs.
6MM (2005). In fact, the 2007 total was lower than the 2006 total,
even though the raw total more than tripled over the same period.
50K - The Magic Loss Number: One final question: how much
data do we lose in each moderate loss incident? It turns out that the
average loss per moderate loss incident is roughly constant! Yes,
across all three years, it is roughly 50,000 losses per incident
(more precisely, 55K in 2005, 50K in 2006, and 45K in 2007). It
is premature to call this a "loss constant", but it feels like there is
something structural behind it. By the way, the moderate loss
distribution has a very high variance: the standard deviation is
typically well over 100,000. This means that moderate loss events tend
to have either very little loss or very high loss; they are not
clustered around the average.
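To make the three findings above concrete, here is a small sketch of the kind of summary I ran. The incident counts (21 large out of 813) come from this post; the per-incident sample is hypothetical, chosen only to show how an average near 50K can sit alongside a standard deviation over 100K. It is not the real Attrition data.

```python
import statistics

# Counts quoted in the post: 21 large loss incidents out of 813 total.
TOTAL_INCIDENTS = 813
LARGE_INCIDENTS = 21
large_share_pct = LARGE_INCIDENTS / TOTAL_INCIDENTS * 100.0
moderate_share_pct = 100.0 - large_share_pct


def summarize(losses):
    """Basic statistics for a list of per-incident record counts."""
    return {
        "count": len(losses),
        "mean": statistics.mean(losses),
        "median": statistics.median(losses),
        "stdev": statistics.stdev(losses),  # sample standard deviation
    }


# Hypothetical moderate loss incidents: many small, one big.
sample_moderate_losses = [500, 1_000, 2_000, 5_000, 10_000, 30_000, 350_000]

if __name__ == "__main__":
    print(f"large: {large_share_pct:.1f}%, moderate: {moderate_share_pct:.1f}%")
    for name, value in summarize(sample_moderate_losses).items():
        print(f"{name}: {value:,.0f}")
```

In a sample shaped like this one, the median sits far below the mean and the standard deviation exceeds the mean. That is the pattern described above: most incidents are small, and a few big ones pull the average up.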
In my next post we will take a look at the probability and financial
risk of a breach. In the meantime, take a look at these numbers and let
me know what you think.