
Prat Moghe is the Founder & GM of Tizor and has led the company from concept to leadership in the data auditing market. Tizor is now a subsidiary of Netezza (NYSE: NZ), a leader in data warehouse appliances.

Data Auditing Blog

Data Breaches: Making sense of the numbers (Part 1 of 2)

50K: The Magic Data Loss Number

If you haven't already seen it, the ITRC (Identity Theft Resource Center) has put out its year-end press release, which notes that 2007 was a record year for breaches. The ITRC counted 443 breaches and 127MM records lost in 2007, vs. 315 breaches and 20MM records lost in 2006. That is roughly 40% growth in the number of breaches between 2006 and 2007.
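As a quick check on the growth figure, here is a minimal Python sketch that uses only the ITRC numbers quoted above. The helper name `pct_growth` is made up for illustration and is not part of the ITRC release.

```python
# Figures quoted from the ITRC year-end press release.
breaches_2006, breaches_2007 = 315, 443
records_2006_mm, records_2007_mm = 20, 127  # millions of records lost


def pct_growth(old, new):
    """Year-over-year growth as a percentage of the old value (illustrative helper)."""
    return (new - old) / old * 100


breach_growth = pct_growth(breaches_2006, breaches_2007)
record_growth = pct_growth(records_2006_mm, records_2007_mm)

print(f"Growth in breach count: {breach_growth:.1f}%")
print(f"Growth in records lost: {record_growth:.0f}%")
```

The breach count grows by about 40%, while the number of records lost grows far faster, which hints that a few very large incidents may be driving the totals.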

After reviewing the ITRC data, I started digging through public records. I wanted to understand if there was some pattern here and what it meant for security practitioners. The more I dug, the more interesting it got. I am still digging and analyzing, but here are some preliminary findings. As always, I'd like to get your take.

I used the Attrition database, which reports slightly different numbers than ITRC. My analysis spreadsheet is posted here; feel free to download it and double-check my math. To start with, the raw numbers look alarming. Based on the Attrition data, raw data losses were 163MM (2007) vs. 46MM (2006) vs. 55MM (2005). The number of incidents was 330 (2007) vs. 346 (2006) vs. 137 (2005).

This data is interesting, but I was curious as to how the losses were distributed across incidents. As we know, large loss events cause more notification problems and expense, as well as embarrassment and brand risk. When I plotted the loss distributions across the three years, what jumped out was that a very small number of large incidents completely skewed the statistics. Take a look at Figure 1 vs. Figure 2. Figure 1 shows the loss distribution across all incidents, and it is clearly dominated by events with more than 10MM losses.

Next, I decided to filter the picture and take out all events with more than 1MM losses. Why 1MM? This is an arbitrary cutoff, but it feels natural that any incident with more than a million losses will have sufficient punitive and visibility factors to cause a structural impact on the enterprise that suffers it. Let's call such events "large loss incidents", versus incidents with fewer than one million losses ("moderate or damped loss incidents"). If we filter Figure 1 and remove all large loss events, we end up with Figure 2 - the moderate loss distribution. I ran some stats on both of these distributions, and here is what I found.
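The filtering step can be sketched in a few lines of Python. This is only an illustration: the sample rows are hypothetical and are not taken from the Attrition database, the field names `org` and `records` are assumptions, and treating an incident with exactly 1MM losses as moderate is my choice, since the cutoff is described only as "more than 1MM".

```python
LARGE_LOSS_THRESHOLD = 1_000_000  # the arbitrary 1MM cutoff


def split_by_loss(incidents, threshold=LARGE_LOSS_THRESHOLD):
    """Split incidents into (moderate, large); large means more than `threshold` records."""
    moderate = [i for i in incidents if i["records"] <= threshold]
    large = [i for i in incidents if i["records"] > threshold]
    return moderate, large


# Hypothetical sample rows for illustration only.
sample = [
    {"org": "Example University", "records": 1_200},
    {"org": "Example Clinic", "records": 48_000},
    {"org": "Example Agency", "records": 310_000},
    {"org": "Example Retailer", "records": 12_500_000},
]

moderate, large = split_by_loss(sample)
print(f"{len(moderate)} moderate incidents, {len(large)} large incidents")
```

Changing `threshold` makes it easy to see how sensitive the split is to the choice of cutoff.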

Moderate loss incidents vs. Large Loss incidents: The vast majority of incidents fall into the moderate loss category in Figure 2. Over 97% of incidents are moderate in size! In fact, of the 813 incidents over the past three years, only 21 are large loss events with over 1MM losses. Further, only 4 incidents have crossed the 10MM mark - perhaps the threshold for a catastrophic loss. Why is this important? Because if the vast majority of events are moderate in size, this is going to influence security spend in the field. Large events can cause media hype, but if they are so rare, they may have little operational impact on security. By the way, the number of large events has gone up from 5 (2006) to 10 (2007), but I don't think this is a meaningful enough number to change the perception that large events are rare.
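The share of moderate incidents follows directly from the counts quoted in this post. A minimal Python sketch of that arithmetic, with variable names that are mine:

```python
# Incident counts per year from the Attrition data quoted in this post.
incidents_by_year = {2005: 137, 2006: 346, 2007: 330}
total = sum(incidents_by_year.values())

large = 21           # incidents with more than 1MM losses
over_10mm = 4        # incidents with more than 10MM losses
moderate = total - large

moderate_share = moderate / total * 100
over_10mm_share = over_10mm / total * 100

print(f"Total incidents: {total}")
print(f"Moderate incidents: {moderate} ({moderate_share:.1f}%)")
print(f"Incidents over 10MM: {over_10mm} ({over_10mm_share:.2f}%)")
```

The three yearly counts add up to the 813 incidents cited above, and the moderate share comes out just over 97%.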

Moderate losses have not increased much year to year: The total number of losses from moderate incidents has not grown a whole lot - 12MM (2007) vs. 14MM (2006) vs. 6MM (2005). In fact, the total in 2007 was lower than in 2006.

50K - The Magic Loss Number: One final question is - how much data do we lose in each moderate loss incident? It turns out that the average loss per moderate loss incident is roughly constant! Yes - across all three years - it is roughly 50,000 losses per incident. (More precisely, it was 55K (2005) vs. 50K (2006) vs. 45K (2007).) It is premature to call this a "loss constant", but it feels like there is something structural behind this. By the way, the moderate loss distribution has a very high variance - the standard deviation is typically well over 100,000. This means that moderate loss events tend to have either very little loss or high loss - they are not necessarily clustered around the average.
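To see how an average of 50,000 can coexist with a standard deviation well over 100,000, here is a small Python sketch using the standard library's `statistics` module. The sample values are hypothetical, chosen so that they average exactly 50,000; they are not Attrition data.

```python
import statistics

# Hypothetical moderate-loss incidents (records lost per incident), for illustration only.
# Most are small and one is large, but all stay below the 1MM cutoff.
sample_losses = [500, 1_000, 2_000, 3_500, 5_000, 8_000, 12_000, 20_000, 48_000, 400_000]

mean_loss = statistics.mean(sample_losses)
median_loss = statistics.median(sample_losses)
stdev_loss = statistics.stdev(sample_losses)  # sample standard deviation

print(f"Mean loss per incident:   {mean_loss:,.0f}")
print(f"Median loss per incident: {median_loss:,.0f}")
print(f"Standard deviation:       {stdev_loss:,.0f}")
```

In a sample like this the median sits far below the mean, so the "typical" incident is much smaller than the 50K average suggests. A single large incident pulls the mean up and accounts for most of the spread.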

In my next post we will take a look at the probability and financial risk of a breach. In the meantime, take a look at these numbers and let me know what you think.

Comments

Fantastic post, Prat. Thank you. My only comment regards the 16% chance you apply to a Fortune 5000 company experiencing a breach. The data you so meticulously tabulated, if I'm reading it correctly, indicates that in 2007 alone, 32% of incidents involved a business, 32% an Edu. institution, 29% a Gov entity, and 13% a Medical organization. If we assume that these percentages hold true over the 813 incidents over 3 years, and you add the Med and Biz together, only 46% of the 813 can be applied to Fortune 5000 companies. This equates to a 7.5% chance that a Fortune 5000 company will experience a breach. Also, Attrition.com does include some non-US organizations, but 93% are US breaches, so that shouldn't impact the #'s too much. Again, great post.
Posted @ Tuesday, March 11, 2008 1:07 PM by Jason Roberts