Data Auditing Blog View Mantra V5


Prat Moghe was the founding CEO of Tizor and led the company from 2002 to 2006 including driving the launch of its product into the data auditing market. Prat led Tizor through two financing rounds and established its security and compliance market strategy.


Data Breaches – How is data lost?


This is the second in a series of posts on data breaches. Last time I sounded fatalistic about big data breaches. Here I am attempting to understand how data breaches happen. As it turns out, there is a lot of hype and confusion on this question. We have all heard about the 100 million data disclosures, and by now, most of us have received multiple notifications from places that lost our data.

Can we answer a simple question – when enterprises lose data, how do they lose it? To answer this, I took the raw statistics compiled by Privacy Rights Clearinghouse. This site records the publicly known data breach incidents. Attrition has gone further and recorded them in an Excel spreadsheet. My research team (thanks: Vai Osborne – my cheeky office life-support & Stephanie Hartenstein) cranked through this spreadsheet and filtered it down to incidents where there is a clear record of how data was lost. We assume that data loss could fall into four categories –

  1. Email - someone emailed lots of sensitive data in an unauthorized way from inside an enterprise
  2. Tape – a tape with sensitive data got lost, perhaps from the back of a truck
  3. Laptop – someone lost a laptop that had sensitive data
  4. Database – someone accessed or hacked into a data server (database or file server) that stores lots of sensitive data.

So our question becomes – what percentage of data loss incidents fall into these four categories? Before you read further, please stop here and do a fun experiment. Come up with your guess on how frequently you think these four incidents occur.

Here is what we found –

1. Data breach incidents – When we count the total number of data breach incidents to date (2004-2007), we find 318 filtered incidents where the cause or source of the data loss was clearly recorded. Of these incidents, laptops account for the largest share (47% - 149 incidents), databases come next (40% - 126 incidents), then tapes (11%), and email last (2%).

2. Data loss exposure – When we quantify data breaches by the amount of data lost in each breach (we call this the data loss exposure), the ranking changes somewhat. Of a total of roughly 127 million records lost, databases rank first (64% - 84 million), laptops next (25% - 32 million), then tapes (10%), and then email (1%). Note that data loss exposure is fairly approximate – enterprises are still not widely monitoring incidents for data exposure. Even so, this number is quite revealing.

3. Data theft risk - Data loss does not necessarily mean theft, but each theft has to begin with a loss. If we want to calculate the risk of data theft, we need to factor in the probability that the lost data actually falls into bad hands, or in other words, establish that there was “intent” to steal after a data loss incident. To measure data theft risk, we have no data to rely on: we will literally have to make it up. In the attached spreadsheet, I arbitrarily set this risk at 60% for email, 60% for database, 20% for tape, and 20% for laptop. I think this is reasonable – an internal user emailing data out or a user hacking into a database seems to have much more willful intent than a tape or laptop getting lost in the general population. (I once lost a laptop on a plane. I am willing to bet it was cleaned out and sold for parts. The odds that someone stole it for the data are low; certainly less than the 20% I conservatively assumed above.) If you don’t agree with my theft risk, feel free to play with these numbers in the spreadsheet. Multiplying data exposure by theft risk and normalizing so the four categories sum to 100%, we find that data theft ranks in the following sequence – first databases by an overwhelming measure (84%), followed by laptops (11%), tapes (4%), and email (1%).
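For readers who want to play with the theft-risk assumptions without opening the spreadsheet, here is a minimal Python sketch of the same arithmetic. It is not the spreadsheet itself: the function and variable names are made up for illustration, and it starts from the rounded exposure percentages quoted above rather than the raw record counts. Because of that rounding, the database share may come out about a point away from the 84% quoted in the post.

```python
# Exposure shares (percent of records lost), rounded figures quoted in the post.
exposure_share = {"database": 64, "laptop": 25, "tape": 10, "email": 1}

# Assumed probability that lost data falls into bad hands.
# These are the post's admittedly arbitrary figures; change them freely.
theft_probability = {"database": 0.60, "laptop": 0.20, "tape": 0.20, "email": 0.60}


def theft_risk_shares(exposure, probability):
    """Weight each category's exposure by its theft probability,
    then normalize so the categories sum to roughly 100%."""
    weighted = {name: exposure[name] * probability[name] for name in exposure}
    total = sum(weighted.values())
    return {name: round(100 * value / total) for name, value in weighted.items()}


shares = theft_risk_shares(exposure_share, theft_probability)

# Print categories from highest to lowest theft-risk share.
for name, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {pct}%")
```

Raising the laptop and tape probabilities to match email and database (all 60%) simply reproduces the exposure ranking, which is a quick way to see how much of the final result depends on the assumed intent.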

These three charts are shown together in the graph below.


Conclusion: Data breaches are not all equal. The source of data breaches matters.

Contrary to general belief, the top two sources of data breaches are databases and laptops, probably in that order – not email or tapes.

What do you think? [I have posted the spreadsheet with this data here, for your use.]

Posted by Prat Moghe on Mon, Mar 19, 2007 @ 04:50 PM

COMMENTS

This is fascinating research, Prat, and a very innovative approach to the analysis. I suspect it could cause many organizations to question where they're putting their security emphasis. Equipment theft is the most obvious vulnerability, but as you point out, database access has more serious consequences. Glad to see 33 diggs on this post. It deserves attention.

posted on Wednesday, March 21, 2007 at 7:20 AM by Paul Gillin


This is indeed an interesting take and I agree with much of it. We in the news media tend to focus on the glamorous and exciting data breaches, but it's the commonplace ones that are the biggest risk. It's the companies that invest $25 million in the latest firewall tactics that get done in by a temp in the accounting department who logs in and then walks away from her computer.
The only concern I'd voice about this analysis is the assumption that the underlying stats are sound and are representative. Without suggesting that they are not indeed sound, such a small percentage of data breaches are reported that I'm dubious about how representative they are. That said, it's about the only thing that can be analyzed.

posted on Wednesday, March 21, 2007 at 11:01 PM by Evan Schuman


Thanks Paul - I am hoping to add color to this when I interview security officers to get their point of view on this.

posted on Thursday, March 22, 2007 at 9:12 AM by Prat Moghe


Evan - I have been tracking your recent coverage of TJX. It is interesting where the clues are taking us on that one. For those who haven't checked it out yet, the full story is a must-read - http://storefrontbacktalk.com/story/032007TJXflorida.php

posted on Thursday, March 22, 2007 at 9:16 AM by Prat Moghe


Great analysis of the situation, Prat. However, how do we account for Data Loss Exposure and the difference in correlating damage in terms of dollars? While e-mail ranks as a distant fourth in comparison to the other data breach methods, this lacks consideration of IP and confidential information leaks that lead to insider trading and other market manipulations, which could result in big dollar gains for criminals and never come to light...

posted on Thursday, June 07, 2007 at 1:10 PM by Matt Bateman


Matt, you raise a good point around disclosures where risk is not necessarily proportional to the "size" of the disclosure. For example, the risk of leaking Coke's secret formula could outweigh a million SSNs. I'll make two observations around this - 1. When it comes to high-value "secret" leaks with intent, I believe that the predominant channel will be out of band. For example, an employee doing lunch with a competitor or, worse, actually joining the competitor. I don't think email is the dominant channel for this, primarily because of the easy traceability. I am not arguing against monitoring email - there could be lower-value confidential email leaks that may violate policies or even be illegal that should be caught. But the real question is how bad this risk is in the larger scope of things. 2. High-value secrets are in most cases safeguarded in "vaults", e.g. databases or file servers - so this brings us back to the need to monitor and alert on all access to such secrets.

posted on Friday, June 08, 2007 at 9:58 AM by Prat Moghe

