Friday, May 6, 2016

Talking about 0days and Attacks from weird Datasets

The paper below uses Symantec's WINE dataset to draw conclusions about the prevalence of 0days. It is bad in many ways, but in particular it confuses binaries with 0days (which are more closely related to vulnerabilities), uses a simplistic "windows of vulnerability" model, and tries to derive real-world data from the WINE dataset. Yet people quote from this paper in policy meetings as if it made sense!
https://users.ece.cmu.edu/~tdumitra/public_documents/bilge12_zero_day.pdf

A brief word about the WINE dataset and datasets like it: it is impossible to remove the massive observer bias from them. All I want you to do is read the above paper and ask yourself, "If the most used 0day on the market were in Symantec's endpoint protection, what would this paper look like?" A good rule of thumb is that if someone is talking about "windows of vulnerability," they have oversimplified the problem beyond recognition.
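To make the observer-bias question concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the exploit names, the attack counts, and the `visible_to_sensor` flag are assumptions, not numbers from the paper or from WINE. It only shows how a dataset that can never see one class of 0day reports a very different picture from the real one.

```python
from dataclasses import dataclass


@dataclass
class Exploit:
    name: str                # hypothetical label, not a real exploit
    attacks: int             # hypothetical number of real-world uses
    visible_to_sensor: bool  # can the vendor's telemetry ever record it?


def share(exploits):
    """Return each exploit's fraction of the total attacks in the list."""
    total = sum(e.attacks for e in exploits)
    if total == 0:
        return {}
    return {e.name: e.attacks / total for e in exploits}


def observed_share(exploits):
    """Same calculation, but only over what the sensor is able to see."""
    return share([e for e in exploits if e.visible_to_sensor])


# Invented numbers: the most used 0day targets the endpoint product itself,
# so the product's own telemetry never records it.
exploits = [
    Exploit("endpoint_product_0day", 900, False),
    Exploit("browser_0day", 80, True),
    Exploit("document_0day", 20, True),
]

true_view = share(exploits)
sensor_view = observed_share(exploits)

for name in true_view:
    seen = sensor_view.get(name)
    seen_text = "never observed" if seen is None else f"{seen:.0%}"
    print(f"{name}: true {true_view[name]:.0%}, sensor {seen_text}")
```

In this invented scenario the sensor's view contains no trace of the exploit that accounts for most of the attacks, and nothing inside the dataset tells you it is missing.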

What you get with people who rely on IDS data to talk about 0days is a bizarre level of cognitive dissonance about how bad their data is for the conclusions they are trying to draw. The only valid thing you can say from that kind of data is "sometimes we get lucky and find an 0day". The same is true when looking at the Verizon DBIR data to try to understand attacks. Their conclusions this year are demonstrably nonsensical, but every year has used the same basic methodology...

This is a must read: http://blog.trailofbits.com/2016/05/05/the-dbirs-forest-of-exploit-signatures/

I am sad that research is hard, but please stop saying you understand attacks from data that makes no sense.
