Statistical Privacy
I just stumbled across an interesting observation by Peter Eckersley, who blogs for the EFF. Statistically, he points out, about thirty-three bits of information are enough to identify any one individual on the planet. A bit is the answer to a single yes-or-no question, and each one halves the pool of candidates; 2^33 is about 8.6 billion, more than the number of people on Earth. In practice it often takes far fewer pieces of information than thirty-three, because a single piece can be worth several bits. (Obviously, some pieces of information are more useful than others. Billionaires? Lots. Male billionaires? Many. Male billionaires in the IT industry? A half-dozen or so, from what I can tell. Russian-born male billionaires in the IT industry? Probably just one.)
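The thirty-three figure comes from information theory: singling out one person among N people takes about log2(N) bits, and learning a fact shared by a fraction p of people is worth -log2(p) bits. Here is a minimal Python sketch of that arithmetic, assuming a round world population of 7 billion (my figure, not Eckersley's); the function names are my own.

```python
import math

# Assumed round figure for the world's population; not from the original post.
WORLD_POPULATION = 7_000_000_000


def bits_needed(population: int) -> int:
    """Bits required to single out one person among `population` people.

    Each bit halves the candidate pool, so we need the smallest k with
    2**k >= population, i.e. ceil(log2(population)).
    """
    return math.ceil(math.log2(population))


def surprisal(fraction: float) -> float:
    """Bits of identifying information gained from a fact shared by `fraction` of people."""
    return -math.log2(fraction)


if __name__ == "__main__":
    print(bits_needed(WORLD_POPULATION))   # 33
    # A hypothetical trait shared by one person in a thousand is worth about 10 bits.
    print(round(surprisal(1 / 1000), 2))   # 9.97
```

The billionaire example works the same way: each filter ("male", "IT industry", "Russian-born") shrinks the pool, and the rarer the trait, the more bits it contributes.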
It has interesting privacy implications in an era of surveillance and data-mining. Consider: under some circumstances, just visiting a website, once, can disclose:
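What ISP you use, and, via your IP address, roughly where you live, down to the city or area level (two data points);
What browser you use, what operating system you use, and possibly what kind of computer hardware you have, from your browser’s User-Agent string (three more data points);
What language(s) you speak, from your browser’s Accept-Language header (one more data point);
What you’re interested in, judging by the page you requested (one further data point);
How you reached that website, from the Referer header, which sometimes includes the search terms you used (one last data point).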
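Most of these clues sit in a single line of an ordinary access log. The sketch below assumes the Apache/nginx "combined" log format; the sample line, IP address, URLs and User-Agent string are made up for illustration. Language is missing from the parsed fields because the combined format does not record the Accept-Language header by default, although a server can be configured to log it, and older browsers often put a locale such as en-GB inside the User-Agent string.

```python
import re
from typing import Optional

# Regex for the Apache/nginx "combined" access-log format:
# host ident authuser [timestamp] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one combined-format log line, or None if it doesn't match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


# Hypothetical log line; the IP comes from a documentation range (RFC 5737).
SAMPLE = (
    '203.0.113.42 - - [12/Feb/2010:14:03:27 +0000] '
    '"GET /articles/privacy HTTP/1.1" 200 5123 '
    '"http://www.example.com/search?q=statistical+privacy" '
    '"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB) Firefox/3.6"'
)

if __name__ == "__main__":
    entry = parse_line(SAMPLE)
    print(entry["ip"])       # ISP and rough location, via an IP lookup
    print(entry["agent"])    # browser, OS, and sometimes language or hardware hints
    print(entry["path"])     # what you were interested in
    print(entry["referer"])  # how you got there, here including search terms
```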
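That’s eight data points, gleaned from nothing more than a webserver log. Even if each were worth only a single bit, that would be a quarter of the thirty-three needed to identify you, and several of them are likely worth a good deal more.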
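To see how clues add up, treat each one as the fraction of people who share it. If the clues were independent, their bits would simply add, and the expected number of people matching all of them would be the population times the product of the fractions. Real clues are correlated (your ISP says a lot about your city), so simple addition overstates what a snooper learns. This is a rough sketch under that independence assumption; the function names and the example fractions are hypothetical.

```python
import math
from typing import Iterable


def combined_bits(fractions: Iterable[float]) -> float:
    """Total identifying bits from several clues, ASSUMING they are independent.

    Each clue shared by a fraction p of people contributes -log2(p) bits.
    """
    return sum(-math.log2(p) for p in fractions)


def expected_matches(population: int, fractions: Iterable[float]) -> float:
    """Expected number of people matching every clue (the 'anonymity set'),
    under the same independence assumption."""
    return population * math.prod(fractions)


if __name__ == "__main__":
    # Hypothetical clue frequencies, chosen only to keep the arithmetic exact.
    clues = [1 / 16, 1 / 256, 1 / 4]           # e.g. ISP, city, browser
    print(combined_bits(clues))                 # 14.0
    print(expected_matches(2**20, clues))       # 64.0
```

Once `expected_matches` falls to about one, the clues together carry roughly log2 of the population in bits, which for the whole planet is Eckersley's thirty-three.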