Matthew Green

A thing I worry about in the (academic) privacy field is that our work isn’t really improving privacy globally. If anything it would be more accurate to say we’re finding ways to encourage the collection and synthesis of more data, by applying a thin veneer of local “privacy.”

I’m referring to the rise of “private” federated machine learning and model-building work, where the end result is to give corporations new ways to build models from confidential user data. This data was previously inaccessible (by law or customer revulsion) but now is fair game.

A typical pitch here is that, by applying techniques like Differential Privacy, we can keep any individual user’s data “out of the model.” The claim: the use of your private data is harmless, since the model “based on your data” will be statistically close to one without it.
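(For readers who want the formal version of “statistically close”: this is the standard ε-differential privacy guarantee, the textbook definition rather than anything specific to a particular company’s system. For any two datasets D and D′ that differ in one person’s record, and any set S of possible outputs of the mechanism M,

\[
  \Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S].
\]

Small ε means the output distribution barely shifts when your record is added or removed.)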

The problem with this claim is that it’s misleading to ordinary people. The sniff test is easy: if *my* data doesn’t matter for constructing the model, why do you need it?

And the answer is that your data is essential to building the model.

What’s actually happening here is that the model relies on collecting statistical aggregate data from collections of people. It may not know that you’re a person who likes, e.g., specific types of private media. But if there are others like you, it learns to recognize you as a group.
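(A minimal sketch of the “individual privacy, group exposure” point, using a hypothetical population and made-up names like dp_count and laplace_noise rather than any real deployment: dropping one person’s record barely changes a Laplace-noised count, yet the count still teaches the analyst exactly what the group is like.

import math
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(flags: list[bool], epsilon: float) -> float:
    """Epsilon-DP count: a counting query has sensitivity 1, so adding
    Laplace(1/epsilon) noise to the true count satisfies epsilon-DP."""
    return sum(flags) + laplace_noise(1.0 / epsilon)

# Hypothetical population: 10,000 users, ~30% sharing some sensitive preference.
random.seed(0)
population = [random.random() < 0.30 for _ in range(10_000)]

epsilon = 0.5
with_me = dp_count(population, epsilon)         # my record included
without_me = dp_count(population[1:], epsilon)  # my record removed

print(f"noisy count with my record:    {with_me:.1f}")
print(f"noisy count without my record: {without_me:.1f}")
# The two answers are nearly indistinguishable (that is the DP guarantee),
# yet both expose the group-level fact -- about 30% of users share this
# preference -- which is exactly what the analyst wanted to learn.
)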

Building these models fundamentally depends on access to extremely private data, like your browsing habits or location data or purchasing history. This is the kind of stuff you think is private, and that even US companies used to think twice about selling.

Adding a veneer of cryptographic privacy to these systems makes the medicine go down easier. Your credit card company might not sell your raw purchase data. But apply magical “privacy pixie dust” so no individual data affects the result? Now the data can be monetized. It’s a very compelling story, and one that’s proliferating.

The critical point here is that the resulting models are a *massive* privacy threat to users. Your data might not reveal your embarrassing or deeply private preferences. But the resulting model might be able to determine exactly what you like. Its existence destroys your privacy.

This is happening today, with many tech firms deploying differential privacy and federated learning to dig deeper into user data and build models of their users’ behavior. It’s not all transparently “evil,” but arguably none of it is good for users’ privacy either.

In any case, this doesn’t feel like an ethical issue that academic research has really grappled with. There are individual researchers who admit the problem, but nobody seems to be looking at the overall result and saying “wait, we’re doing a bad thing. Should we stop?”

Maybe that’s just the way research is. We all love our research too much to ever stop doing it, no matter the consequences. I’ll cop to this too.

I just wish we would all take a break for a minute and have a conversation about whether we’re leaving this world better than we found it. //

@matthew_d_green it sounds like you do not think the differential privacy threat model is strong enough. Do you have an alternative threat model in mind?