On Thu, Jul 06, 2023 at 08:08:05PM -0500, Michael Catanzaro wrote:
> But remember we do not want to keep information about individuals in the
> data set in the first place. It's easier to dodge privacy concerns if we
> just don't store such associations at all.

Sure, but the data still needs to leave a user's system at some point, and
that's where you have to trust the aggregator (the Fedora project in this
case, I suppose) not to store it verbatim. Alternatively, you can apply a DP
technique locally, before the data leaves the system. Randomized response,
which you mentioned, is actually one such technique.

In a way, you already trust the distribution by its very nature, e.g. the
signatures on the packages you install. DP just provides a framework in
which you can formally quantify the risk of unmasking an individual user in
a given data set, along with concrete strategies for minimizing that risk.

Actually, this exact problem is discussed in the blog post series I shared,
specifically in this part:
https://desfontain.es/privacy/local-global-differential-privacy.html

> As for differential privacy, I'm quite unfamiliar with this topic so I don't
> know to what extent it could be useful, but Endless is interested in adding
> randomized response [1], where say 50% of the data sent is fake and the
> other half is accurate. This only works for boolean and possibly integer
> data, but it would make it even harder to deanonymize reported data. But
> that is not supported yet.

Indeed, randomized response is one of the DP techniques (it's also
mentioned in that blog series) :) And RAPPOR is basically just randomized
response generalized to arbitrary strings (using this fancy thing called
Bloom filters [1]).

> I will add that to my reading list. Certainly it seems a lot less
> intimidating than the Wikipedia article. ;)

Yup, the Wikipedia article isn't very helpful. There are much better
resources, including a bunch of talks on YouTube from the researchers
themselves (e.g.
Cynthia Dwork).

> Wow. I'll add this to my reading list too, although it remains to be seen
> whether I'll be able to understand it. :D

Yeah, the RAPPOR paper is an interesting read but pretty dense and
math-heavy (although not as much as it might seem at first glance). I did
*try* to read it at some point and actually managed to understand the key
concepts, which aren't *that* complicated. But I can't blame anybody for not
wanting to go down that path after skimming through it and seeing those
formulas and charts, really :D

I went down this DP rabbit hole myself when I was working on the DNF
Countme [2] implementation a few years back. Even though it wasn't directly
applicable in the end, it did inspire me to add a form of "randomized
response" there: spreading the countme events from a single system randomly
across a week-long time window, so that no usage patterns of that
particular system (e.g. its typical uptime hours) could emerge if someone
were to inspect the HTTP requests carrying the countme flag from that
system, aggregated over a long period of time. Pretty theoretical and, in
retrospect, rather unlikely and paranoid, but the logic was easy to add, so
I did, just for peace of mind :)

I haven't kept up with the latest developments in DP since then, though,
and have blissfully forgotten most of it, too. But it sparked my interest
back then, and I certainly thought that if Fedora ever decides it wants
some kind of "telemetry", *this* is the (only acceptable) way to do it.
Which doesn't mean there aren't other ways, or that the approach taken by
Endless (which you'd like to adopt) is wrong, of course. These were just my
2 cents :)

FWIW, various tech companies and software projects seem to make use of DP
(at least that's what the Wikipedia article claims). Google Chrome and MS
Windows are among them, amusingly, despite their reputation.
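For anyone who'd like to see what the randomized response idea boils down to, here's a minimal sketch in Python. It assumes a single boolean per system and the classic coin-flip variant: each client reports its true value with probability p_truth and a fair coin flip otherwise, so with p_truth = 0.5 half of the reports are noise, roughly like the Endless proposal quoted above. The function names and the p_truth parameter are my own and purely hypothetical. This is not Endless' (or RAPPOR's) actual code.

```python
import math
import random

# Hypothetical names, for illustration only (not any real project's API).
# Requires 0 < p_truth < 1.


def randomized_response(true_value: bool, p_truth: float, rng: random.Random) -> bool:
    """Client side: report the true bit with probability p_truth, else a fair coin flip."""
    if rng.random() < p_truth:
        return true_value
    return rng.random() < 0.5


def estimate_true_fraction(reports: list[bool], p_truth: float) -> float:
    """Aggregator side: unbiased estimate of the fraction of True values.

    E[observed] = p_truth * pi + (1 - p_truth) * 0.5, solved for pi.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth


def epsilon(p_truth: float) -> float:
    """Local DP bound: ln of P(report True | True) / P(report True | False)."""
    p_yes_given_yes = p_truth + (1 - p_truth) * 0.5
    p_yes_given_no = (1 - p_truth) * 0.5
    return math.log(p_yes_given_yes / p_yes_given_no)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the demo is reproducible
    p = 0.5                  # half of the reports are pure noise
    population = [rng.random() < 0.3 for _ in range(100_000)]
    reports = [randomized_response(v, p, rng) for v in population]
    print(f"actual:    {sum(population) / len(population):.3f}")
    print(f"estimated: {estimate_true_fraction(reports, p):.3f}")
    print(f"epsilon:   {epsilon(p):.3f}")  # ln(3), prints 1.099
```

The aggregator only ever sees the noisy bits, so no single report says much about its sender, yet the population-wide fraction can still be recovered by inverting the known noise. The epsilon function gives the formal DP bound for this scheme (ln 3 for p_truth = 0.5). Lowering p_truth strengthens privacy at the cost of a noisier estimate.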
[1] https://en.wikipedia.org/wiki/Bloom_filter
[2] https://fedoraproject.org/wiki/Changes/DNF_Better_Counting

--
Michal Domonkos / RPM dev team / Red Hat, Inc.
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue