On Fri, Jul 7 2023 at 09:21:15 PM -0400, Demi Marie Obenour
<demiobenour@xxxxxxxxx> wrote:
> For metrics to not be personally identifiable, it is necessary that
> the set of metrics collected have sufficiently low entropy that on
> average, _many_ users will send _the exact same metrics_. It is very
> hard for me to see any useful set of metrics having such low entropy.
>
> If Fedora has 2 million users (possibly an overestimate) then the
> metrics would need to have entropy much less than 2^21, which means
> that the entire metrics set would need to be able to be represented
> as a 20-bit integer. In practice, I suspect one would need to fit
> the entire set in a 16-bit integer or less, and possibly
> _significantly_ less.

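As a rough check on the arithmetic in the quoted estimate, here is a minimal sketch. The 2 million figure comes from the quoted message; the helper name `max_bits` and the crowd size of 32 are hypothetical, and the calculation assumes, unrealistically, that users are spread evenly across all possible metric values.

```python
import math

def max_bits(users: int, crowd_size: int) -> float:
    """Largest metrics-set size, in bits, such that on average at least
    `crowd_size` users share each distinct value. Assumes a uniform
    distribution of users over values, which real data will not have."""
    return math.log2(users / crowd_size)

users = 2_000_000  # figure from the quoted message, possibly an overestimate

# One user per distinct value: just under 21 bits, so 20 bits is the ceiling.
bits_unique = max_bits(users, 1)

# Hypothetical crowd of 32 identical reports per value: just under 16 bits.
bits_crowd = max_bits(users, 32)

print(f"{bits_unique:.2f} bits, {bits_crowd:.2f} bits")
```

Under those assumptions, the budget shrinks by one bit each time the required crowd size doubles.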
We're not going to build creepy user profiles. Particular metrics will
be stored individually, not correlated together.
Let's say we have two metrics:
Key                                 | Value
------------------------------------+------
User launched GNOME Builder today?  | y/n
User has NVIDIA proprietary driver? | y/n
We would know how many users launched Builder and how many users have
the NVIDIA proprietary driver, but we wouldn't know how many NVIDIA
users launched Builder, because there's just no need to tie those two
data points together.
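To illustrate the idea, here is a minimal sketch of per-metric tallying. It is not the actual collection design; the function name, the metric keys, and the sample reports are all hypothetical, and it assumes the server discards each report after updating the counters.

```python
from collections import Counter

def aggregate_independently(reports):
    """Tally each metric on its own. Only per-metric counts are kept;
    the combination of values within a single report is never stored."""
    totals = {}
    for report in reports:
        for key, value in report.items():
            totals.setdefault(key, Counter())[value] += 1
    return totals

# Hypothetical sample reports, one dict per submission.
reports = [
    {"launched_builder": "y", "nvidia_driver": "n"},
    {"launched_builder": "y", "nvidia_driver": "y"},
    {"launched_builder": "n", "nvidia_driver": "y"},
]

totals = aggregate_independently(reports)
builder_users = totals["launched_builder"]["y"]
nvidia_users = totals["nvidia_driver"]["y"]
print(builder_users, nvidia_users)
```

The resulting totals answer "how many launched Builder" and "how many have the NVIDIA driver" separately, but contain nothing from which the overlap between the two groups could be recovered.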
Michael