Re: F42 Change Proposal: Opt-In Metrics for Fedora Workstation (system-wide)

On Mon, Jul 08, 2024 at 02:28:09PM -0500, Michael Catanzaro wrote:
> On Mon, Jul 8 2024 at 01:51:07 PM -04:00:00, Przemek Klosowski via devel
> <devel@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > At the same time, I ask the proponents to confirm that there will be no
> > way to re-aggregate the data by any means (timestamps, Fedora account
> > cookies, load factor on the server, etc).
> 
> Good question! I *think* timestamps are no longer a problem. It does store
> precise timestamps alongside a hash of the full submission, but it doesn't
> actually store the full submission itself anymore, and the first few tables
> of metrics I've checked do not contain any timestamps. But we do need to
> audit and make sure that, if timestamps are stored anywhere else, we
> reduce their granularity to prevent them from being matched up with
> timestamps from other records. It's probably more than sufficient to know
> that a metric was submitted on a given day, for example; there's just no
> need to know that a record was submitted at any given second. Anyway, that's
> an easy problem.
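
A minimal sketch of the day-granularity idea, in Python. It assumes that
timestamps arrive as timezone-aware datetimes; the function name is made
up for illustration and is not taken from the actual server code.

```python
from datetime import date, datetime, timezone


def coarsen_timestamp(ts: datetime) -> date:
    """Reduce a precise timestamp to the UTC day it falls on."""
    if ts.tzinfo is None:
        # Assumption: naive timestamps are rejected rather than guessed at.
        raise ValueError("expected a timezone-aware timestamp")
    # Normalize to UTC first so all records share one calendar, then drop
    # hours, minutes, seconds and microseconds.
    return ts.astimezone(timezone.utc).date()
```

If only the returned date is stored, two records submitted on the same
day can no longer be told apart by their submission time.
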
> 
> Then there are two other problems I can think of:
> 
> 1. You might be able to guess that records are from the same user based on
> the order of the rows in the database. I'm not sure what will be the final
> solution for this. Randomizing the position of new rows would surely avoid
> this problem, but could possibly have a performance impact at scale? I'm not
> sure. We'll need to do something about this to keep our promise that it
> should not be possible to correlate records.

Does the table store counts or separate entries? I would guess that if
it just stores disaggregated values, then the values repeat often, and
it's natural to store a count per value in the table. And then the row
order doesn't matter, because it will be different in each table.
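
Roughly what I have in mind, as a Python sketch. I don't know the real
schema, so the table layout, the function name and the metric names
below are all assumptions for illustration only.

```python
from collections import Counter


def record_submission(tables: dict, submission: dict) -> None:
    """Split one submission into per-metric tables that hold only counts."""
    for metric, value in submission.items():
        # One Counter per metric (hypothetical layout): a value that is
        # already known just has its count bumped, no new row is added.
        tables.setdefault(metric, Counter())[value] += 1


tables = {}
record_submission(tables, {"locale": "en_US", "cpu_arch": "x86_64"})
record_submission(tables, {"locale": "pl_PL", "cpu_arch": "x86_64"})
record_submission(tables, {"locale": "en_US", "cpu_arch": "aarch64"})
```

After these three submissions the "locale" table holds en_US with a
count of 2 and pl_PL with a count of 1, and there is no per-submission
row left whose position could be matched against the "cpu_arch" table.
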

> 2. Another problem is that malformed records are kept in their entirety so
> the problem can be investigated. A human looking at a malformed record would
> see the aggregated data for a particular user. This should theoretically
> only happen in the event of a bug, but bugs happen. ;) I could also
> hypothetically imagine a system's hardware being so broken as to corrupt
> metrics, yet still somehow manage to boot, for instance. What to do about
> this is an open question. The safest option would be to discard rather than
> store malformed records, at the cost of being unable to investigate and fix
> this class of bugs.
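
For illustration, a Python sketch of the "discard" option. It goes one
small step beyond what is described above by keeping a count of why
records were rejected, which is my own assumption and not something the
proposal says. The field names and the ingest() helper are hypothetical.

```python
from collections import Counter

# Hypothetical minimal schema, for illustration only.
REQUIRED_FIELDS = {"metric", "value"}


def ingest(record: object, accepted: list, rejects: Counter) -> None:
    """Keep well-formed records; for malformed ones keep only a reason count."""
    if not isinstance(record, dict):
        rejects["not_a_mapping"] += 1
        return  # the payload itself is dropped, never stored
    if REQUIRED_FIELDS - set(record):
        rejects["missing_fields"] += 1
        return
    # Copy only the known fields so that unexpected extras are not kept.
    accepted.append({"metric": record["metric"], "value": record["value"]})
```

That would at least show that this class of bug is happening and how
often, without a human ever seeing one user's data in one place. It
would not be enough to debug the cause, so the trade-off mentioned
above still stands.
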

Zbyszek
-- 
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue



