On Mon, Jul 8 2024 at 01:51:07 PM -04:00:00, Przemek Klosowski via
devel <devel@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
At the same time, I ask the proponents to confirm that there will be no
way to re-aggregate the data by any means (timestamps, Fedora account
cookies, load factor on the server, etc).
Good question! I *think* timestamps are no longer a problem. It does
store precise timestamps alongside a hash of the full submission, but
it doesn't actually store the full submission itself anymore, and the
first few tables of metrics I've checked do not contain any timestamps.
But we do need to audit this: if timestamps are stored anywhere else,
we must reduce their granularity to prevent them from being matched up
with timestamps from other records. It's probably more
than sufficient to know that a metric was submitted on a given day, for
example; there's just no need to know that a record was submitted at
any given second. Anyway, that's an easy problem.
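
To make the granularity idea concrete, here is a rough sketch in Python. It assumes timestamps are available as datetime objects and that naive values are already in UTC; the function name is hypothetical and not taken from the actual server code.

```python
from datetime import date, datetime, timezone


def coarsen_to_day(ts: datetime) -> date:
    """Reduce a precise timestamp to its UTC calendar day."""
    if ts.tzinfo is None:
        # Assumption: naive timestamps are already expressed in UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    # Normalize to UTC so the submitter's local offset is not preserved.
    return ts.astimezone(timezone.utc).date()
```

Two records submitted seconds apart then carry the same value as every other record from that day, so the timestamp alone no longer links them.
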
Then there are two other problems I can think of:
1. You might be able to guess that records are from the same user based
on the order of the rows in the database. I'm not sure what will be the
final solution for this. Randomizing the position of new rows would
surely avoid this problem, but could possibly have a performance impact
at scale? I'm not sure. We'll need to do something about this to keep
our promise that it should not be possible to correlate records.
2. Another problem is that malformed records are kept in their entirety
so the problem can be investigated. A human looking at a malformed
record would see the aggregated data for a particular user. This should
theoretically only happen in the event of a bug, but bugs happen. ;) I
could also hypothetically imagine a system's hardware being so broken
as to corrupt metrics, yet still somehow manage to boot, for instance.
What to do about this is an open question. The safest option would be
to discard rather than store malformed records, at the cost of being
unable to investigate and fix this class of bugs.
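
For illustration only, the two ideas above (randomized row order and discarding malformed records) might look something like the following Python sketch. It assumes rows from many submissions are buffered and written in batches, which I have not verified; prepare_batch and is_valid are hypothetical names, not part of the real implementation.

```python
import random


def prepare_batch(records, is_valid):
    """Return the valid rows in random order, plus a count of discarded rows."""
    records = list(records)
    # Malformed rows are dropped entirely; only their count is kept.
    valid = [r for r in records if is_valid(r)]
    discarded = len(records) - len(valid)
    # SystemRandom uses the OS entropy source, so the resulting order
    # cannot be reproduced from a known seed.
    random.SystemRandom().shuffle(valid)
    return valid, discarded
```

Note that this only hides ordering within a single batch. If a batch holds rows from just one submission, shuffling gains nothing, so the batch size would matter.
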
Michael