On Thu, Nov 9, 2017 at 1:11 PM, Nathaniel McCallum <npmccallum@xxxxxxxxxx> wrote:
> Turning it into a hash doesn't solve the tracking problem. It only
> prevents the attacker from knowing a list of serial numbers. I suspect
> keeping hashes of identifying information will likely cause
> controversy.

What is the nature of the tracking problem? A single entry for a single
machine is not tracking to me. Tracking requires at least two points in
space-time. What's being stored by the Fedora Project? IP, geolocation,
date and time? Those are the things I associate with tracking more than
a serial number or a hash of a serial number.

Let's say you don't store serial number or a hash, but you do store
model information, date/time, and an IP address. If there's no
mechanism to avoid duplicate entries, you've got a bigger tracking
problem the less common that particular model is. More models will make
the data noisy. But if it's a sufficiently rare model, the duplicate
entries can be assumed to be representing just a few distinct machines
or even just one machine, and now you can track a person even if you
don't have any serial number or hashing.

So I think necessarily you need a way to eliminate duplicates from
entering the data set. Some way of anonymizing the entry in the Fedora
Project's data, but also a way to track duplicates.

How about two different data sets stored by the Fedora Project?
Dataset 1 contains only the hash of the serial number of the device.
If that hash is not present in dataset 1, then sanitized device data is
added to dataset 2. If the hash is found in dataset 1, then it's not
added to dataset 2. But there is no correlation between dataset 1 and
dataset 2?

-- 
Chris Murphy
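
A rough sketch of the two-dataset idea, in Python, just to make the flow concrete. Everything named here (`submit`, `sanitize`, `ALLOWED_FIELDS`, the choice of SHA-256, the field names) is an assumption for illustration; the thread does not specify a hash function, a schema, or what "sanitized" means.

```python
import hashlib

# Dataset 1: only hashes of serial numbers, used purely to detect duplicates.
seen_hashes = set()

# Dataset 2: sanitized device records, stored with no reference to dataset 1.
device_records = []

# Hypothetical whitelist of fields kept after sanitizing (an assumption,
# not something the thread defines).
ALLOWED_FIELDS = ("vendor", "model")


def sanitize(report):
    """Keep only whitelisted fields; drops IP, timestamps, serial, etc."""
    return {key: report[key] for key in ALLOWED_FIELDS if key in report}


def submit(serial, report):
    """Add a device to dataset 2 only if its serial hash is new to dataset 1.

    Returns True if the record was stored, False if it was a duplicate.
    """
    digest = hashlib.sha256(serial.encode("utf-8")).hexdigest()
    if digest in seen_hashes:
        return False  # already counted; dataset 2 is left untouched
    seen_hashes.add(digest)
    # The stored record carries neither the hash nor the serial number.
    device_records.append(sanitize(report))
    return True
```

For example, calling `submit("SN123", {"vendor": "Acme", "model": "X1", "ip": "192.0.2.1"})` twice would store one record in dataset 2 and reject the second call as a duplicate. Note that this sketch only keeps the two datasets free of a shared key; insertion order or timestamps could still correlate them if both were preserved, which a real design would have to address.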