Re: Reviving the hardware census

Chris Murphy <lists@xxxxxxxxxxxxxxxxx> · Thu, 9 Nov 2017 13:26:47 -0700

On Thu, Nov 9, 2017 at 1:11 PM, Nathaniel McCallum
<npmccallum@xxxxxxxxxx> wrote:
> Turning it into a hash doesn't solve the tracking problem. It only
> prevents the attacker from knowing a list of serial numbers. I suspect
> keeping hashes of identifying information will likely cause
> controversy.

What is the nature of the tracking problem? A single entry for a
single machine is not tracking to me. Tracking requires at least two
points in space-time. What's being stored by the Fedora Project? IP,
Geolocation, date and time? Those are the things I associate with
tracking more than a serial number or a hash of a serial number.

Let's say you don't store a serial number or a hash, but you do store
model information, date/time, and an IP address. If there's no
mechanism to avoid duplicate entries, the tracking problem gets bigger
the less common that particular model is. For a common model, entries
from many machines make the data noisy. But if it's a sufficiently
rare model, the duplicate entries can be assumed to represent just a
few distinct machines or even just one machine, and now you can track
a person even without any serial number or hash.
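
To make the rare-model case concrete, here is a rough Python sketch.
The record layout, model names, and addresses are all made up for
illustration (the IPs are from the documentation ranges); nothing here
reflects what the Fedora Project actually stores.

```python
from collections import defaultdict

def sightings_by_model(records):
    """Group hypothetical (model, timestamp, ip) records by model, oldest first."""
    trails = defaultdict(list)
    for model, timestamp, ip in records:
        trails[model].append((timestamp, ip))
    # ISO-style timestamps sort correctly as plain strings.
    return {model: sorted(entries) for model, entries in trails.items()}

# Invented sample data: no serial numbers or hashes anywhere.
records = [
    ("CommonLaptop", "2017-11-01T09:00", "198.51.100.7"),
    ("CommonLaptop", "2017-11-01T09:05", "203.0.113.20"),
    ("RareBoard", "2017-11-01T08:00", "192.0.2.10"),
    ("RareBoard", "2017-11-03T19:30", "192.0.2.99"),
]

trails = sightings_by_model(records)
```

The two "CommonLaptop" entries could easily be two different machines,
so they say little. If only a handful of "RareBoard" units exist, its
entries read as one machine seen at two times and two addresses, which
is already two points in space-time.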

So I think you necessarily need a way to keep duplicates from entering
the data set: some way of anonymizing the entry in the Fedora
Project's data, but also a way to detect duplicates.

How about two different data sets stored by the Fedora Project?
Dataset 1 contains only the hash of the serial number of the device.
If that hash is not present in dataset 1, then sanitized device data
is added to dataset 2. If the hash is found in dataset 1, then it's
not added to dataset 2. But there is no correlation between dataset 1
and dataset 2?
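
A minimal Python sketch of that two-dataset idea, just to show the
shape of it. The class and method names are hypothetical, SHA-256 is
my assumption (the hash function isn't specified above), and the
in-memory set and list stand in for whatever storage would really be
used.

```python
import hashlib

class Census:
    """Hypothetical two-dataset store: hashes for dedup, device data kept apart."""

    def __init__(self):
        self.seen_hashes = set()  # dataset 1: only hashes of serial numbers
        self.devices = []         # dataset 2: only sanitized device data

    def submit(self, serial, sanitized):
        """Record the device once; return False if this serial was seen before."""
        # SHA-256 is an assumed choice, not something the proposal fixes.
        digest = hashlib.sha256(serial.encode("utf-8")).hexdigest()
        if digest in self.seen_hashes:
            return False  # duplicate: nothing is added to dataset 2
        self.seen_hashes.add(digest)
        # Copy the sanitized fields; the hash is never stored next to them.
        self.devices.append(dict(sanitized))
        return True
```

Submitting the same serial twice returns True and then False, and
dataset 2 gains a single entry. The sketch keeps no key, timestamp, or
ordering in dataset 1 that could be joined against dataset 2; a real
deployment would need the same care, since a shared timestamp or row
order would reintroduce the correlation.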

-- 
Chris Murphy