On 18 April 2012 10:27, Ricky Zhou <ricky@xxxxxxxxxxxxxxxxx> wrote:
> On 2012-04-18 09:56:44 AM, Kevin Fenzi wrote:
>> http://stackoverflow.com/questions/4552566/logging-ip-address-for-uniqueness-without-storing-the-ip-address-itself-for-priv
>>
>> has some ideas, but no great clear answer.
>>
>> http://bug.st/mod_anonstats seems to use md5.
>>
>> I'm assuming the consumer of these logs will process them after they
>> are hashed? In which case we do need to make sure the same IP hashes to
>> the same hash? Or could we process them first, then hash the IP before
>> making the data public?
> I think something like an HMAC is the correct way to hide IPs.
> Unfortunately, there is still information other than IP address that can
> potentially leak some privacy information, such as:
> * rare/unique user agent strings
> * URLs that can be linked to the person who's visiting them (a lot
>   of mailman links contain emails, for example)
> * potentially still-valid CSRF tokens

Plus, with well-known IPs going to show up in any log, the salting
mechanism is not going to be much use. [If you know that your IP address
was 72.124.10.4 and you went looking for stuff at a given time, you can
figure out the salt: with the logged hash and your IP address as the
knowns and the salt as the unknown, a script can search for the salt.
Unless the salt is over 20 characters you will figure it out within a
month. Changing salts doesn't work well either, because you have to be
able to track some things over time.]

Add in the fact that there are multiple other factors which de-anonymize
a person in a log file (or across multiple log files), and I don't see
how it is reasonably possible to expect any strong anonymization to
occur [strong being defined as taking more than a month to determine
who did what when].

> I think a lot more thought and user notification should happen before we
> can consider making logs public.
> Alternatively, what do you think about
> a system where somebody who wanted to run statistics either gets access
> to the logs, or gives us a script that we'll verify and then run in a
> cronjob. I don't think we'll get enough requests to the point where
> doing things manually like this becomes a burden.
>
> Maybe we can also take a look at how organizations like Wikipedia handle
> these sorts of things.
>
> Thanks,
> Ricky
>
> _______________________________________________
> infrastructure mailing list
> infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
> https://admin.fedoraproject.org/mailman/listinfo/infrastructure

-- 
Stephen J Smoogen.
"The core skill of innovators is error recovery, not failure avoidance."
Randy Nelson, President of Pixar University.
"Years ago my mother used to say to me,... Elwood, you must be oh so
smart or oh so pleasant. Well, for years I was smart. I
recommend pleasant. You may quote me." - James Stewart as Elwood P. Dowd
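
For reference, here is a minimal sketch of the two hashing approaches discussed in this thread, assuming Python. It is not anything Fedora infrastructure actually runs. The function names, the 32-byte random key, and the /24 search range are all assumptions made for illustration. The first function shows the HMAC idea: the same IP with the same key always gives the same token, so per-visitor statistics still work. The second shows why an unkeyed hash such as plain md5 offers little protection: the space of candidate addresses is small enough to enumerate.

```python
import hashlib
import hmac
import ipaddress
import secrets


def pseudonymize_ip(ip: str, key: bytes) -> str:
    """Return a keyed hash (HMAC-SHA256) of an IP address as a hex string."""
    packed = ipaddress.ip_address(ip).packed
    return hmac.new(key, packed, hashlib.sha256).hexdigest()


def recover_ip_from_md5(target: str, network: str):
    """Brute-force an unkeyed MD5 of an IP by hashing every address in `network`."""
    for addr in ipaddress.ip_network(network):
        if hashlib.md5(str(addr).encode()).hexdigest() == target:
            return str(addr)
    return None


# Hypothetical key; it would be generated once and kept private (assumption).
key = secrets.token_bytes(32)

# Same IP and same key give the same token; a different IP gives a different one.
token_a = pseudonymize_ip("72.124.10.4", key)
token_b = pseudonymize_ip("72.124.10.4", key)
token_c = pseudonymize_ip("72.124.10.5", key)

# An unkeyed hash of the same address falls to a search of a small range.
leaked = hashlib.md5(b"72.124.10.4").hexdigest()
recovered = recover_ip_from_md5(leaked, "72.124.10.0/24")
```

Note that keying the hash only addresses the IP field. It does nothing about the user agent strings, URLs, or CSRF tokens mentioned above, and anyone holding the key can still map a known IP to its token.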