On Sun, Jun 03 2018, Peter Backes wrote:

> Unfortunately this important topic of GDPR compliance has not seen much
> interest.

I don't think you can infer that there's not much interest; maybe people just don't have anything to say about it. I've seen a lot of discussions about this, but what they all have in common is that nobody really knows, just as nobody really knew what the "cookie law" would turn out to be like. So I think all of us are just waiting to see.

I took the bait and tried to paraphrase some things I'd read about it, but as you pointed out in 20180417232504.GA4626@xxxxxxxxxxxxxxxxxxx I got some of it wrong. Still, I very much suspect that *in practice* the GDPR is going to be more about "consumer protection", i.e. regulators / prosecutors are much more likely to go after some advertising company than after some project using a Git repo. Similarly, nobody's going after some local computer club's internal-only website for setting cookies without asking, but they might go after Facebook for doing the same.

> [...]
> In course of this, anonymization could also be added. My idea would be
> as follows:
>
> Do not hash anything directly to obtain the commit ID. Instead, hash a
> list of hashes of [$random_number, $information] pairs. $information
> could be an author id, a commit date, a comment, or anything else. Then
> store the commit id, the list of hashes, and the list of pairs to form
> the commit.
>
> If someone requests erasure, simply empty the corresponding pair in the
> list. All that would be left would be the hash of the pair, which is
> completely anonymous (not more useful than a random number) and thus
> not covered by the GDPR. The history could still be completely
> verified, and when displaying the log, the erased entry could be
> displayed as "<<ERASED>>".
>
> What do you think about this?

Since the author field is free-form, this sort of thing doesn't need to be part of the Git data format.
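For concreteness, here's a minimal Python sketch of the salted-hash scheme quoted above. It's my own illustration under stated assumptions, not code from the proposal: SHA-256, the 16-byte salt, joining hashes with newlines, and the names make_commit/erase/verify/display are all hypothetical choices.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import List, Optional, Tuple

SALT_LEN = 16  # assumed size of the per-field "$random_number"

Pair = Tuple[bytes, bytes]  # (random salt, information)


def pair_hash(salt: bytes, info: bytes) -> str:
    # The salt has a fixed length, so plain concatenation is unambiguous.
    return hashlib.sha256(salt + info).hexdigest()


def id_from_hashes(hashes: List[str]) -> str:
    # The commit ID covers only the list of pair hashes, never the raw fields.
    return hashlib.sha256("\n".join(hashes).encode()).hexdigest()


@dataclass
class Commit:
    commit_id: str
    pair_hashes: List[str]
    pairs: List[Optional[Pair]]  # None marks an erased field


def make_commit(fields: List[bytes]) -> Commit:
    pairs = [(os.urandom(SALT_LEN), info) for info in fields]
    hashes = [pair_hash(salt, info) for salt, info in pairs]
    return Commit(id_from_hashes(hashes), hashes, list(pairs))


def erase(commit: Commit, index: int) -> None:
    # Drop the pair but keep its hash, so the commit ID stays verifiable.
    commit.pairs[index] = None


def verify(commit: Commit) -> bool:
    if id_from_hashes(commit.pair_hashes) != commit.commit_id:
        return False
    # Fields that have not been erased must still match their stored hash.
    return all(
        pair is None or pair_hash(*pair) == h
        for h, pair in zip(commit.pair_hashes, commit.pairs)
    )


def display(commit: Commit) -> List[str]:
    return ["<<ERASED>>" if p is None else p[1].decode() for p in commit.pairs]
```

After erase(c, 0) on a commit built from an author line, a date and a message, verify(c) still returns True and display(c) shows "<<ERASED>>" for the author. Altering a surviving pair, on the other hand, makes verify(c) fail.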
You can just generate a UUID like "5c679eda-b4e5-4f35-b691-8e13862d4f79", then set user.name to "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79" and user.email to "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79". Then you'd create a ref on the server like refs/refval/5c679eda-b4e5-4f35-b691-8e13862d4f79 containing the real "$user <$email>".

If you then wanted to erase that field you'd just delete the ref. It would be much easier to teach the tools that render the likes of git-log to look up these refs than to change the data format. Sites that are paranoid about the GDPR could have a pre-receive hook rejecting any pushes from EU customers unless their commits were in this format.

Perhaps some variation of this is where GDPR v2 will go. It'll be an "obligation to be forgotten", and I won't be allowed to use my own name anymore. Instead I'll have a daily UUID issued from a government API to use on various forms, and the only way for anyone to resolve it will be through a web service that rejects UUID lookups older than N months. Caching those lookups will be met with the death penalty. We'll all be free at last.

Okay, that last paragraph is just trolling, but I think the refval: -> ref convention is something worth considering if things *really* go in this direction.
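To make the refval: convention concrete, here's a minimal Python model of it. A plain dict stands in for the server's refs/refval/* namespace. In a real repository the ref would presumably point at a blob holding the identity (e.g. one written with `git hash-object -w` and pointed to with `git update-ref`), which this sketch doesn't attempt. The function names, the use of uuid4, and the "<<ERASED>>" fallback are my own assumptions, not an existing Git feature.

```python
import re
import uuid
from typing import Dict

PREFIX = "refval:"
REFVAL_RE = re.compile(
    r"^refval:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def ref_name(refval: str) -> str:
    # "refval:<uuid>" -> "refs/refval/<uuid>"
    return "refs/refval/" + refval[len(PREFIX):]


def register(refs: Dict[str, str], identity: str) -> str:
    """Mint a new refval: pseudonym and store the real "$user <$email>"."""
    refval = PREFIX + str(uuid.uuid4())
    refs[ref_name(refval)] = identity
    return refval


def forget(refs: Dict[str, str], refval: str) -> None:
    # Erasure is just deleting the ref; the commits themselves are untouched.
    refs.pop(ref_name(refval), None)


def render_author(refs: Dict[str, str], name: str, email: str) -> str:
    # What a log renderer might do: resolve pseudonyms, pass others through.
    if REFVAL_RE.match(name) and name == email:
        return refs.get(ref_name(name), "<<ERASED>>")
    return f"{name} <{email}>"


def pre_receive_ok(name: str, email: str) -> bool:
    # The check a hypothetical pre-receive hook could apply to each commit.
    return bool(REFVAL_RE.match(name)) and name == email
```

The point of this design is that the commit objects never change. Only the ref is added or removed, so erasure doesn't rewrite history or change any commit IDs.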