On Tue, Apr 17 2018, Peter Backes wrote:

> I'd like to ask whether anyone has best practices for achieving GDPR
> compliance for git repos? The GDPR will come into effect in the EU next
> month.
>
> In particular, how do you cope with the "Right to erasure" concerning
> entries in the history of your git repos?
>
> Erasing author names from the history changes the commit hashes. It is
> well known that this leads to a lot of problems. So I don't consider
> this a workable solution.
>
> And how do you justify publishing your employee's name/email as part of
> a git commit under GDPR rules in the first place?
>
> github has the following page mentioning the "Right to erasure" but
> AFAICS nothing about how it will be implemented
> https://about.gitlab.com/gdpr/
>
> Here are discussions I found but they do not really provide a solution:
> https://law.stackexchange.com/questions/24623/gdpr-git-history
> https://news.ycombinator.com/item?id=16509755

[Not a lawyer and all that]

I've been loosely following a similar discussion around blockchains, and
my understanding of the situation is that for a project such as, say,
Linux, the GDPR gives you this potential out[1]:

    "the personal data are no longer necessary in relation to the
    purposes for which they were collected or otherwise processed"

I.e. when you submit a patch to linux.git you understand how it's going
to get used, and that it's going into a storage system that isn't going
to be pruned just because you ask for it.

In combination with the "Conditions for consent"[2] this becomes a bit
more tricky, since those state that "The data subject shall have the
right to withdraw his or her consent at any time".

You can make a compelling case that for, say, submitting your data to
the Bitcoin blockchain, the above quote from Article 17 overrides that,
but can you make the same case for other hash-based-on-hash systems like
linux.git? Maybe, maybe not. I think nobody really knows at this point.
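To make the "hash-based-on-hash" point concrete, here's a minimal Python
sketch, assuming git's classic SHA-1 object format, of how a commit ID is
derived from the commit's text, which includes both the author line and
the parent's ID. The helper names commit_id() and history(), the
authors, the timestamp and the messages are all made up for
illustration; this is not git's own code.

```python
import hashlib

# ID of git's empty tree object, used here so every commit has a valid tree.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

def commit_id(tree, parent, author, message):
    """Hypothetical helper: SHA-1 of a git-style commit object."""
    lines = [f"tree {tree}"]
    if parent is not None:
        # The parent's ID is part of the hashed text, chaining the history.
        lines.append(f"parent {parent}")
    # Fixed, made-up timestamp and timezone so the output is deterministic.
    lines.append(f"author {author} 1523923200 +0000")
    lines.append(f"committer {author} 1523923200 +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    # Git hashes "<type> <size>\0" followed by the object's contents.
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def history(authors):
    """Hypothetical helper: build a linear history, one commit per author."""
    ids, parent = [], None
    for i, author in enumerate(authors):
        parent = commit_id(EMPTY_TREE, parent, author, f"commit {i}")
        ids.append(parent)
    return ids

original = history(["Alice <alice@example.com>", "Bob <bob@example.com>"])
erased = history(["Anonymous <anon@example.com>", "Bob <bob@example.com>"])

# Erasing Alice changes her commit's ID...
print(original[0] != erased[0])
# ...and also Bob's, even though none of Bob's own data changed.
print(original[1] != erased[1])
```

Because each child embeds its parent's ID, rewriting one early author
line forces a new ID for every commit that descends from it, which is
why erasure means rewriting (rebasing) everything after that point.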
What I do think is certain is that there's not going to be any
one-size-fits-all solution based on the underlying technology.

If I start storing my webserver access logs with IP information in a git
repo, I don't get to say "sorry, git stores stuff this way, I don't want
to rebase it". No court's going to buy that; I've just gone out of my
way to use technology that circumvents the GDPR for no particularly good
reason.

This is very different from, say, your joining a company, committing to
its internal git repo and having your name stay there in perpetuity, or
your choosing to submit a patch to linux.git or git.git. I'd think that
would be handled the same way as a structural engineering firm being
able to record in perpetuity who it was that drew up the design for some
bridge. I don't think it's plausible that the GDPR, which is probably
mainly going to be about consumer protection, is going to concern itself
with that in practice.

There's a lot of middle ground in between those two, though. E.g.
children are specially protected under the GDPR. Is Linus going to say
he doesn't want to rebase linux.git when some 14-year-old who regrets
submitting code doesn't want his name there anymore? Who knows.

Depending on how common such cases turn out to be, maybe git itself
should eventually support some ways to work around these issues. E.g. we
could have some mode that always supplies a fake name/e-mail, or make
the notice that implicit_ident_advice() spews out somewhat scarier.

1. https://gdpr-info.eu/art-17-gdpr/
2. https://gdpr-info.eu/art-7-gdpr/