Re: Dealing with corporate email recycling

On 13/03/2022 15:21, Sean Allred wrote:
> <rsbecker@xxxxxxxxxxxxx> writes:
>> I have another reluctant suggestion, but it depends on your industry,
>> regulations, and other factors. In some sectors, there is a
>> requirement to keep only a certain period's worth of history. In
>> fact, in some settings, keeping user-identifying information beyond,
>> say, 7 years is actually problematic. Pruning your history may be not
>> only an option but a requirement. An alternative is to use
>> filter-branch to essentially tokenize the identities of past authors
>> and keep those in an electronic vault somewhere. I have customers who
>> are interpreting GDPR-like rules in just such a situation, where
>> employees who left 7 years ago cannot be retained, by name, in the
>> repos. I am not personally
>> happy about that, because my own repo-OCD demands that I know exactly
>> who did what until the end of time, but according to them, it actually
>> violates the local regulations. I'm sure you have had conversations
>> with lawyers, yes? ☹
> I don't believe we've involved our legal team here (I'll follow up with
> them internally), but that might be a spin-off discussion for folks who
> know they're affected.  It would seem that the design of Git makes
> purging history on an ongoing basis problematic -- you would always have
> at least one unresolvable reference to a parent commit.  If this is a
> real requirement from GDPR-like laws, either 'reasonable' VCS metadata
> needs to be a specific carve-out in those laws -- but who the heck knows
> what is 'reasonable' -- or as a project, Git needs to have an answer to
> this situation and an ability to truncate history without otherwise
> altering it.
>
> It's also worth noting that even in the last five years, at our scale,
> we've definitely run into the email-recycling problem already.
>
> Being based in the U.S. and not having seen pitchforks about this yet,
> I'd like to assume for the purpose of this discussion that we're keeping
> all our history.
>
> I think if the topic of legal implications of keeping history in
> perpetuity is valuable to continue, we should spin it off into a
> separate thread.  Personally I'm not seeing what we (Git) could
> realistically do about it other than provide recommendations and paths
> forward -- which might require considerable development.
>
>
The GDPR isn't as onerous as some suggest: it isn't a set of
black-and-white rules. Rather, in cases like these you need a really
strong reason for why data is retained, such as its being part of the
verification and validation of the commit data. There have been various
discussions around this in many of the technical journals.

It may be that your internal Git version could disable the particular
`format` placeholders that show the original identity (`%an`/`%ae`, as
opposed to the mailmap-aware `%aN`/`%aE`), so that only the designated
('redacted') mailmap entry is shown to casual users (this assumes the
repo is inside the corporate firewall). This would avoid invalidating
the repo's validation capability, while meeting the needs of GDPR-type
regulations.
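
To make the idea concrete, here is a minimal Python model (not Git's
implementation) of what such a build would do. The `%an`/`%ae` and
`%aN`/`%aE` placeholders are Git's real pretty-format placeholders; the
simplified mailmap parser, the `render` helper with its `allow_raw`
switch, and the example addresses are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# One "<email>" pair together with any name text in front of it.
PAIR = re.compile(r"([^<>]*)<([^<>]*)>")


@dataclass
class Entry:
    old_name: Optional[str]   # None: match on the commit email alone
    old_email: str
    new_name: Optional[str]   # None: keep the recorded name
    new_email: Optional[str]  # None: keep the recorded email


def parse_mailmap(text: str) -> list[Entry]:
    """Simplified parser for the one- and two-address .mailmap forms."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        pairs = [(n.strip(), e.strip()) for n, e in PAIR.findall(line)]
        if len(pairs) == 1:
            # "Proper Name <commit@email>": only the name is replaced.
            name, email = pairs[0]
            entries.append(Entry(None, email, name or None, None))
        elif len(pairs) >= 2:
            # "[Proper Name] <proper@email> [Commit Name] <commit@email>"
            (new_name, new_email), (old_name, old_email) = pairs[:2]
            entries.append(Entry(old_name or None, old_email,
                                 new_name or None, new_email or None))
    return entries


def map_user(entries: list[Entry], name: str, email: str) -> tuple[str, str]:
    """Identity a mailmap-aware placeholder (%aN/%aE) would show.

    Simplified: a name+email match wins; otherwise the last email-only
    entry for the address is used. Git's own merging of duplicate
    entries is not modelled.
    """
    match = None
    for e in entries:
        if e.old_email.lower() != email.lower():
            continue
        if e.old_name is None:
            match = e
        elif e.old_name.lower() == name.lower():
            match = e
            break
    if match is None:
        return name, email
    return match.new_name or name, match.new_email or email


# Hypothetical policy for an internal build: raw placeholders are silently
# redirected to their mailmap-aware counterparts unless explicitly allowed.
RAW_TO_MAPPED = {"%an": "%aN", "%ae": "%aE"}


def render(placeholder: str, name: str, email: str, entries: list[Entry],
           allow_raw: bool = False) -> str:
    if not allow_raw:
        placeholder = RAW_TO_MAPPED.get(placeholder, placeholder)
    mapped_name, mapped_email = map_user(entries, name, email)
    values = {"%an": name, "%ae": email, "%aN": mapped_name, "%aE": mapped_email}
    return values[placeholder]


if __name__ == "__main__":
    mailmap = "Former Employee <redacted-0042@example.com> <jane@example.com>\n"
    entries = parse_mailmap(mailmap)
    print(render("%an", "Jane Smith", "jane@example.com", entries))  # Former Employee
    print(render("%ae", "Jane Smith", "jane@example.com", entries,
                 allow_raw=True))                                   # jane@example.com
```

The original identity stays in the commit objects, so validation is
unaffected; only what the formatting layer shows to casual users changes.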

In the same vein, since Git is open source, a local Git version could
add support for extra mailmap entry details, such as a " % <approxidate>"
suffix that limits the use of a particular name/email combination to a
date range, so that date ranges can be expressed.
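
A rough sketch of how such a date-limited lookup might behave for the
email-recycling case. The syntax is purely an assumption: the suffix is
taken to mean "applies to commits dated on or before this date", ISO
dates stand in for Git's approxidate, and only the single-address
`Proper Name <commit@email>` form is handled. `parse_dated` and
`name_for` are illustrative names, not Git code.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical extended entry: "Proper Name <commit@email>" optionally
# followed by " % YYYY-MM-DD". Only the single-address form is handled.
ENTRY = re.compile(r"^(?P<name>[^<>]*?)\s*<(?P<email>[^<>]+)>$")


@dataclass
class DatedEntry:
    name: str
    email: str              # stored lower-case for case-insensitive matching
    until: Optional[date]   # last commit date the entry applies to; None = open-ended


def parse_dated(text: str) -> list[DatedEntry]:
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        body, _, suffix = line.partition(" % ")
        m = ENTRY.match(body.strip())
        if m is None:
            raise ValueError(f"unparseable entry: {raw!r}")
        # ISO dates stand in for Git's fuzzy approxidate parsing.
        until = date.fromisoformat(suffix.strip()) if suffix else None
        entries.append(DatedEntry(m["name"], m["email"].lower(), until))
    return entries


def name_for(entries: list[DatedEntry], email: str, when: date) -> Optional[str]:
    """Name to show for a commit by `email` dated `when`, or None if unmapped."""
    candidates = [e for e in entries if e.email == email.lower()]
    # The earliest bound that still covers the commit date wins...
    bounded = sorted((e for e in candidates if e.until is not None),
                     key=lambda e: e.until)
    for e in bounded:
        if when <= e.until:
            return e.name
    # ...otherwise fall back to an open-ended entry, if any.
    for e in candidates:
        if e.until is None:
            return e.name
    return None


if __name__ == "__main__":
    mailmap = (
        "Jane Smith <jsmith@corp.example> % 2018-06-30\n"
        "John Smith <jsmith@corp.example>\n"
    )
    entries = parse_dated(mailmap)
    print(name_for(entries, "jsmith@corp.example", date(2017, 3, 1)))   # Jane Smith
    print(name_for(entries, "jsmith@corp.example", date(2020, 1, 15)))  # John Smith
```

Successive entries for the same recycled address then carve the timeline
into ranges, with the unbounded entry covering the current owner.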

I noticed that all the .mailmap examples in the man page have ">" as the
final character, but I haven't checked whether the code always requires
the last element of an entry to be an <email> address, or whether it
currently barfs on extra elements.
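
Independently of what Git's own parser does, a quick check like the
following (a hypothetical helper, not part of Git) would at least show
which entries in an existing .mailmap carry anything after their final
">" before an extended format is rolled out. Trailing text starting with
"#" is treated as a comment and skipped; lines with no ">" at all are
reported whole.

```python
def trailing_elements(text: str) -> list[tuple[int, str]]:
    """Return (line number, trailing text) for entries with content after the last '>'."""
    found = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and whole-line comments
        # rfind returns -1 when there is no '>', so the whole line is kept.
        tail = line[line.rfind(">") + 1:].strip()
        if tail and not tail.startswith("#"):
            found.append((lineno, tail))
    return found


if __name__ == "__main__":
    sample = (
        "Jane Smith <jsmith@corp.example> % 2018-06-30\n"
        "John Smith <jsmith@corp.example>\n"
    )
    print(trailing_elements(sample))  # [(1, '% 2018-06-30')]
```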

--
Philip
