Re: Dealing with corporate email recycling

Philip Oakley <philipoakley@iee.email> · Mon, 14 Mar 2022 11:56:17 +0000

On 13/03/2022 23:16, Junio C Hamano wrote:
> Sean Allred <allred.sean@xxxxxxxxx> writes:
>
>> rather than use magic comments :-) Adapting to your suggestion, this
>> might look like the following:
>>
>>     A. U. Thor <foo@xxxxxxxxxxx> <ada.example.com> <[ approxidate ]>
> You'd probably want a timerange (valid-from and valid-to), instead
> of one single timestamp?
I'm not sure the date-range approach wouldn't bring its own problems.
What happens outside the date range? That is, do we then have three
identities (Before, During, and After), with only 'During' being
defined?

I see it more as a single date acting as the termination point of an
existing email sequence, i.e. a retrospective end point for mapping the
old email addresses to a single person. Future emails from the same
mailbox will belong to a different, 'current' person. This would match
the singly linked list view of the commit history, using the chronology
heuristic.

The key here is to have a final identity scheme in place, so that you
can uniquely distinguish the old John Doe from the newer John Does at
the relevant point in time in the mailmap.
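To make the "single termination date" idea concrete, here is a rough
Python sketch. It is only a model under my own assumptions: the
CutoffEntry type, the valid_until field and the resolve() helper are
hypothetical and are not part of git or of any proposed mailmap syntax.
Each entry maps a recycled mailbox to one earlier holder up to an end
date; commits after every end date are left alone as the current holder.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CutoffEntry:
    """Hypothetical entry: commits from `mailbox` dated on or before
    `valid_until` belong to `canonical`; later commits are left unmapped."""
    canonical: str
    mailbox: str
    valid_until: date


def resolve(entries: List[CutoffEntry], mailbox: str, commit_date: date) -> Optional[str]:
    """Return the canonical identity for a commit, or None if the commit is
    newer than every recorded end point (i.e. the 'current' mailbox holder)."""
    covering = [
        e for e in entries
        if e.mailbox.lower() == mailbox.lower() and commit_date <= e.valid_until
    ]
    if not covering:
        return None
    # The earliest end point still covering the commit wins, so each previous
    # holder of a recycled mailbox is bounded by their own termination date.
    return min(covering, key=lambda e: e.valid_until).canonical


# Two earlier holders of the same recycled mailbox (made-up example data).
ENTRIES = [
    CutoffEntry("John Doe <john.doe.1@example.com>", "jdoe@example.com", date(2015, 6, 30)),
    CutoffEntry("John Doe <john.doe.2@example.com>", "jdoe@example.com", date(2019, 12, 31)),
]

print(resolve(ENTRIES, "jdoe@example.com", date(2014, 1, 1)))  # first John Doe
print(resolve(ENTRIES, "jdoe@example.com", date(2017, 3, 1)))  # second John Doe
print(resolve(ENTRIES, "jdoe@example.com", date(2021, 5, 1)))  # None: current holder
```

The point of choosing the earliest covering end date is that only one
date per earlier holder is needed; the chain of end points then splits
the mailbox's history into consecutive owners without a separate
"valid-from".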

>
> Because at least three valid forms of mailmap entries should be
> understood by the current generation of mailmap readers, i.e.
>
>     Human Readable Name <e-mail@xxxxxxxxx>
>     Right Name <right@xxxxxxxxx> <wrong@xxxxxxxxx>
>     Right Name <right@xxxxxxxxx> Wrong Name <wrong@xxxxxxxxx>
>
> the extended entry format to record the validity timerange should
> be chosen to cause parsers that are prepared to take these three
> kinds of lines to barf and ignore.
How a _sequence_ of name/email changes should be handled isn't well
defined. As I remember it, we take the name/email updates in order and
apply a last-one-wins rule. It's not clear what should happen when two
or three different John Doe sequences are all mixed in.
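As a rough illustration of why last-one-wins breaks down here, a
simplified Python model (my own sketch, not git's actual mailmap
parser) of the "Right Name <right> <wrong>" form, where a later line for
the same commit address replaces an earlier one:

```python
import re

# Matches one of the forms Junio lists:
#   Proper Name <proper@email> <commit@email>
LINE = re.compile(r"^(?P<name>[^<]+?)\s*<(?P<proper>[^>]+)>\s*<(?P<commit>[^>]+)>$")


def load(lines):
    """Build a commit-address -> (name, email) table; a later line for the
    same commit address silently replaces an earlier one (last one wins)."""
    table = {}
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            table[m["commit"].lower()] = (m["name"], m["proper"])
    return table


MAILMAP = [
    "John Doe <john.doe.1@example.com> <jdoe@example.com>",  # first holder
    "John Doe <john.doe.2@example.com> <jdoe@example.com>",  # recycled mailbox
]

table = load(MAILMAP)
# Every jdoe@example.com commit, including the first holder's, now maps
# to the second John Doe: the earlier line has been overwritten.
print(table["jdoe@example.com"])  # ('John Doe', 'john.doe.2@example.com')
```

With only the address as the key there is nowhere to record which John
Doe a given commit belongs to, which is exactly what a date would add.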

A broader issue for corporate email systems is mailboxes that are
allocated to roles. So you may have Training1@xxxxxxxx through
Training9@xxxxxxxx (we had those), and if that training includes
practical, low-hanging-fruit examples from a project, it's difficult to
disambiguate those commits. More likely, say, is having TestPC1 to
TestPC9 accounts that included debug commits, perhaps even from
pair-programming test and debug sessions, so attributing commits to
individuals (rather than to mailboxes) becomes a real problem.
Hopefully that's rare in Sean's case.

Philip