Re: Wildcards in mailmap to hide transgender people's deadnames

"Florine W. Dekker" <florine@xxxxxxxxxxxx> · Tue, 20 Sep 2022 16:58:35 +0200

On 20/09/2022 12:23, Ævar Arnfjörð Bjarmason wrote:
I'm happy to resurrect my SHA-256 hashed mailmap series if we're
all willing to agree to not implement trivial decoding features.
I'd think you'd want to be really clear about what that forward promise
would entail. E.g. I've sometimes wanted a way for "git log" to report
when it munges commits due to adding notes, re-encoding the data etc. If
someone submits that sort of feature should it always explicitly leave
out mailmap-related rewrites?

And even if it does, who do we think we're really helping in the end,
given the trivial way you could get that with an external "diff" with
the one-liner above?

I think the most important thing here is that the mailmap should not
allow for even more trivial ways to discover old names than already
exist. I've thought more about what you said, Ævar, and now I'm wary of
a mailmap implementation that would put my old and new information next
to each other, even if encoded (it doesn't matter whether it's
URL-encoded or base64-encoded). I think it's likely that some external
data mining tool will decode the addresses and place them next to each
other, so that if you search for one email address in a search engine
you'll also see the other. I think a hash will prevent these automated
miners from doing that, since reversing a hash is too much effort for
an untargeted attack (right? if you disagree, how about a salted
hash?).

Either way, I think any mailmap-based solution will allow the old and 
new name to be linked to each other by an adversary, as you showed with 
your neat one-liner. However, I think a (salted?) hash in the mailmap 
will be sufficient for casual obfuscation where harassment is unlikely, 
but the user wants to prevent accidental disclosure or plain linkage.

I also have an alternate proposal which I pitched to some folks at Git
Merge and which I just finished writing up that basically moves personal
names and emails out of commits, replacing them with opaque identifiers,
and using a constantly squashed mailmap commit in a special ref to store
the mapping.  This doesn't address changing identities in existing
commits, which as we've seen are nearly impossible to fix, but it does
address new ones.  I've sent it out at
https://lore.kernel.org/git/20220919145231.48245-1-sandals@xxxxxxxxxxxxxxxxxxxx/.
As I understand the difference in this scenario a hypothetical future
repo's Y commit's authorship would have been opaque in the first place
using this mechanism, and via your "refs/mailmap" you'd have mapped
Y=Bob.

You then make a future X commit, and map X=Alice, and have a .mailmap
entry which mapped Y=X, but that entry would refer to the opaque value.

That certainly changes things in a fundamental way, and goes most or all
of the way to mitigating what I've been pointing out as a flaw in these
proposals.

I'd still be very much on the fence about whether we'd ever want to
recommend that to someone concerned with "harassment" and the like (as
opposed to a milder social preference), as all it would take to get to
that point is someone having a copy of the older "refs/mailmap" to
unmask the previous "Y".

I first want to say that I really like your proposal, Brian! I didn't 
think this subject would get the attention it did, but I'm happy it's 
being picked up the way it is, and to see this lively discussion going 
on between y'all!

And Ævar, you're right that having an older copy would allow one to
discover a mapping from the old to the new name. But this will happen
however we implement this, because the adversary can always keep an old
copy of the entire repo, clone the new one, and compare the two logs.
(You can probably come up with a neat one-liner, but that's beside the
point ;-).) I think that the most appropriate
threat model here is to assume that everyone who has accessed the repo 
before the name change will notice the name change and will be able to 
create a mapping. Instead, our goal should be to create a system that 
ensures that people who first access the repo after the name change are 
unable to find the old name at all. I think Brian's proposal achieves 
this. This is analogous to the real world where people who knew me 
before my transition will probably never (completely) forget my old 
name, and it's useless to try to make that happen, but at least I can 
prevent new people I meet from finding out the old name.
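
The threat model above can be sketched in a few lines of Python. This is
only an illustration under assumptions: the dictionaries stand in for
two snapshots of an identifier-to-name mapping (such as the one Brian's
proposal would keep in a special ref), and the identifiers, names, and
the function name `changed_identities` are all hypothetical.

```python
def changed_identities(old_map: dict, new_map: dict) -> dict:
    """Return identifiers whose displayed name differs between snapshots."""
    shared = old_map.keys() & new_map.keys()
    return {
        ident: (old_map[ident], new_map[ident])
        for ident in shared
        if old_map[ident] != new_map[ident]
    }


# Snapshot kept by someone who accessed the repo before the name change.
old_snapshot = {
    "id-1": "Bob <bob@example.com>",
    "id-2": "Carol <carol@example.com>",
}

# Snapshot seen by everyone who first accesses the repo afterwards.
new_snapshot = {
    "id-1": "Alice <alice@example.com>",
    "id-2": "Carol <carol@example.com>",
}

linked = changed_identities(old_snapshot, new_snapshot)
```

Someone holding both snapshots gets the old and new name for `id-1`,
which no implementation can prevent. Someone holding only
`new_snapshot` has nothing to compare against, and that is the case
worth protecting.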

- Florine