On 20/09/2022 12:23, Ævar Arnfjörð Bjarmason wrote:
I'm happy to resurrect my SHA-256 hashed mailmap series if we're
all willing to agree to not implement trivial decoding features.
I'd think you'd want to be really clear about what that forward promise
would entail. E.g. I've sometimes wanted a way for "git log" to report
when it munges commits due to adding notes, re-encoding the data etc. If
someone submits that sort of feature should it always explicitly leave
out mailmap-related rewrites?
And even if it does, who do we think we're really helping in the end,
given the trivial way you could get that with an external "diff" with
the one-liner above?
I think the most important thing here is that the mailmap should not
allow for even-more-trivial ways to discover old names than currently
already exist. I've thought more about what you said, Ævar, and now I'm
wary of a mailmap implementation that would entail having my old and new
information next to each other, even if encoded (doesn't matter if it's
URL-encoded or base64-encoded), because I think it's likely some
external data mining tool will decode the address and place them next to
each other, so that if you search for the email address in a search
engine you'll also see the other address. I think a hash encoding will
prevent these automated miners from doing that, since reversing a hash
is too much effort for an untargeted attack (right? if you disagree, how
about a salted hash?).
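
To make the hashed variant concrete, here is a minimal sketch. It assumes a hypothetical mailmap layout where the old address appears only as a (salted) SHA-256 digest; this is not necessarily what the actual patch series does, and all names, addresses and the salt below are made up.

```python
import hashlib
from typing import Optional


def hash_identity(email: str, salt: bytes = b"") -> str:
    """Return the hex SHA-256 digest of a normalized address, optionally salted."""
    return hashlib.sha256(salt + email.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical per-repository salt, stored next to the mailmap.
SALT = b"per-repo-salt"

# Hypothetical hashed mailmap: digest of the old address -> new identity.
# The old address itself never appears in this structure.
hashed_mailmap = {
    hash_identity("old@example.com", SALT): "New Name <new@example.com>",
}


def resolve(commit_email: str) -> Optional[str]:
    """Map the address recorded in a commit to the new identity, if any."""
    return hashed_mailmap.get(hash_identity(commit_email, SALT))
```

A miner that merely decodes mailmap entries finds no old address here. Anyone who already has a candidate address (for instance from the commits themselves) can still hash it and confirm the link, and a salt stored in the repo only defeats precomputed tables.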
Either way, I think any mailmap-based solution will allow the old and
new name to be linked to each other by an adversary, as you showed with
your neat one-liner. However, I think a (salted?) hash in the mailmap
will be sufficient for casual obfuscation where harassment is unlikely,
but the user wants to prevent accidental disclosure or plain linkage.
I also have an alternate proposal which I pitched to some folks at Git
Merge and which I just finished writing up that basically moves personal
names and emails out of commits, replacing them with opaque identifiers,
and using a constantly squashed mailmap commit in a special ref to store
the mapping. This doesn't address changing identities in existing
commits, which as we've seen are nearly impossible to fix, but it does
address new ones. I've sent it out at
https://lore.kernel.org/git/20220919145231.48245-1-sandals@xxxxxxxxxxxxxxxxxxxx/.
As I understand the difference, in this scenario a hypothetical future
repo's Y commit's authorship would have been opaque in the first place
using this mechanism, and via your "refs/mailmap" you'd have mapped
Y=Bob.
You then make a future X commit, and map X=Alice, and have a .mailmap
entry which mapped Y=X, but that entry would refer to the opaque value.
That certainly changes things in a fundamental way, and goes most or all
of the way to mitigating what I've been pointing out as a flaw in these
proposals.
I'd still be very much on the fence about whether we'd ever want to
recommend that to someone concerned with "harassment" and the like (as
opposed to a milder social preference), as all it would take to get to
that point is someone having a copy of the older "refs/mailmap" to
unmask the previous "Y".
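
The scenario above can be sketched as a toy model. The dictionary layout is an assumption made purely for illustration and is not taken from the actual proposal; X, Y, Bob and Alice are the placeholders used in the text.

```python
def display_name(opaque_id, names, aliases):
    """Resolve an opaque ID to a display name, following alias entries first."""
    seen = set()
    while opaque_id in aliases and opaque_id not in seen:
        seen.add(opaque_id)
        opaque_id = aliases[opaque_id]
    # Fall back to the opaque ID itself when no name is known.
    return names.get(opaque_id, opaque_id)


def unmask(old_names, new_names, new_aliases):
    """List (id, old name, current name) for IDs whose displayed name changed."""
    return [
        (oid, old, display_name(oid, new_names, new_aliases))
        for oid, old in sorted(old_names.items())
        if display_name(oid, new_names, new_aliases) != old
    ]


# Snapshot of "refs/mailmap" before the change: Y=Bob.
old_names = {"Y": "Bob"}

# Snapshot after the change: X=Alice, plus an entry mapping Y=X.
new_names = {"X": "Alice"}
new_aliases = {"Y": "X"}

print(display_name("Y", new_names, new_aliases))
print(unmask(old_names, new_names, new_aliases))
```

With only the new snapshot, Y resolves to Alice and Bob appears nowhere. Holding the older snapshot as well is what lets `unmask` pair the two names.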
I first want to say that I really like your proposal, Brian! I didn't
think this subject would get the attention it did, but I'm happy it's
being picked up the way it is, and to see this lively discussion going
on between y'all!
And Ævar, you're right that having an older copy would allow one to
discover a mapping from the old to the new name. But this will be true of
any implementation we can conceive of, because the adversary can
always keep an old copy of the entire repo, clone the new one, and
compare the two logs. (You can probably come up with a neat one-liner,
but that's beside the point ;-).) I think that the most appropriate
threat model here is to assume that everyone who has accessed the repo
before the name change will notice the name change and will be able to
create a mapping. Instead, our goal should be to create a system that
ensures that people who first access the repo after the name change are
unable to find the old name at all. I think Brian's proposal achieves
this. This is analogous to the real world where people who knew me
before my transition will probably never (completely) forget my old
name, and it's useless to try to make that happen, but at least I can
prevent new people I meet from finding out the old name.
- Florine