Re: [PATCH 0/1] Hashed mailmap support

Jeff King <peff@xxxxxxxx> · Mon, 14 Dec 2020 21:40:19 -0500

On Mon, Dec 14, 2020 at 08:48:14PM -0500, Jeff King wrote:

> On Sun, Dec 13, 2020 at 01:05:38AM +0000, brian m. carlson wrote:
> 
> > Note that this is not perfect, because a user can simply look up all the
> > hashed values and find out the old values.  However, for projects which
> > wish to adopt the feature, it can be somewhat effective to hash all
> > existing mailmap entries and include some no-op entries from other
> > contributors as well, so as to make this process less convenient.
> 
> I remain unconvinced of the value of any noop entries. Ultimately it's
> easy to invert a one-way hash that comes from a small known set of
> inputs. And that's true whether there are extra noops or not.
> 
> The interesting argument IMHO is that somebody has to _bother_ to invert
> the hash. So it means that the old and new identities do not show up
> next to each other in a file indexed by search engines, etc. That drops
> the low-hanging fruit.
> 
> And from that argument, I think the obvious question becomes: is it
> worth using a real one-way function, as opposed to just obscuring the
> raw bytes (which Ævar went into in more detail). I don't have a strong
> opinion either way (the obvious one in favor is that it's less expensive
> to do so; and something like "git log" will have to either compute a lot
> of these hashes, or cache the hash computations internally).
> 
> I think somebody also mentioned that there's value in the social
> signaling here, and I agree with that. But that is true even for a
> reversible encoding, I think.

After re-reading what I wrote, I just wanted to make clear: overall the
feature makes sense to me. I am questioning only the argument for it,
and whether a one-way hash is the right tradeoff there.
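
To make the "small known set of inputs" point concrete, here is a rough
sketch of the inversion. It assumes, purely for illustration, that a hashed
entry is the hex SHA-256 of the lowercased email address; the actual format
in the patch may differ, and hash_identity() and invert_hashes() are
made-up names. The candidate list stands in for the author and committer
addresses anyone can pull out of the project history.

```python
import hashlib


def hash_identity(email: str) -> str:
    # Assumption: hashed form is hex SHA-256 of the lowercased address.
    # This is an illustrative choice, not necessarily what the patch does.
    return hashlib.sha256(email.lower().encode("utf-8")).hexdigest()


def invert_hashes(hashed_entries, candidate_emails):
    # Hash every known candidate once, then look each hashed entry up.
    table = {hash_identity(e): e for e in candidate_emails}
    return {h: table[h] for h in hashed_entries if h in table}


# Hypothetical data: addresses visible in the history, plus a mailmap
# that contains one real hashed entry and one no-op entry.
candidates = ["old@example.com", "new@example.com", "other@example.com"]
hashed = [hash_identity("old@example.com"), hash_identity("other@example.com")]

recovered = invert_hashes(hashed, candidates)
for digest, email in recovered.items():
    print(digest[:12], email)
```

The cost is one hash per candidate address, and the no-op entry is
recovered just as easily as the real one, so padding the file does not
change the picture.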

-Peff