Re: What's cooking in git.git (Jan 2021, #02; Fri, 8)

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 11 Jan 2021 11:04:11 -0800

"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:

> On 2021-01-09 at 23:20:25, Junio C Hamano wrote:
>> "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:
>> 
>> > On 2021-01-09 at 21:28:58, Junio C Hamano wrote:
>> >> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>> >> > FWIW there was since a re-roll on 2021-01-03, but the discussion is
>> >> > sort-of outstanding, so maybe that's intentional...
>> >> 
>> >> I had an impression that those 4 or 5 patches haven't gained
>> >> consensus that they are good as-is.
>> >
>> > There will be another reroll.  I'm hoping to get to it this weekend.
>> 
>> Thanks.
>
> Having read Ævar's latest comment, I've decided instead to drop this, so
> feel free to do so whenever it's convenient.

That's kind of sad.

I view this as the kind of topic where the perfect can easily
become the enemy of the good, as there is by definition no perfection
available to us without breaking existing Git.

I do not know about Ævar, but my initial impression while
reading the discussion from the sidelines was that the goal was to
prevent a mechanical scan of a recent version of .mailmap from
learning that Joe used to use Jane as his/her name, and that was the
reason why I asked to be convinced why encoding for obfuscation was
insufficient.  In the above, I meant "mechanical scan" as something
like "a web search engine crawls and finds a .mailmap---a query for
Joe produces a line with some garbage on it that is not Jane." and a
casual attacker would stop there.

But of course, a casual attacker who knows that urlencode or whatever
obfuscation is in use can read that "garbage" once he/she knows that
the "garbage" is worth attacking (i.e. it is known to be associated
with Joe, the person the attacker is interested in).
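
As a rough illustration of how thin that protection is (a sketch
only: encoding every byte as %XX is an assumption about what
"urlencode or whatever" might look like, and percent_encode_all() is
a hypothetical helper, not anything in Git):

```python
from urllib.parse import unquote


def percent_encode_all(text: str) -> str:
    """Hypothetical obfuscation: write every UTF-8 byte as %XX."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


garbage = percent_encode_all("Jane")

# A plain-text search for the name does not match the encoded form ...
found_by_search = "Jane" in garbage

# ... but one standard-library call turns the "garbage" back into the name.
recovered = unquote(garbage)

print(garbage, found_by_search, recovered)
```

The mechanical scan misses the line, but anyone who suspects the line
is about Joe needs nothing more than the decoder.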

If your goal is to make it harder than just urlencode, even though
we all have to accept that scanning "git log --all" for all names
that appear in the history and hashing them all to see which name
hashes to the "garbage" in question will still uncover it, then the
@sha256:<hash> approach does make sense as a stopping point.  Perhaps
we need to sell this with a clear definition of what kind of attacks
we are protecting the name data from:

    The attacker is required to obtain a sufficient amount of history
    in the project to uncover the obfuscation; more casual attackers
    will fail to uncover it, and we declare that this is better than
    nothing and good enough in practice.

or something like that?  I am not sure whether I drew the line where
you intended to draw it, whether I think that it is good enough in
practice, or whether I would agree to a change that is better than
nothing but not good enough in practice, but having such a statement
would help us see where we agree or disagree.
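
For concreteness, the history-scanning attack could be sketched like
this.  The exact input to the hash is an assumption (here the UTF-8
bytes of the name, written in hex after "@sha256:"), obfuscate() and
uncover() are hypothetical helpers, and the hard-coded list stands in
for the names one would collect from "git log --all":

```python
import hashlib


def obfuscate(name: str) -> str:
    """Assumed encoding: '@sha256:' followed by the hex SHA-256 of the name."""
    return "@sha256:" + hashlib.sha256(name.encode("utf-8")).hexdigest()


def uncover(token: str, candidates: list[str]) -> list[str]:
    """Hash every candidate name and keep those that match the token."""
    return [name for name in candidates if obfuscate(name) == token]


# Stand-in for names harvested from history, e.g. with
#   git log --all --format='%an%n%cn' | sort -u
history_names = ["Jane Doe", "Joe Doe", "Alice Example"]

# What a .mailmap entry would carry in place of the old name.
token = obfuscate("Jane Doe")

with_history = uncover(token, history_names)         # old name is in history
without_history = uncover(token, ["Alice Example"])  # attacker lacks that history

print(with_history, without_history)
```

The attacker who has the part of history where the old name appears
recovers it with one pass over the names; the one who does not has
only the hash to look at, which is the line the statement above tries
to draw.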

Thanks.