Re: [ANNOUNCE] Git v2.23.0-rc0 - Initial test failures on NonStop

Jeff King <peff@xxxxxxxx> · Tue, 30 Jul 2019 20:59:34 -0400

On Tue, Jul 30, 2019 at 10:56:24PM +0200, SZEDER Gábor wrote:

> > Ah, of course. Our oid hashing is done by just picking off the first
> > bytes of the sha1, and it doesn't care about endianness (because these
> > are just internal-to-memory hashes).
> 
> Yeah.
> 
> > We _could_ reconcile that like this:
> 
> Do we really want that, though?  It's a hashmap, after all, and the
> order of iteration over various hashmap implementations tends to be
> arbitrary.  So an argument could be made that this test is overly
> specific by expecting a particular order of elements (and perhaps by
> checking the elements' oid as well), and it would be sufficient to
> check that it iterates over all elements, no matter the order (IOW
> sorting 'actual' before the comparison).

I'd agree that this test is being overly specific. I guess what I'm
feeling is a vague notion that it might be better if Git behaves
deterministically regardless of endian-ness. Not because it _should_
matter for this test, but there could literally be a bug on big-endian
platforms that nobody knows about because it's very rare for anybody to
test there.

I admit that's pretty hand-wavy though. And there may actually be a
benefit in finding such a bug, because it means that some part of the
code (or a test) is relying on something it ought not to.

> OTOH, this is not just any hashmap, but an oidmap, and I could imagine
> that there might be use cases where it would be beneficial if the
> iteration order were to match the oid order (but don't know whether we
> actually have such a use case).

I don't think we can promise anything about iteration order. This test
is relying on the order at least being deterministic between runs, but
if we added a new entry and had to grow the table, all bets are off.

So regardless of the endian thing above, it probably does make sense for
any hashmap iteration output to be sorted before comparing. That goes
for t0011, too; it doesn't have this endian thing, but it looks to be
relying on hash order that could change if we swapped out hash
functions.

> > I
> > wonder if it's appreciably less efficient. I'll bet I could nerd-snipe
> > René into doing a bunch of measurements and explorations of the
> > disassembled code. ;)
> 
> Maybe it shows up in an oidmap-specific performance test, but with all
> that's usually going on in Git hashmap performance tends to be
> negligible (e.g. it's rarely visible in flame graphs).

That's my guess, too, but data trumps guesses (you'll note that I'm not
volunteering to _collect_ the data, though, which perhaps gives a sense
of how invested I am in it. ;) ).

-Peff