XDL_FAST_HASH can be very slow

I ran across an interesting case that diffs very slowly with modern git.
And it's even public. You can clone:

  git://github.com/outpunk/evil-icons

and try:

  git show fc4efe426d5b4e6aa8d5a4dc14babeada7c5f899

(which is also the tip of master as of this writing).

The interesting file there is a 10MB Illustrator file, "assets/ei.ai".
Git treats it as text because the early part contains no NUL bytes, but
it is mostly not human-readable. It has a large number of lines, and
some of those lines are themselves quite long.

On my machine, "git show" takes ~77 seconds using v2.2.1. But if I build
the same version with "make XDL_FAST_HASH=", it completes in about 0.4s.
Both produce the same output.

I'm not really sure what's going on.  A few points of interest:

 - You can replicate this with the very first commit that added
   XDL_FAST_HASH, 6942efc (xdiff: load full words in the inner loop of
   xdl_hash_record, 2012-04-06). So it was always bad on this case, and
   it is not a regression from any more recent change.

 - We actually _don't_ spend most of our time in xdl_hash_record, the
   function modified by 6942efc. Instead, it all goes to
   xdl_classify_record, which is looping over the set of hash records.
   It's not clear to me whether producing more or different hash
   records is part of the design of XDL_FAST_HASH, or whether this is
   actually a bug.
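
To illustrate how time spent looping over hash records can come to
dominate, here is a minimal Python sketch of a classification pass over
a chained hash table. It is not xdiff's actual code: the names classify,
good_hash, and weak_hash are hypothetical, and whether bucket collisions
are the real cause in this case is an assumption, not something
established above.

```python
def classify(lines, hash_fn, table_bits=10):
    """Give each distinct line a class id using a chained hash table.

    Returns (class_ids, steps), where steps counts how many chain
    entries were visited in total.
    """
    mask = (1 << table_bits) - 1
    buckets = {}      # bucket index -> list of (hash, line, class_id)
    class_ids = []
    next_id = 0
    steps = 0
    for line in lines:
        h = hash_fn(line)
        chain = buckets.setdefault(h & mask, [])
        for other_h, other_line, cid in chain:
            steps += 1
            if other_h == h and other_line == line:
                class_ids.append(cid)
                break
        else:
            # No match in this chain: the line starts a new class.
            chain.append((h, line, next_id))
            class_ids.append(next_id)
            next_id += 1
    return class_ids, steps


def good_hash(line):
    # Simple multiplicative hash over every byte (illustrative only).
    h = 5381
    for b in line:
        h = (h * 33 + b) & 0xFFFFFFFF
    return h


def weak_hash(line):
    # Deliberately poor hash (an assumption made for the demo):
    # it only looks at the first byte, so similar lines collide.
    return line[0] if line else 0


if __name__ == "__main__":
    lines = [b"line %d" % i for i in range(2000)]
    for fn in (good_hash, weak_hash):
        ids, steps = classify(lines, fn)
        print(fn.__name__, "classes:", len(set(ids)), "steps:", steps)
```

Both hash functions yield the same class ids, just as the two builds
produce the same diff, but the weak one puts every line into a single
chain, so the number of chain steps grows quadratically with the number
of distinct lines.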

I haven't dug much further than that.

-Peff


