Re: [PATCH 2/4] dir.c::match_basename(): pay attention to the length of string parameters

Jeff King <peff@xxxxxxxx> · Tue, 26 Mar 2013 17:29:48 -0400

On Tue, Mar 26, 2013 at 01:49:10PM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > I timed this doing "git archive HEAD" on webkit.git before and after. It
> > actually ended up not mattering much (I think because it is only the
> > directories which are affected, not each individual path, so it's a
> > much smaller number than you'd think). The best-of-five timing was
> > slightly slower, but was within the noise.
> 
> Interesting.  Because "archive" has to incur a large I/O cost
> anyway, I expected extra allocation for correctness for only the
> directory paths would be dwarfed in the noise.
> 
> I actually care more about cases other than "archive", though.  Do
> we even feed directory paths to the machinery?

In general, no, I don't think so. That's why I tested "archive", since I
knew it did. In the normal case, we should just feed file paths, meaning
we only run into this code path when somebody has "foo/" in their
pattern. Testing like:

  git ls-files -z >files
  time git check-attr --stdin -z -a <files >/dev/null

showed a difference well within the noise.

> > So I do still think it would make sense to go to a byte-limited version
> > of fnmatch eventually, just for code cleanliness and predictability of
> > performance, but this is really not a bad solution in the interim.
> 
> Yes, what we do with wildmatch is a separate issue for 'master' and
> upwards.

Oh, agreed. I just wanted to see how much performance would be impacted
in the interim. It seems that it is not, at least not measurably.

So I think your series is the right direction, but we would want to
factor out the allocation code and use it from match_pathname, as well.
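
To make the "factor out the allocation code" idea concrete, here is a
rough sketch of what a shared helper might look like. The names
dup_bytes() and fnmatch_bytes() are hypothetical and this is not the
code from the series; it only assumes that both callers hold pointers
into NUL-terminated strings, so the byte just past the requested length
is safe to read. A copy is made only when that byte is not already NUL:

```c
#include <fnmatch.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: NUL-terminated heap copy of the first len bytes of s. */
static char *dup_bytes(const char *s, size_t len)
{
	char *copy = malloc(len + 1);

	if (!copy)
		abort();
	memcpy(copy, s, len);
	copy[len] = '\0';
	return copy;
}

/*
 * Hypothetical: fnmatch() limited to the first patternlen bytes of
 * pattern and the first stringlen bytes of string. Assumes both point
 * into NUL-terminated strings at least that long.
 */
static int fnmatch_bytes(const char *pattern, size_t patternlen,
			 const char *string, size_t stringlen, int flags)
{
	char *pat_copy = NULL, *str_copy = NULL;
	int ret;

	/* Allocate only if the input is not already terminated there. */
	if (pattern[patternlen])
		pattern = pat_copy = dup_bytes(pattern, patternlen);
	if (string[stringlen])
		string = str_copy = dup_bytes(string, stringlen);

	ret = fnmatch(pattern, string, flags);

	free(pat_copy);	/* free(NULL) is a no-op */
	free(str_copy);
	return ret;
}
```

With something of that shape, match_basename and match_pathname could
both pass their (pointer, length) pairs through one place, and the
common case of already-terminated file paths would not allocate at all.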

-Peff