Re: [PATCH v2 1/6] commit: provide a function to find a header in a buffer

Jeff King <peff@xxxxxxxx> · Wed, 27 Aug 2014 14:16:06 -0400

On Wed, Aug 27, 2014 at 02:00:16PM -0400, Jeff King wrote:

> That may be something some callers want, but they should build it
> separately around this find_commit_header, so that callers that want a
> single line (like "encoding" or "author") do not have to pay the price.
> I didn't bother building it out here since there are no callers which
> want it yet (though I did not look at the mergetag code, which could
> possibly be converted).

I just peeked at the mergetag code. It is all built around
read_commit_extra_headers, which is a different approach (it is "copy
out non-standard things", not "find this one thing I am looking for").

The copy-out approach is more efficient if we are looking for lots of
things, since we'd only have to parse once. But we don't use it that way
(we parse the whole thing and then only check whether there are any
"mergetag" headers).

The most efficient and convenient thing IMHO would be a progressive
parser that keeps a partially-parsed state and advances the parser
on-demand. So if I ask it for header "foo", it would start at the
beginning and parse until it finds "foo", marking the location of
anything along the way. If I then ask for "bar", it would keep going
from the end of "foo", and so forth.
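
To make the idea concrete, here is a rough, untested-in-git sketch of such
a progressive parser. Nothing here exists in git: header_cursor,
header_slot, advance, header_cursor_find and MAX_HEADERS are all
hypothetical names made up for illustration. It assumes a NUL-terminated
commit buffer, only returns the first line of a multi-line value
(continuation lines starting with a space are skipped), and simply stops
remembering headers after a fixed number of slots.

```c
#include <stddef.h>
#include <string.h>

#define MAX_HEADERS 32 /* arbitrary cap for this sketch */

/* One header line found so far; pointers refer into the caller's buffer. */
struct header_slot {
	const char *key;
	size_t key_len;
	const char *val;
	size_t val_len;
};

/* Partially-parsed state: what we have seen, and where to resume. */
struct header_cursor {
	const char *next; /* first byte not yet parsed */
	int done;         /* reached the blank line or end of buffer */
	struct header_slot seen[MAX_HEADERS];
	size_t nr;
};

static void header_cursor_init(struct header_cursor *c, const char *buf)
{
	memset(c, 0, sizeof(*c));
	c->next = buf;
}

/* Parse one more header line and record it; NULL when nothing is left. */
static const struct header_slot *advance(struct header_cursor *c)
{
	while (!c->done && c->nr < MAX_HEADERS) {
		const char *line = c->next;
		const char *eol, *sp;
		struct header_slot *s;

		if (*line == '\0' || *line == '\n') {
			c->done = 1; /* headers end at the empty line */
			break;
		}
		eol = strchr(line, '\n');
		if (!eol)
			eol = line + strlen(line);
		c->next = *eol ? eol + 1 : eol;

		if (*line == ' ')
			continue; /* continuation of a multi-line value */
		sp = memchr(line, ' ', eol - line);
		if (!sp)
			continue; /* no "key value" shape; ignore it */

		s = &c->seen[c->nr++];
		s->key = line;
		s->key_len = sp - line;
		s->val = sp + 1;
		s->val_len = eol - (sp + 1);
		return s;
	}
	return NULL;
}

static int slot_matches(const struct header_slot *s,
			const char *key, size_t key_len)
{
	return s->key_len == key_len && !memcmp(s->key, key, key_len);
}

/*
 * Return the value of the first header named "key" and store its length
 * in *len. Headers marked by earlier calls are checked first; only then
 * does the parser move forward from where it previously stopped.
 */
static const char *header_cursor_find(struct header_cursor *c,
				      const char *key, size_t *len)
{
	size_t key_len = strlen(key), i;
	const struct header_slot *s;

	for (i = 0; i < c->nr; i++) {
		if (slot_matches(&c->seen[i], key, key_len)) {
			*len = c->seen[i].val_len;
			return c->seen[i].val;
		}
	}
	while ((s = advance(c)) != NULL) {
		if (slot_matches(s, key, key_len)) {
			*len = s->val_len;
			return s->val;
		}
	}
	return NULL;
}
```

With this, asking for "author" parses up to the author line and marks
"tree" and "parent" on the way; a later request for "tree" is answered
from the marked slots, and a request for "encoding" resumes right after
the author line instead of starting over.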

I do not know if that is even worth the effort, though. I do not think
commit-parsing is a major hotspot for most operations (it might be for a
traversal, but we already use a minimalistic parser there that only
grabs the items that "struct commit" cares about). And we already
zlib-inflate the whole commit object in the first place, so it's not
like we haven't touched all these bytes anyway[1].

-Peff

[1] A long time ago I experimented with having parse_commit do a partial
    inflate, just up to the empty-line delimiter. I don't have the
    numbers handy, but I recall that it did not make a measurable
    improvement in rev-list speeds.