On Wed, Aug 27, 2014 at 02:00:16PM -0400, Jeff King wrote: > That may be something some callers want, but they should build it > separately around this find_commit_header, so that callers that want a > single line (like "encoding" or "author") do not have to pay the price. > I didn't bother building it out here since there are no callers which > want it yet (though I did not look at the mergetag code, which could > possibly be converted). I just peeked at the mergetag code. It is all built around read_commit_extra_headers, which is a different approach (it is "copy out non-standard things", not "find this one thing I am looking for"). The former is more efficient if we are looking for lots of things, since we'd only have to parse once. But we don't use it that way (we parse the whole thing and then see if we have any "mergetag" headers). The most efficient and convenient thing IMHO would be a progressive parser that keeps a partially-parsed state and advances the parser on-demand. So if I ask it for header "foo", it would start at the beginning and parse until it finds "foo", marking the location of anything along the way. If I then ask for "bar", it would keep going from the end of "bar", and so forth. I do not know if that is even worth the effort, though. I do not think commit-parsing is a major hotspot for most operations (it might be for a traversal, but we already use a minimalistic parser there that only grabs the items that "struct commit" cares about). And we already zlib-inflate the whole commit object in the first place, so it's not like we haven't touched all these bytes anyway[1]. -Peff [1] A long time ago I experimented with having parse_commit do a partial inflate, just up to the empty-line delimiter. I don't have the numbers handy, but I recall that it did not make a measurable improvement in rev-list speeds. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html