On Wed, Sep 07, 2022 at 10:15:37AM -0400, Derrick Stolee wrote: > A quick search shows many uses of parse_object() across the codebase. > It would certainly be nice if they all suddenly got faster by avoiding > this hashing, but I also suppose that most of the calls are using > parse_object() only because they are unsure if they are parsing a > commit or a tag and would never parse a large blob. Yeah. I think we use parse_object() as a catch-all for "somebody gave us an oid, and we need to know what it is". I suspect that most normal uses would not get much faster, because we typically do feed it non-blob objects that are small, and whose contents we need to access anyway to parse them. So we're paying only the overhead of sha1 on a buffer we already have in memory. In cases where we might not need the parsed contents at all, our best bet is to actually remove or delay the parsing entirely. E.g., I think upload-pack used to be aggressive about parsing the ref tips that it advertised, but really we can just tell the client about them and only parse the ones they ask for. > I think this approach of making parse_object_with_flags() is the best > way to incrementally approach things here. If we decide that we need > the _with_flags() version specifically to avoid this hash check, then > we could probably take the second approach: remove the hash check from > parse_object() and swap the places that care to use read_object_file() > instead. My guess is that in the long term there will be fewer swaps > to read_object_file() than to parse_object_with_flags(). > > However, this is a good first step to make progress without doing the > time-consuming audit of every caller to parse_object(). The notion that this hash check in parse_object() might be slow is certainly not new. I've been thinking about it for at least a decade. ;) But until this recent case of direct-fetching blobs, I hadn't seen an instance where it really made a significant and measurable difference. So I'm definitely not opposed to going to a world where we drop the extra hash checks entirely, if that buys us something. The incrementalism is conservative, but it also makes it easy to convert specific call-sites to measure the outcomes. -Peff