Re: [PATCH 1/3] parse_object(): allow skipping hash check

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 07, 2022 at 10:15:37AM -0400, Derrick Stolee wrote:

> A quick search shows many uses of parse_object() across the codebase.
> It would certainly be nice if they all suddenly got faster by avoiding
> this hashing, but I also suppose that most of the calls are using
> parse_object() only because they are unsure if they are parsing a
> commit or a tag and would never parse a large blob.

Yeah. I think we use parse_object() as a catch-all for "somebody gave us
an oid, and we need to know what it is". I suspect that most normal uses
would not get much faster, because we typically do feed it non-blob
objects that are small, and whose contents we need to access anyway to
parse them. So we're paying only the overhead of sha1 on a buffer we
already have in memory.

In cases where we might not need the parsed contents at all, our best
bet is to actually remove or delay the parsing entirely. E.g., I think
upload-pack used to be aggressive about parsing the ref tips that it
advertised, but really we can just tell the client about them and only
parse the ones they ask for.

> I think this approach of making parse_object_with_flags() is the best
> way to incrementally approach things here. If we decide that we need
> the _with_flags() version specifically to avoid this hash check, then
> we could probably take the second approach: remove the hash check from
> parse_object() and swap the places that care to use read_object_file()
> instead. My guess is that in the long term there will be fewer swaps
> to read_object_file() than to parse_object_with_flags().
> 
> However, this is a good first step to make progress without doing the
> time-consuming audit of every caller to parse_object().

The notion that this hash check in parse_object() might be slow is
certainly not new. I've been thinking about it for at least a decade. ;)
But until this recent case of direct-fetching blobs, I hadn't seen an
instance where it really made a significant and measurable difference.

So I'm definitely not opposed to going to a world where we drop the
extra hash checks entirely, if that buys us something. The
incrementalism is conservative, but it also makes it easy to
convert specific call-sites to measure the outcomes.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux