On 11/11/2022 6:31 PM, Elijah Newren wrote: > On Mon, Nov 7, 2022 at 10:48 AM Derrick Stolee via GitGitGadget > <gitgitgadget@xxxxxxxxx> wrote: >> >> From: Derrick Stolee <derrickstolee@xxxxxxxxxx> >> >> The previous change allowed skipping the hashing portion of the >> hashwrite API, using it instead as a buffered write API. Disabling the >> hashwrite can be particularly helpful when the write operation is in a >> critical path. >> >> One such critical path is the writing of the index. This operation is so >> critical that the sparse index was created specifically to reduce the >> size of the index to make these writes (and reads) faster. >> >> Following a similar approach to one used in the microsoft/git fork [1], >> add a new config option that allows disabling this hashing during the >> index write. The cost is that we can no longer validate the contents for >> corruption-at-rest using the trailing hash. >> >> [1] https://github.com/microsoft/git/commit/21fed2d91410f45d85279467f21d717a2db45201 >> >> While older Git versions will not recognize the null hash as a special >> case, the file format itself is still being met in terms of its >> structure. Using this null hash will still allow Git operations to >> function across older versions. >> >> The one exception is 'git fsck' which checks the hash of the index file. >> Here, we disable this check if the trailing hash is all zeroes. We add a >> warning to the config option that this may cause undesirable behavior >> with older Git versions. >> >> As a quick comparison, I tested 'git update-index --force-write' with >> and without index.computHash=false on a copy of the Linux kernel >> repository. >> >> Benchmark 1: with hash >> Time (mean ± σ): 46.3 ms ± 13.8 ms [User: 34.3 ms, System: 11.9 ms] >> Range (min … max): 34.3 ms … 79.1 ms 82 runs >> >> Benchmark 2: without hash >> Time (mean ± σ): 26.0 ms ± 7.9 ms [User: 11.8 ms, System: 14.2 ms] >> Range (min … max): 16.3 ms … 42.0 ms 69 runs >> >> Summary >> 'without hash' ran >> 1.78 ± 0.76 times faster than 'with hash' >> >> These performance benefits are substantial enough to allow users the >> ability to opt-in to this feature, even with the potential confusion >> with older 'git fsck' versions. > > This is impressive and interesting...but an improvement unrelated to > this series other than the fact that it builds on some of it. Perhaps > pull this patch out? While patch 1 is required for the packed-refs work, this one is an easy way to take advantage of it. I'll submit these two patches soon on their own as the rest of the RFC is discussed. > Also, would it make sense to integrate index.computeHash with feature.manyFiles? It would make sense to include in feature.manyFiles and Scalar's recommended config. I expect that it would be good to have the config available in a Git release before updating those configs to include it. Perhaps that is too conservative, though. Thanks, -Stolee