Re: [PATCH] Put sha1dc on a diet

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 3/2/2017 11:35 AM, Linus Torvalds wrote:
On Thu, Mar 2, 2017 at 6:45 AM, Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
It would probably make sense to switch the index integrity check away from
SHA-1 because we really only care about detecting bit flips there, and we
have no need for the computational overhead of using a full-blown
cryptographic hash for that purpose.
Which index do you actually see as being a problem, btw? The main file
index (.git/index) or the pack-file indexes?

We definitely don't need the checking version of sha1 for either of
those, but as Jeff already did the math, at least the pack-file index
is almost negligible, because the pack-file operations that update it
end up doing SHA1 over the objects - and the object SHA1 calculations
are much bigger.

And I don't think we even check the pack-file index hashes except on fsck.

Now, if your _file_ index is 300-400MB (and I do think we check the
SHA fingerprint on that even on just reading it - verify_hdr() in
do_read_index()), then that's going to be a somewhat noticeable hit on
every normal "git diff" etc.

Yes, the .git/index is 450MB with ~3.1M entries. verify_hdr() is called each time
we read it into memory.

We have been testing a patch in GfW to run the verification in a separate thread while the main thread parses (and mallocs) the cache_entries. This does help
offset the time.
        https://github.com/git-for-windows/git/pull/978/files

But I'd have expected the stat() calls of all the files listed by that
index to be the _much_ bigger problem in that case. Or do you just
turn those off with assume-unchanged?

Yeah, those stat calls are threaded when preloading, but even so..

Yes, the stat() calls are more significant percentage of the time (and having core.fscache and core.preloadindex help that greatly), but the total time for a command is just that -- the total -- so using the philosophy of "every little bit helps", the faster
routines help us here.

Anyway, the file index SHA1 checking could probably just be disabled
entirely (with a config flag). It's a corruption check that simply
isn't that important. So if that's your main SHA1 issue, that would be
easy to fix.

Yes, in the GVFS effort, we disabled the verification with a config setting and haven't
had any incidents.


Everything else - like pack-file generation etc for a big clone() may
end up using a ton of SHA1 too, but the SHA1 costs all scale with the
other costs that drown them out (ie zlib, network, etc).

I'd love to see a profile if you have one.

                       Linus




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]