Git status extremely slow if any file is a multiple of 8GBi

Jason Hatton <jhatton@xxxxxxxxxxxxxxxxxxx> · Wed, 4 May 2022 00:15:34 +0000

Git is extremely slow if any file is a multiple of 8GBi. Below is
a working example.

> git init
> dd if=/dev/zero of=8gb bs=1024 count=$((1024*1024*8))
> git lfs track 8gb
> git add .
> git commit
> time git status
> time git status
> time git status

This seems related to git using uint32_t for storing the file size in the index.
A file of zero size has a special meaning.

I have an open issue on git-for-windows where @dscho has a lot of good technical
information.

https://github.com/git-for-windows/git/issues/3833#issuecomment-1116544918

I have a proposed idea that may or may not help. Would it be possible for any
file that is a multiple of 2^32 to be adjusted to a file size of 1 instead of
zero? Git is already functioning with mangling file sizes over 4GB in the
index, so maybe bumping up the size of 2^32 multiple files would mitigate the
issue. Using different versions of git would just cause 2^32 multiple files to
be re-checked again, which is the current behavior anyway. 0 would retain its
special meaning. If the file size is changed from 2^32 to 2^32 + 1, git may not
notice that the file changed because both files will look like they are size 1.
The ctime and mtime may still catch this.

// Some pseudo code near: 
https://github.com/git-for-windows/git/blob/dc88e3cd72a2f0bbe2fe513acfc72bd66b577851/read-cache.c#L176sd-

sd->sd_size = st->st_size;
if (sd->sd_size == 0 && st->st_size != 0) {
    sd->sd_size = 1;
}

Thanks
--
Jason D. Hatton