Re: Git status extremely slow if any file is a multiple of 8GBi

On 04.05.22 at 19:47, Jason Hatton wrote:
>>> The condition sd_size==0 is used as a signal for "no, we really need
>>> to compare the contents", and causes the contents to be hashed, and
>>> if the contents match the object name recorded in the index, the
>>> on-disk size is stored in sd_size and the entry is marked as
>>> CE_UPTODATE.  Alas, if the truncated st_size is 0, the resulting
>>> entry would have sd_size==0 again, so a workaround like what you
>>> outlined is needed.
>>
>> Junio C Hamano <gitster@xxxxxxxxx> writes:
>>
>> This is of secondary importance, but the fact that Jason observed
>> 8GBi files get hashed over and over unnecessarily means that we
>> would do the same for an empty file, opening, reading 0-bytes,
>> hashing, and closing, without taking advantage of the fact that
>> CE_UPTODATE bit says the file contents should be up-to-date with
>> respect to the cached object name, doesn't it?
>>
>> Or do we have "if st_size == 0 and sd_size == 0 then we know what it
>> hashes to (i.e. EMPTY_BLOB_SHA*) and there is no need to do the
>> usual open-read-hash-close dance" logic (I didn't check)?
>
> Junio C Hamano
>
> As best as I can tell, it rechecks the zero sized files. My Linux box can run
> git ls in .006 seconds with 1000 zero sized files in the repo. Rehashing every
> file that is a multiple of 2^32 with every "git ls" on the other hand...
>
> I managed to actually compile git with the proposed changes.

Meaning that file sizes of n * 2^32 bytes get recorded as 1 byte instead
of 0 bytes?  Why 1 and not e.g. 2^32-1 or 2^31 (or 42)?
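
To make sure we are talking about the same thing, here is a rough
sketch of how I understand the proposal.  It is a simplified model,
not git's actual code; truncated_size() and munged_size() are names
I made up, and I assume the index keeps only the low 32 bits of
st_size:

```python
UINT32_MASK = 0xFFFFFFFF
GIB = 2**30


def truncated_size(st_size: int) -> int:
    """Low 32 bits of the on-disk size (assumed width of the index field)."""
    return st_size & UINT32_MASK


def munged_size(st_size: int) -> int:
    """Hypothetical workaround: never record 0 for a non-empty file."""
    size = truncated_size(st_size)
    if size == 0 and st_size > 0:
        return 1  # the replacement value under discussion
    return size


# Show the raw size, the truncated size, and the size the workaround records.
for st_size in (0, 4 * GIB, 8 * GIB, 4 * GIB + 1):
    print(st_size, truncated_size(st_size), munged_size(st_size))
```

Any non-zero replacement value would avoid the "sd_size == 0 means
rehash" signal; the sketch uses 1 only because that is what I read
into the description.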

> It seems to correct
> the problem and "make test" passes. If upgrading to the patched version of git,
> git will rehash the 8GBi files once and work normally. If downgrading to an
> unpatched version, git will perceive that the 8GBi files have changes. This
> needs to be corrected with "git add" or "git checkout".

Not nice, but safe.  Can there be an unsafe scenario as well?  Like if a
4GiB file gets added to the index by the new version, which records a
size of 1, then the file is extended by one byte while mtime stays the
same and then an old git won't detect the change?
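
Spelled out as a small model of the scenario (again only a sketch:
old_git_size_changed() is a made-up helper, and I assume an old git
compares the recorded size against the on-disk size truncated to 32
bits, with mtime and the other stat fields unchanged):

```python
UINT32_MASK = 0xFFFFFFFF
GIB = 2**30


def old_git_size_changed(sd_size: int, st_size: int) -> bool:
    """Assumed old-git check: recorded size vs. truncated on-disk size."""
    return sd_size != (st_size & UINT32_MASK)


# New git adds a 4 GiB file and records a size of 1 instead of 0.
recorded = 1

# Unmodified file: old git sees truncated size 0 != 1, so it reports a
# change.  That is the "not nice, but safe" case.
unmodified_flagged = old_git_size_changed(recorded, 4 * GIB)

# File extended by one byte: truncated size is 1, which equals the
# recorded value, so the size check alone would not notice the change.
extended_flagged = old_git_size_changed(recorded, 4 * GIB + 1)

print(unmodified_flagged, extended_flagged)
```

If that model is right, the size check would miss the one-byte
extension, and detection would depend on the other stat fields.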

> If you people are
> interested, I may be able to find a way to send a patch to the list or put it
> on github.

Patches are always welcome, they make discussions and testing easier.

René