On 04.05.22 at 19:47, Jason Hatton wrote:
>>> The condition sd_size==0 is used as a signal for "no, we really need
>>> to compare the contents", and causes the contents to be hashed, and
>>> if the contents match the object name recorded in the index, the
>>> on-disk size is stored in sd_size and the entry is marked as
>>> CE_UPTODATE.  Alas, if the truncated st_size is 0, the resulting
>>> entry would have sd_size==0 again, so a workaround like what you
>>> outlined is needed.
>>
>> Junio C Hamano <gitster@xxxxxxxxx> writes:
>>
>> This is of secondary importance, but the fact that Jason observed
>> 8GiB files get hashed over and over unnecessarily means that we
>> would do the same for an empty file, opening, reading 0-bytes,
>> hashing, and closing, without taking advantage of the fact that
>> CE_UPTODATE bit says the file contents should be up-to-date with
>> respect to the cached object name, doesn't it?
>>
>> Or do we have "if st_size == 0 and sd_size == 0 then we know what it
>> hashes to (i.e. EMPTY_BLOB_SHA*) and there is no need to do the
>> usual open-read-hash-close dance" logic (I didn't check)?
>
> Junio C Hamano
>
> As best as I can tell, it rechecks the zero sized files. My Linux box can run
> git ls in .006 seconds with 1000 zero sized files in the repo. Rehashing every
> file that is a multiple of 2^32 with every "git ls" on the other hand...
>
> I managed to actually compile git with the proposed changes.

Meaning that file sizes of n * 2^32 bytes get recorded as 1 byte instead
of 0 bytes?  Why 1 and not e.g. 2^32-1 or 2^31 (or 42)?

> It seems to correct
> the problem and "make test" passes. If upgrading to the patched version of git,
> git will rehash the 8GiB files once and work normally. If downgrading to an
> unpatched version, git will perceive that the 8GiB files have changes. This
> needs to be corrected with "git add" or "git checkout".

Not nice, but safe.  Can there be an unsafe scenario as well?  Like if a
4GiB file gets added to the index by the new version, which records a
size of 1, then the file is extended by one byte while mtime stays the
same and then an old git won't detect the change?

> If you people are
> interested, I may be able to find a way to send a patch to the list or put it
> on GitHub.

Patches are always welcome, they make discussions and testing easier.

René
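
To make the size arithmetic above concrete, here is a rough Python model of the truncation and of the proposed replacement value. It is only a sketch and not git's actual C code: it assumes sd_size keeps just the low 32 bits of st_size, and the names truncate_size and munged_size are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: sd_size holds only the low 32 bits of st_size
GIB = 1 << 30


def truncate_size(st_size: int) -> int:
    """Model of the unpatched behaviour: plain truncation to 32 bits."""
    return st_size & MASK32


def munged_size(st_size: int, replacement: int = 1) -> int:
    """Hypothetical workaround: never record 0 for a non-empty file."""
    low = st_size & MASK32
    if low == 0 and st_size != 0:
        return replacement
    return low


# Unpatched: an 8GiB file is recorded as 0, the "must compare contents" signal.
size_8gib_unpatched = truncate_size(8 * GIB)

# Patched with replacement 1: the 8GiB file no longer collides with that signal,
# while a truly empty file is still recorded as 0.
size_8gib_patched = munged_size(8 * GIB)
size_empty_patched = munged_size(0)

# The scenario asked about: the new version records a 4GiB file as 1, and an
# old version truncates a (4GiB + 1)-byte file to 1 as well, so the size
# check alone cannot tell the two apart.
recorded_by_new = munged_size(4 * GIB)
seen_by_old = truncate_size(4 * GIB + 1)
collision = recorded_by_new == seen_by_old
```

In this model a different replacement value such as 2^31 does not remove the collision, it only moves it to files of n * 2^32 + 2^31 bytes.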