> > The condition sd_size==0 is used as a signal for "no, we really need
> > to compare the contents", and causes the contents to be hashed, and
> > if the contents match the object name recorded in the index, the
> > on-disk size is stored in sd_size and the entry is marked as
> > CE_UPTODATE. Alas, if the truncated st_size is 0, the resulting
> > entry would have sd_size==0 again, so a workaround like what you
> > outlined is needed.
>
> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
> This is of secondary importance, but the fact that Jason observed
> 8GBi files gets hashed over and over unnecessarily means that we
> would do the same for an empty file, opening, reading 0-bytes,
> hashing, and closing, without taking advantage of the fact that
> CE_UPTODATE bit says the file contents should be up-to-date with
> respect to the cached object name, doesn't it?
>
> Or do we have "if st_size == 0 and sd_size == 0 then we know what it
> hashes to (i.e. EMPTY_BLOB_SHA*) and there is no need to do the
> usual open-read-hash-close dance" logic (I didn't check)?

Junio C Hamano

As best I can tell, it rechecks the zero-sized files. My Linux box can
run git ls in 0.006 seconds with 1000 zero-sized files in the repo.
Rehashing every file whose size is a multiple of 2^32 bytes on every
"git ls", on the other hand...

I managed to compile git with the proposed changes. They seem to
correct the problem, and "make test" passes.

When upgrading to the patched version of git, git will rehash the 8GBi
files once and then work normally. When downgrading to an unpatched
version, git will perceive that the 8GBi files have changed. This needs
to be corrected with "git add" or "git checkout".

If people are interested, I may be able to find a way to send a patch
to the list or put it on GitHub.

Thanks
Jason D. Hatton
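
For readers following the thread, the collision between a truncated size and the sd_size==0 signal can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the patch discussed above: the function names truncate_size, munged_size, and check_sizes are hypothetical, and the 0x80000000 placeholder is an arbitrary non-zero value chosen for the example. It assumes the index stores the low 32 bits of the on-disk size.

```c
#include <assert.h>
#include <stdint.h>

/* Plain truncation: keep only the low 32 bits of the on-disk size. */
uint32_t truncate_size(uint64_t st_size)
{
	return (uint32_t)st_size;
}

/*
 * Hypothetical workaround: if a non-empty file truncates to 0, store a
 * non-zero placeholder so that 0 keeps meaning "compare the contents".
 * The name and the 0x80000000 placeholder are illustrative assumptions.
 */
uint32_t munged_size(uint64_t st_size)
{
	uint32_t sd_size = (uint32_t)st_size;

	if (sd_size == 0 && st_size != 0)
		return UINT32_C(0x80000000);
	return sd_size;
}

/* Small self-check covering the cases raised in the thread. */
void check_sizes(void)
{
	uint64_t eight_gib = UINT64_C(8) << 30;   /* 2^33 bytes */

	assert(truncate_size(eight_gib) == 0);    /* looks like "must rehash" */
	assert(munged_size(eight_gib) != 0);      /* no longer collides with 0 */
	assert(munged_size(0) == 0);              /* empty file still maps to 0 */
	assert(munged_size(12345) == 12345);      /* ordinary sizes unchanged */
}
```

Note that in this sketch an empty file still ends up with a stored size of 0, so it would still be rechecked each time, which is the secondary point raised in the quoted message.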