Re: Git status extremely slow if any file is a multiple of 8GBi

> > The condition sd_size==0 is used as a signal for "no, we really need
> > to compare the contents", and causes the contents to be hashed, and
> > if the contents match the object name recorded in the index, the
> > on-disk size is stored in sd_size and the entry is marked as
> > CE_UPTODATE.  Alas, if the truncated st_size is 0, the resulting
> > entry would have sd_size==0 again, so a workaround like what you
> > outlined is needed.
> 
> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
> This is of secondary importance, but the fact that Jason observed
> 8GBi files gets hashed over and over unnecessarily means that we
> would do the same for an empty file, opening, reading 0-bytes,
> hashing, and closing, without taking advantage of the fact that
> CE_UPTODATE bit says the file contents should be up-to-date with
> respect to the cached object name, doesn't it?
> 
> Or do we have "if st_size == 0 and sd_size == 0 then we know what it
> hashes to (i.e. EMPTY_BLOB_SHA*) and there is no need to do the
> usual open-read-hash-close dance" logic (I didn't check)?

Junio C Hamano

As best I can tell, it does recheck the zero-sized files. My Linux box can run
"git ls" in 0.006 seconds with 1000 zero-sized files in the repo. Rehashing every
file whose size is a multiple of 2^32 bytes on every "git ls", on the other hand...

I managed to compile git with the proposed changes. They seem to correct the
problem, and "make test" passes. After upgrading to the patched version of git,
git will rehash the 8 GiB files once and then work normally. After downgrading
to an unpatched version, git will perceive the 8 GiB files as changed. This
needs to be corrected with "git add" or "git checkout". If people are
interested, I may be able to find a way to send a patch to the list or put it
on GitHub.

Thanks
Jason D. Hatton




