Re: [PATCH] Prevent git from rehashing 4GBi files


 



On 07.05.22 at 14:33, Philip Oakley wrote:
>
>
> On 7 May 2022 03:15:00 BST, Jason Hatton <jhatton@xxxxxxxxxxxxxxxxxxx> wrote:
>
>         Philip Oakley <philipoakley@iee.email> writes:
>
>                 This may treat non-zero multiple of 4GiB as "not racy", but has
>                 anybody double checked the concern René brought up earlier that a
>                 4GiB file that was added and then got rewritten to 2GiB within the
>                 same second would suddenly start getting treated as not racy?
>
>             This is the pre-existing problem, that ~1 in 2^31 size changes might not
>             get noticed for size change. The 0 byte / 4GiB change is an identical
>             issue, as is changing from 3 bytes to 4GiB+3 bytes, etc., so that's no
>             worse than before (well maybe twice as 'unlikely').
>
>
>         OK, it added one more case to 2^32-1 existing cases, I guess.
>
>                 The patch (the final version of it anyway) needs to be accompanied
>                 by a handful of test additions to tickle corner cases like that.
>
>             They'd be protected by the EXPENSIVE prerequisite I would assume.
>
>
>         Oh, absolutely. Thanks for spelling that out.
>
>
>     I have been testing out the patch a bit and have good and (mostly) bad news.
>
>     What works using a munge value of 1.
>
>     $ git add
>     $ git status
>
>     Racy seems to work.
>
>     $ touch .git/index 4GiB # 4GiB is now racy
>     $ git status # Git will rehash the racy file
>     $ git status # Git cached the file. Second status is fast.
>
>     What doesn't work.
>
>     $ git checkout 4GiB
>     $ fatal: packed object is corrupt!
>
>     Using a munge value of 1<<31 causes even more problems. The file hashes in the
>     index for 4GiB files (git ls-files -s --debug) are set to the zero file hash.
>
>     I looked up and down the code base and couldn't figure out how the munged
>     value was leaking out of read-cache.c and breaking things. Most of the code
>     I found tends to use stat and then convert that to a size_t, not using the
>     munged unsigned int at all.
>
>     Maybe someone else will have better luck. This seems over my head :(
>
>     Thanks
>     --
>     Jason
>
>
> Is this on Git for Windows or a 64 bit Linux?
> There are still some issues on GfW for 2GiB+ files (long vs. long long int).

Which would explain the zero file hash.  And make the platform unfit for
handling big files at all at this time.

FWIW, on macOS I get this with the patch applied:

   $ git init --quiet /tmp/a
   $ cd /tmp/a
   $ : >size-0
   $ dd if=/dev/zero bs=1 oseek=4294967295 count=1 of=size-4294967296
   1+0 records in
   1+0 records out
   1 bytes transferred in 0.000365 secs (2740 bytes/sec)
   $ dd if=/dev/zero bs=1 oseek=4294967296 count=1 of=size-4294967297
   1+0 records in
   1+0 records out
   1 bytes transferred in 0.000293 secs (3413 bytes/sec)
   $ dd if=/dev/zero bs=1 oseek=6442450943 count=1 of=size-6442450944
   1+0 records in
   1+0 records out
   1 bytes transferred in 0.000266 secs (3759 bytes/sec)
   $ git add size-*
   $ git commit -m initial
   [master (root-commit) d9c2a0a] initial
    4 files changed, 0 insertions(+), 0 deletions(-)
    create mode 100644 size-0
    create mode 100644 size-4294967296
    create mode 100644 size-4294967297
    create mode 100644 size-6442450944

   $ time git checkout size-*
   Updated 0 paths from the index
   git checkout size-*  0.01s user 0.01s system 65% cpu 0.020 total

   $ git ls-files -s --debug | grep size
   100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	size-0
     size: 0	flags: 0
   100644 451971a31ea5a207a10b391df2d5949910133565 0	size-4294967296
     size: 2147483648	flags: 0
   100644 3eb7feb1413c757f0d8181deb28d1dab03d64846 0	size-4294967297
     size: 1	flags: 0
   100644 741285bddfa7863072c238f34e27144c2501832d 0	size-6442450944
     size: 2147483648	flags: 0

So checkout skips all of the files and their cached sizes have the
expected values.
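
For reference, those sizes are what one would expect if the index keeps
only the low 32 bits of the file size and, as I understand the patch,
substitutes 1<<31 when that truncation would give 0 for a non-empty file.
A minimal shell sketch of that rule, assuming 64-bit shell arithmetic
(munged_size is a made-up helper for illustration, not a function in Git):

```shell
# Hypothetical helper: mimic how a 64-bit file size would end up in a
# 32-bit cached size field under the munge rule discussed in this thread.
munged_size() {
    size=$1
    low=$(( size % 4294967296 ))    # keep only the low 32 bits
    if [ "$low" -eq 0 ] && [ "$size" -ne 0 ]; then
        low=2147483648              # assumed munge value, 1<<31
    fi
    echo "$low"
}

# The four file sizes used in the test repository above.
for s in 0 4294967296 4294967297 6442450944; do
    printf 'size-%s -> %s\n' "$s" "$(munged_size "$s")"
done
```

Note that size-6442450944 reaches 2147483648 by plain truncation
(6GiB mod 4GiB = 2GiB), while size-4294967296 only gets there through
the substitution.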

René



