On Thu, Jul 19, 2018 at 11:24 AM, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
> On Thu, Jul 19, 2018 at 07:31:35PM +0200, Duy Nguyen wrote:
>> On Thu, Jul 19, 2018 at 01:23:58PM -0400, Jeff King wrote:
>> > On Thu, Jul 19, 2018 at 09:42:00AM -0700, Elijah Newren wrote:
>> >
>> > > Thanks for the quick turnaround. Unfortunately, I have some bad news.
>> > > With this patch, I get the following:
>> > >
>> > > $ /usr/bin/time -f 'MaxRSS:%M Time:%e' git gc --aggressive
>> > > Enumerating objects: 4460703, done.
>> > > Counting objects: 100% (4460703/4460703), done.
>> > > Delta compression using up to 40 threads.
>> > > Compressing objects: 100% (3807140/3807140), done.
>> > > Writing objects: 100% (4460703/4460703), done.
>> > > Total 4460703 (delta 2831383), reused 1587071 (delta 0)
>> > > error: failed to unpack compressed delta at offset 183854150 from
>> > > .git/objects/pack/pack-30d4f0b0e5a03dc91a658a0586f4e74cdf4a94d6.pack
>> > > fatal: packed object 20ce811e53dabbb8ef9368c108cbbdfa65639c03 (stored
>> > > in .git/objects/pack/pack-30d4f0b0e5a03dc91a658a0586f4e74cdf4a94d6.pack)
>> > > is corrupt
>> > > error: failed to run prune
>> > > MaxRSS:40025196 Time:2531.52
>> >
>> > Looking at that output, my _guess_ is that we somehow end up with a
>> > bogus delta_size value and write out a truncated entry. But I couldn't
>> > reproduce the issue with smaller test cases.
>>
>> Could it be a race condition?
>
> I'm convinced my code is racy (between two writes). I created a broken
> pack once with 32 threads. Elijah, please try again with this new
> patch. It should fix this (I have only tried repacking a few times so
> far, but will continue).
>
> The race is this:
>
> 1. Thread one sees a large delta size and a NULL delta_size[] array,
>    allocates the new array, and is in the middle of copying the old
>    delta sizes over.
>
> 2. Thread two wants to write a new (large) delta size.
>    It sees that delta_size[] is already allocated, so it writes the
>    correct size there (and a truncated one in
>    object_entry->delta_size_).
>
> 3. Back in thread one: it now copies the truncated value in
>    delta_size_ from step 2 into the delta_size[] array, overwriting
>    the good value that thread two wrote.
>
> There is also a potential read/write race, where a read from
> pack_size[] happens when the array is not ready. But I don't think it
> can happen with the current try_delta() code. I protect it anyway to
> be safe.

Looking at the output from Peff's instrumentation elsewhere in this
thread, I see a lot of lines like

    mismatched get: 32889efd307c7be376da9e3d45a78305f14ba73a = (, 28)

Does that mean it was reading the array when it wasn't ready?

Anyway, with your latest patch (which I'm labelling fix-v4), git gc
--aggressive completes, git fsck likes the result, and the new table of
stats on this repo becomes:

Version  Pack (MB)  MaxRSS (kB)  Time (s)
-------  ---------  -----------  --------
2.17.0        5498     43513628   2494.85
2.18.0       10531     40449596   4168.94
fix-v1        5509     42509784   2480.74
fix-v2        5509     41644104   2468.25
fix-v4        5500     44400948   2761.74

So, the pack size is back to what is expected. The code takes about 10%
longer and requires 2% more memory than git 2.17.0, but the pack size
was the main issue.

However, it's also interesting to look at the effect on packing
linux.git (on the same beefy hardware):

Version  Pack (MB)  MaxRSS (kB)  Time (s)
-------  ---------  -----------  --------
2.17.0        1279     11382932    632.24
2.18.0        1279     10817568    621.97
fix-v4        1279     11484168   1193.67

While the pack size is nice and small, the original memory savings added
in 2.18.0 are gone, and the performance is much worse. :-(