Re: regression in multi-threaded git-pack-index

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jeff King <peff@xxxxxxxx> writes:

> On Fri, Mar 15, 2013 at 03:42:40PM -0700, Stefan Zager wrote:
>
>> We have uncovered a regression in this commit:
>> 
>> b8a2486f1524947f232f657e9f2ebf44e3e7a243
>> 
>> The symptom is that 'git fetch' dies with:
>> 
>> error: index-pack died of signal 10
>> fatal: index-pack failed
>> 
>> I have only been able to reproduce it on a Mac thus far; will try
>> ubuntu next.  We can make it go away by running:
>> 
>> git config pack.threads 1
>
> I couldn't reproduce the problem on Linux with the instructions you
> gave. I did try running it under valgrind and it produced:
>
>   ==2380== Conditional jump or move depends on uninitialised value(s)
>   ==2380==    at 0x441631: resolve_delta (index-pack.c:837)
>   ==2380==    by 0x4419D6: find_unresolved_deltas_1 (index-pack.c:898)
>   ==2380==    by 0x441A45: find_unresolved_deltas (index-pack.c:914)
>   ==2380==    by 0x4427CA: fix_unresolved_deltas (index-pack.c:1232)
>   ==2380==    by 0x4421F5: conclude_pack (index-pack.c:1111)
>   ==2380==    by 0x443A5C: cmd_index_pack (index-pack.c:1604)
>   ==2380==    by 0x4058A2: run_builtin (git.c:281)
>   ==2380==    by 0x405A35: handle_internal_command (git.c:443)
>   ==2380==    by 0x405C01: main (git.c:532)
>
> but the line in question is:
>
>   if (deepest_delta < delta_obj->delta_depth)
>
> And in the debugger, both of those variables appear to have sane values
> (nor should either impacted by the patch you bisected to). On top of
> that, running with pack.threads=1 produces the same error. So I think it
> may be a false positive from valgrind, and unrelated to your issue.

I find that somewhat unlikely, for two reasons: memcheck is actually
quite good at finding uninitialized memory use, it just isn't that good
at distinguishing if it makes a difference.  Most false positives are of
the "loading an entire word and discarding most of it" kind.

Second, the thread-debugging valgrind tools (drd and helgrind) also
complain about exactly this line:

DRD says:

  ==20987== Thread 4:
  ==20987== Conflicting load by thread 4 at 0x007a70d0 size 4
  ==20987==    at 0x43A783: resolve_delta (index-pack.c:837)
  ==20987==    by 0x43A94F: find_unresolved_deltas (index-pack.c:898)
  ==20987==    by 0x43B0F8: threaded_second_pass (index-pack.c:945)
  ==20987==    by 0x4C2CF60: vgDrd_thread_wrapper (drd_pthread_intercepts.c:341)
  ==20987==    by 0x542FE0D: start_thread (in /lib64/libpthread-2.15.so)
  ==20987== Allocation context: BSS section of /home/thomas/.local/bin/git
  ==20987== Other segment start (thread 2)
  ==20987==    at 0x4C30A1F: pthread_mutex_unlock (drd_pthread_intercepts.c:665)
  ==20987==    by 0x43B06E: threaded_second_pass (index-pack.c:122)
  ==20987==    by 0x4C2CF60: vgDrd_thread_wrapper (drd_pthread_intercepts.c:341)
  ==20987==    by 0x542FE0D: start_thread (in /lib64/libpthread-2.15.so)
  ==20987== Other segment end (thread 2)
  ==20987==    at 0x5436AE3: ??? (in /lib64/libpthread-2.15.so)
  ==20987==    by 0x439C76: unpack_data.constprop.8 (index-pack.c:528)
  ==20987==    by 0x439EA7: get_base_data (index-pack.c:571)
  ==20987==    by 0x43A7B4: resolve_delta (index-pack.c:841)
  ==20987==    by 0x43A94F: find_unresolved_deltas (index-pack.c:898)
  ==20987==    by 0x43B0F8: threaded_second_pass (index-pack.c:945)
  ==20987==    by 0x4C2CF60: vgDrd_thread_wrapper (drd_pthread_intercepts.c:341)
  ==20987==    by 0x542FE0D: start_thread (in /lib64/libpthread-2.15.so)


helgrind says:

  ==21160== Possible data race during read of size 4 at 0x7A70D0 by thread #3
  ==21160== Locks held: none
  ==21160==    at 0x43A783: resolve_delta (index-pack.c:837)
  ==21160==    by 0x43A94F: find_unresolved_deltas (index-pack.c:898)
  ==21160==    by 0x43B0F8: threaded_second_pass (index-pack.c:945)
  ==21160==    by 0x4C2D35D: mythread_wrapper (hg_intercepts.c:219)
  ==21160==    by 0x5424E0D: start_thread (in /lib64/libpthread-2.15.so)
  ==21160== 
  ==21160== This conflicts with a previous write of size 4 by thread #2
  ==21160== Locks held: none
  ==21160==    at 0x43A78E: resolve_delta (index-pack.c:838)
  ==21160==    by 0x43A94F: find_unresolved_deltas (index-pack.c:898)
  ==21160==    by 0x43B0F8: threaded_second_pass (index-pack.c:945)
  ==21160==    by 0x4C2D35D: mythread_wrapper (hg_intercepts.c:219)
  ==21160==    by 0x5424E0D: start_thread (in /lib64/libpthread-2.15.so)


You were apparently just lucky in catching it before it was even
initialized.

I can reproduce the above warnings with

  valgrind --tool=helgrind --trace-children=yes git index-pack <any_pack>

i.e., it does not seem to depend on the pack (sample size 3, packs
looked at were from my git.git).

Duy, what was the reasoning why resolve_delta() does not need to hold
locks when it looks when it looks at deepest_delta?  My coffee levels
aren't up to this task yet.  It certainly seems extremely dubious to me,
as the code uses the global deepest_delta in threaded sections.  You can
probably argue that the load/store is atomic on most(?) platforms, but
you get no guarantees that deepest_delta at any time in fact holds the
maximum value of delta_obj->delta_depth.


Furthermore there's another warning shown by helgrind:

  ==21160== Possible data race during write of size 4 at 0x7A7060 by thread #2
  ==21160== Locks held: 1, at address 0x7A70E0
  ==21160==    at 0x43A840: resolve_delta (index-pack.c:853)
  ==21160==    by 0x43A94F: find_unresolved_deltas (index-pack.c:898)
  ==21160==    by 0x43B0F8: threaded_second_pass (index-pack.c:945)
  ==21160==    by 0x4C2D35D: mythread_wrapper (hg_intercepts.c:219)
  ==21160==    by 0x5424E0D: start_thread (in /lib64/libpthread-2.15.so)
  ==21160== 
  ==21160== This conflicts with a previous read of size 4 by thread #3
  ==21160== Locks held: 1, at address 0x7A7080
  ==21160==    at 0x43AFC8: threaded_second_pass (index-pack.c:955)
  ==21160==    by 0x4C2D35D: mythread_wrapper (hg_intercepts.c:219)
  ==21160==    by 0x5424E0D: start_thread (in /lib64/libpthread-2.15.so)

That one really seems to be a false positive in the sense that
threaded_second_pass() doesn't really care if it gets a bad value for
nr_resolved_deltas.  The only thing that matters is that the counter
increment is done with counter_mutex held.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]