gcc-4.8+ and R10000+

Joshua Kinard <kumba@xxxxxxxxxx> · Sun, 07 Sep 2014 04:25:03 -0400

I've been banging my head on the desk over gcc PR61538 [1] for the last few
months.  After talking to the gcc people, I went looking through the R10000
manual again to see if some kind of erratum sticks out.  I found this bit:

"""
Load Linked and Store Conditional instructions (LL, LLD,
SC, and SCD) do not implicitly perform SYNC operations in
the R10000.  Any of the following events that occur between
a Load Linked and a Store Conditional will cause the Store
Conditional to fail: an exception; execution of an ERET,
a load, a store, a SYNC, a CacheOp, a prefetch, or an
external intervention/invalidation on the block containing
the linked address. Instruction cache misses do not cause
the Store Conditional to fail.
"""
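
To make the rule in that passage concrete, here is a toy model of the link
bit in C.  It is only a sketch of how I read the quoted text, not a
description of the actual R10000 pipeline; the names (llsc_cpu, model_ll,
model_event, model_sc) are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Events the quoted passage mentions between an LL and an SC. */
enum llsc_event {
    EV_EXCEPTION, EV_ERET, EV_LOAD, EV_STORE, EV_SYNC,
    EV_CACHEOP, EV_PREFETCH,
    EV_EXTERNAL_INVALIDATE,   /* on the block containing the linked address */
    EV_ICACHE_MISS            /* explicitly stated NOT to break the link */
};

/* Hypothetical model state: just the link bit. */
struct llsc_cpu { bool link_bit; };

/* LL: read the word and set the link bit. */
static int model_ll(struct llsc_cpu *cpu, const int *addr)
{
    assert(cpu && addr);
    cpu->link_bit = true;
    return *addr;
}

/* Anything in the passage's list clears the link; an icache miss does not. */
static void model_event(struct llsc_cpu *cpu, enum llsc_event ev)
{
    assert(cpu);
    if (ev != EV_ICACHE_MISS)
        cpu->link_bit = false;
}

/* SC: store only if the link survived; returns true on success. */
static bool model_sc(struct llsc_cpu *cpu, int *addr, int value)
{
    assert(cpu && addr);
    bool ok = cpu->link_bit;
    if (ok)
        *addr = value;
    cpu->link_bit = false;
    return ok;
}
```

In this model, model_ll followed directly by model_sc succeeds, while
inserting model_event(&cpu, EV_LOAD) between them makes model_sc fail and
leaves memory untouched.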

The regression happens inside glibc's __lll_lock_wait_private routine:

void
__lll_lock_wait_private (int *futex)
{
  if (*futex == 2)
    lll_futex_wait (futex, 2, LLL_PRIVATE);

  while (atomic_exchange_acq (futex, 2) != 0)
    lll_futex_wait (futex, 2, LLL_PRIVATE);
}

It appears to hang forever on the "atomic_exchange_acq" function call.
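
For reference, a minimal sketch of what that exchange step is expected to
do, written against GCC's __atomic_exchange_n builtin rather than glibc's
internal macro (the helper name exchange_acquire is made up for
illustration).  On MIPS this kind of operation should expand to an ll/sc
retry loop followed by a sync.

```c
#include <assert.h>

/* Hypothetical helper, not glibc's macro: atomically store `newval` into
   *futex and return the value that was there before, with acquire
   ordering. */
static int exchange_acquire(int *futex, int newval)
{
    assert(futex);
    return __atomic_exchange_n(futex, newval, __ATOMIC_ACQUIRE);
}
```

If *futex was 0 (unlocked), the call returns 0 and leaves 2 behind, so the
while loop in __lll_lock_wait_private exits; any other return value means
the lock is still held and the caller goes back to waiting.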

Disassembling a statically-built copy of the "sln" binary generated by
glibc's compile phase shows slight differences in how gcc-4.7 and gcc-4.8
compile the __lll_lock_wait_private function.  The key differences in the
output asm are marked with '*':

gcc-4.7:
   x+4   <START>
         ...
   x+24  bne     v1,v0,<x+56>
         ...
   x+32  0x7c03e83b /* rdhwr */
   x+36  li      a2,2
   x+40  lw      a1,-29832(v1)
   x+44  move    a3,zero
   x+48  li      v0,4238
   x+52  syscall
*  x+56  li      v0,2
*  x+60  ll      v1,0(s0)
*  x+64  move    a0,v0
*  x+68  sc      a0,0(s0)
   x+72  beqzl   a0,<x+56>
   x+76  nop
   x+80  sync
   x+84  bnez    v1,<x+32>

gcc-4.8:
   x+4   <START>
         ...
   x+24  bne     v1,v0,<x+56>
         ...
   x+32  0x7c03e83b /* rdhwr */
   x+36  li      a2,2
   x+40  lw      a1,-29832(v1)
   x+44  move    a3,zero
   x+48  li      v0,4238
   x+52  syscall
*  x+56  ll      v0,0(s0)
*  x+60  li      at,2
*  x+64  sc      at,0(s0)
   x+68  beqzl   at,<x+56>
   x+72  nop
   x+76  sync
   x+80  bnez    v0,<x+32>

Using gdb, if I step through 'sln', the gcc-4.7 copy never calls
__lll_lock_wait_private, so I can't tell how those insns execute there.
But the 4.8 copy does get into this function, and stepping one instruction
at a time yields this execution path:

   x+4   <START>
         ...
   x+24  bne     v1,v0,<x+56>
   x+56  ll      v0,0(s0)
   x+68  beqzl   at,<x+56> /* beqzl check fails -> x+76 */
   x+76  sync
   x+80  bnez    v0,<x+32>
   x+32  0x7c03e83b /* rdhwr */
   x+36  li      a2,2
   x+40  lw      a1,-29832(v1)
   x+44  move    a3,zero
   x+48  li      v0,4238
   x+52  syscall
   x+56  ll      v0,0(s0)
   <HANG>

Executing the 'bnez' insn puts us at the rdhwr insn (x+32), then stepping
through, the 'syscall' (x+52) returns and leaves us at the 'll' (x+56) a
second time, where the program just hangs.

I am guessing at a few things here:

- Because ll/sc are atomic, gdb doesn't let you step through them, which is
why the instruction pointer jumps over the 'li' and 'sc' insns.

- The 'li' after 'll' triggers the 'sc' to fail on R10K.

Does this look correct for an R10000, given the above statement from the
manual?  I'm not sure how or why this would cause the program to hang, but
it seems to directly correlate.

Could anyone from Debian test building gcc-4.8 (or greater) and glibc-2.19
on an R10K system and see if it hangs at the end of glibc's compile phase,
when the 'sln' binary is used to generate symlinks?  I've run into this on
R12000 and R14000 systems, and I assume it'll happen on an R10000 system as
well.

1: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61538

-- 
Joshua Kinard
Gentoo/MIPS
kumba@xxxxxxxxxx
4096R/D25D95E3 2011-03-28
