Re: futex wait failure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 03 Feb 2010, Helge Deller wrote:

> On 02/03/2010 04:44 AM, John David Anglin wrote:
>> On Tue, 02 Feb 2010, Helge Deller wrote:
>>
>>> I wonder if we have some problems with the LWS code path in the kernel
>>> regarding atomic locking with futexes?
>>>
>>> In arch/parisc/kernel/syscall.S we use a lock table called 
>>> lws_lock_start[]
>>> to guard the LWS code against other competing userspace processes.
>>> I wonder if this really enough, esp. since we do implement futex syscalls
>>> (e.g. clone/exit calls uses futex functions to change userspace values
>>> because of CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID).
>>>
>>> Do we maybe need to protect the LWS code path with the same locking table
>>> as the
>>> generic kernel? Atomicity of futexes writing to userspace are not in sync
>>> with the locking of the LWS/lws_lock_start[] code.
>>>
>>> I tried to come up with a patch for that which I attached here, but sadly
>>> it hangs as soon as the init process is started on a 64bit/SMP kernel.
>>> So either my thinking here is stupid, or I do have a stupid coding bug.
>>>
>>> Furthermore, the coding for futex_atomic_op_inuser() in
>>> arch/parisc/include/asm/futex.h seems to miss real functionality.
>>> I didn't looked closer into this though.
>>
>> While locking may be a problem, it is not the main reason the minifail
>> program fails.  The program fails on my c3750 with a 32-bit UP kernel.
>
> Hmm, I can't reproduce it at the moment with a 32bit UP kernel on my
> c3000. The minifail3 test program I currently use (to avoid glibc issues)
> is attached.
>
>> There is no locking in this kernel.  The LWS code relies on the scheduler
>> for atomicity.
>
> yep. This should then work an a UP kernel.
>
>> I tried disabling interrupts around the crucial three instructions but
>> it didn't help.
>
> Ugh.
>
> Anyway, my current patch which compiles and runs fine is attached here as
> well. Maybe you want to try it on your SMP builds? It includes the 
> syscall.S
> changes you sent last time too.
> In this version of the patch I added on own LWS locking hash table for 
> user-space accesses,
> which is used in the LWS code and when some (probably not all) put_user()/
> get_user() calls are made in the futex code.
>
> Feedback still welcome.

I tried the patch below with 2.6.33 on three systems:

rp3440: 64-bit UP and SMP
A500-7X: 64-bit SMP
c3750: 32-bit UP

libc6 is 2.10.2-6.

As usual, I tested by trying to build and check GCC.

The most common problem on the rp3440 SMP build is a segmentation fault
in /bin/sh.  Twice it dropped core running configure near the start of
the build.  Restarting the build, the system managed to complete a full
GCC build, but the gcc and acats testsuites aborted.  Here's the backtrace
for the acats one:

Core was generated by `/bin/bash /home/dave/gnu/gcc/gcc/gcc/testsuite/ada/acats/run_all.sh'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000008 in ?? ()
(gdb) bt
#0  0x00000008 in ?? ()
#1  0x0004ffd0 in stop_pipeline ()
#2  0x0003daf8 in execute_command_internal ()
#3  0x0003e3a8 in execute_command_internal ()
#4  0x0003e74c in execute_command ()
#5  0x0003e364 in execute_command_internal ()
#6  0x0003e74c in execute_command ()
#7  0x0003e364 in execute_command_internal ()
#8  0x0003e74c in execute_command ()
#9  0x0003e0a0 in execute_command_internal ()
#10 0x0003e3a8 in execute_command_internal ()
#11 0x0003e74c in execute_command ()
#12 0x0003e0a0 in execute_command_internal ()
#13 0x0003e74c in execute_command ()
#14 0x00029ac0 in reader_loop ()
#15 0x00029100 in main ()

I ran the UP build for several days without any major problems.  The most
common problem is tests randomly timeout.  I'm not sure whether the tests
actually timeout or not, but everyone that I've looked at shouldn't timeout
based on execution time.

I had pretty much the same experience on the A500-7X although I haven't
seen /bin/sh dump core running configure.

The c3750 completed a full GCC build but there were again problems running
the testsuite.  For example,

Core was generated by `expect -- /usr/share/dejagnu/runtest.exp --tool gcc'.
Program terminated with signal 11, Segmentation fault.
#0  0x409fa258 in ?? () from /usr/lib/libtcl8.5.so.0
(gdb) bt
#0  0x409fa258 in ?? () from /usr/lib/libtcl8.5.so.0
#1  0x402600a0 in start_thread (arg=0x416a4480) at pthread_create.c:302
#2  0x40e25874 in clone ()
    at ../ports/sysdeps/unix/sysv/linux/hppa/nptl/../clone.S:166
#3  0x00000000 in ?? ()

This looks like the minifail clone bug.  On the otherhand, minifail doesn't
seem to be failing.

This is the assembly code:

(gdb) disass 0x409fa248 0x409fa268
Dump of assembler code from 0x409fa248 to 0x409fa268:
0x409fa248:     copy r18,ret0
0x409fa24c:     stw r0,0(ret0)
0x409fa250:     ldo 4(ret0),ret0
0x409fa254:     cmpb,<>,n r7,ret0,0x409fa250
0x409fa258:     stw r0,0(ret0)
0x409fa25c:     copy r7,ret0
0x409fa260:     stw r0,0(ret0)
0x409fa264:     ldo 4(ret0),ret0

This may be the register.  There are several dumps in the log at the same spot.

Mar  7 11:48:18 hiauly6 kernel: do_page_fault() pid=3506 command='expect' type=1
5 address=0x416a3000
Mar  7 11:48:18 hiauly6 kernel: vm_start = 0x416a3000, vm_end = 0x416a4000
Mar  7 11:48:18 hiauly6 kernel: 
Mar  7 11:48:18 hiauly6 kernel:      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
Mar  7 11:48:18 hiauly6 kernel: PSW: 00000000000011101111111100001111 Not tainte
d
Mar  7 11:48:18 hiauly6 kernel: r00-03  000eff0f 40276138 409fa4ff 00000000
Mar  7 11:48:18 hiauly6 kernel: r04-07  40a252f4 40ea5208 0001ccb8 00000000
Mar  7 11:48:18 hiauly6 kernel: r08-11  40a252f4 40ea539c 00000003 40a2990c
Mar  7 11:48:18 hiauly6 kernel: r12-15  40ea521c 40a29904 40ea5214 00000001
Mar  7 11:48:18 hiauly6 kernel: r16-19  00000008 40a232d8 40ea521c 40a252f4
Mar  7 11:48:18 hiauly6 kernel: r20-23  00000001 00000000 402636cc 00000000
Mar  7 11:48:18 hiauly6 kernel: r24-27  fffffff5 ffffffd3 0001c538 000121ac
Mar  7 11:48:18 hiauly6 kernel: r28-31  416a3000 00000000 40ea5440 40263733
Mar  7 11:48:18 hiauly6 kernel: sr00-03  00000026 00000000 00000000 00000026
Mar  7 11:48:18 hiauly6 kernel: sr04-07  00000026 00000026 00000026 00000026
Mar  7 11:48:18 hiauly6 kernel: 
Mar  7 11:48:18 hiauly6 kernel:       VZOUICununcqcqcqcqcqcrmunTDVZOUI
Mar  7 11:48:18 hiauly6 kernel: FPSR: 00001100001110000000000000000000
Mar  7 11:48:18 hiauly6 kernel: FPER1: 00000000
Mar  7 11:48:18 hiauly6 kernel: fr00-03  0c38000000000000 0000000000000000 00000
00000000000 0000000000000000
Mar  7 11:48:18 hiauly6 kernel: fr04-07  0000000000000001 0370000000000000 0000000100000000 bff0000000000000
Mar  7 11:48:18 hiauly6 kernel: fr08-11  000000024f813bc4 0000000000000000 4f813bc04f813bc8 0000800100000003
Mar  7 11:48:18 hiauly6 kernel: fr12-15  000000004facfd40 000000001019db40 4f81169cf0000174 f000017cf0000884
Mar  7 11:48:18 hiauly6 kernel: fr16-19  1044111810590240 1044111800000081 f000000000000000 0000000000000000
Mar  7 11:48:18 hiauly6 kernel: fr20-23  00000002ffffff9c fffff0004f1a4000 0000000000000088 400026c000000000
Mar  7 11:48:18 hiauly6 kernel: fr24-27  01f2603af000017c f000017400000001 105a928c00000000 4fb50230ffffffff
Mar  7 11:48:18 hiauly6 kernel: fr28-31  ffffffff00002b67 105a92881019487c 0000000100000000 4f82820000000800
Mar  7 11:48:18 hiauly6 kernel: 
Mar  7 11:48:18 hiauly6 kernel: IASQ: 00000026 00000026 IAOQ: 409fa25b 409fa253
Mar  7 11:48:18 hiauly6 kernel:  IIR: 0f801280    ISR: 00000026  IOR: 416a3000
Mar  7 11:48:18 hiauly6 kernel:  CPU:        0   CR30: 3c014000 CR31: ffffffff
Mar  7 11:48:18 hiauly6 kernel:  ORIG_R28: 00000080
Mar  7 11:48:18 hiauly6 kernel:  IAOQ[0]: 409fa25b
Mar  7 11:48:18 hiauly6 kernel:  IAOQ[1]: 409fa253
Mar  7 11:48:18 hiauly6 kernel:  RP(r2): 409fa4ff

The address appears valid.  This is a TLB miss exception (15).

> diff --git a/arch/parisc/include/asm/atomic.h b/arch/parisc/include/asm/atomic.h
> index 716634d..18d57c8 100644
> --- a/arch/parisc/include/asm/atomic.h
> +++ b/arch/parisc/include/asm/atomic.h
> @@ -24,29 +24,46 @@
>   * Hash function to index into a different SPINLOCK.
>   * Since "a" is usually an address, use one spinlock per cacheline.
>   */
> -#  define ATOMIC_HASH_SIZE 4
> -#  define ATOMIC_HASH(a) (&(__atomic_hash[ (((unsigned long) (a))/L1_CACHE_BYTES) & (ATOMIC_HASH_SIZE-1) ]))
> +#  define ATOMIC_HASH_SIZE (4096/L1_CACHE_BYTES)  /* 4 */
> +#  define ATOMIC_HASH(a)      (&(__atomic_hash[ (((unsigned long) (a))/L1_CACHE_BYTES) & (ATOMIC_HASH_SIZE-1) ]))
> +#  define ATOMIC_USER_HASH(a) (&(__atomic_user_hash[ (((unsigned long) (a))/L1_CACHE_BYTES) & (ATOMIC_HASH_SIZE-1) ]))
>  
>  extern arch_spinlock_t __atomic_hash[ATOMIC_HASH_SIZE] __lock_aligned;
> +extern arch_spinlock_t __atomic_user_hash[ATOMIC_HASH_SIZE] __lock_aligned;
>  
>  /* Can't use raw_spin_lock_irq because of #include problems, so
>   * this is the substitute */
> -#define _atomic_spin_lock_irqsave(l,f) do {	\
> -	arch_spinlock_t *s = ATOMIC_HASH(l);		\
> +#define _atomic_spin_lock_irqsave_template(l,f,hash_func) do {	\
> +	arch_spinlock_t *s = hash_func;		\
>  	local_irq_save(f);			\
>  	arch_spin_lock(s);			\
>  } while(0)
>  
> -#define _atomic_spin_unlock_irqrestore(l,f) do {	\
> -	arch_spinlock_t *s = ATOMIC_HASH(l);			\
> +#define _atomic_spin_unlock_irqrestore_template(l,f,hash_func) do {	\
> +	arch_spinlock_t *s = hash_func;			\
>  	arch_spin_unlock(s);				\
>  	local_irq_restore(f);				\
>  } while(0)
>  
> +/* kernel memory locks */
> +#define _atomic_spin_lock_irqsave(l,f)	\
> +	_atomic_spin_lock_irqsave_template(l,f,ATOMIC_HASH(l))
> +
> +#define _atomic_spin_unlock_irqrestore(l,f)	\
> +	_atomic_spin_unlock_irqrestore_template(l,f,ATOMIC_HASH(l))
> +
> +/* userspace memory locks */
> +#define _atomic_spin_lock_irqsave_user(l,f)	\
> +	_atomic_spin_lock_irqsave_template(l,f,ATOMIC_USER_HASH(l))
> +
> +#define _atomic_spin_unlock_irqrestore_user(l,f)	\
> +	_atomic_spin_unlock_irqrestore_template(l,f,ATOMIC_USER_HASH(l))
>  
>  #else
>  #  define _atomic_spin_lock_irqsave(l,f) do { local_irq_save(f); } while (0)
>  #  define _atomic_spin_unlock_irqrestore(l,f) do { local_irq_restore(f); } while (0)
> +#  define _atomic_spin_lock_irqsave_user(l,f) _atomic_spin_lock_irqsave(l,f)
> +#  define _atomic_spin_unlock_irqrestore_user(l,f) _atomic_spin_lock_irqsave_user(l,f)
>  #endif
>  
>  /* This should get optimized out since it's never called.
> diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h
> index 0c705c3..7bc963e 100644
> --- a/arch/parisc/include/asm/futex.h
> +++ b/arch/parisc/include/asm/futex.h
> @@ -55,6 +55,7 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
>  {
>  	int err = 0;
>  	int uval;
> +	unsigned long flags;
>  
>  	/* futex.c wants to do a cmpxchg_inatomic on kernel NULL, which is
>  	 * our gateway page, and causes no end of trouble...
> @@ -65,10 +66,15 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
>  	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
>  		return -EFAULT;
>  
> +	_atomic_spin_lock_irqsave_user(uaddr, flags);
> +
>  	err = get_user(uval, uaddr);
> -	if (err) return -EFAULT;
> -	if (uval == oldval)
> -		err = put_user(newval, uaddr);
> +	if (!err)
> +		if (uval == oldval)
> +			err = put_user(newval, uaddr);
> +
> +	_atomic_spin_unlock_irqrestore_user(uaddr, flags);
> +
>  	if (err) return -EFAULT;
>  	return uval;
>  }
> diff --git a/arch/parisc/include/asm/system.h b/arch/parisc/include/asm/system.h
> index d91357b..4653c77 100644
> --- a/arch/parisc/include/asm/system.h
> +++ b/arch/parisc/include/asm/system.h
> @@ -160,7 +160,7 @@ static inline void set_eiem(unsigned long val)
>     ldcd). */
>  
>  #define __PA_LDCW_ALIGNMENT	4
> -#define __ldcw_align(a) ((volatile unsigned int *)a)
> +#define __ldcw_align(a) (&(a)->slock)
>  #define __LDCW	"ldcw,co"
>  
>  #endif /*!CONFIG_PA20*/
> diff --git a/arch/parisc/kernel/asm-offsets.c b/arch/parisc/kernel/asm-offsets.c
> index ec787b4..50353bd 100644
> --- a/arch/parisc/kernel/asm-offsets.c
> +++ b/arch/parisc/kernel/asm-offsets.c
> @@ -290,5 +290,11 @@ int main(void)
>  	BLANK();
>  	DEFINE(ASM_PDC_RESULT_SIZE, NUM_PDC_RESULT * sizeof(unsigned long));
>  	BLANK();
> +
> +#ifdef CONFIG_SMP
> +	DEFINE(ASM_ATOMIC_HASH_SIZE_SHIFT, __builtin_ffs(ATOMIC_HASH_SIZE)-1);
> +	DEFINE(ASM_ATOMIC_HASH_ENTRY_SHIFT, __builtin_ffs(sizeof(__atomic_hash[0]))-1);
> +#endif
> +
>  	return 0;
>  }
> diff --git a/arch/parisc/kernel/setup.c b/arch/parisc/kernel/setup.c
> index cb71f3d..878f42c 100644
> --- a/arch/parisc/kernel/setup.c
> +++ b/arch/parisc/kernel/setup.c
> @@ -128,6 +131,14 @@ void __init setup_arch(char **cmdline_p)
>  	printk(KERN_INFO "The 32-bit Kernel has started...\n");
>  #endif
>  
> +	/* Consistency check on the size and alignments of our spinlocks */
> +#ifdef CONFIG_SMP
> +	BUILD_BUG_ON(sizeof(arch_spinlock_t) != __PA_LDCW_ALIGNMENT);
> +	BUG_ON((unsigned long)&__atomic_hash[0] & (__PA_LDCW_ALIGNMENT-1));
> +	BUG_ON((unsigned long)&__atomic_hash[1] & (__PA_LDCW_ALIGNMENT-1));
> +#endif
> +	BUILD_BUG_ON((1<<L1_CACHE_SHIFT) != L1_CACHE_BYTES);
> +
>  	pdc_console_init();
>  
>  #ifdef CONFIG_64BIT
> diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
> index f5f9602..1f12418 100644
> --- a/arch/parisc/kernel/syscall.S
> +++ b/arch/parisc/kernel/syscall.S
> @@ -11,6 +11,7 @@
>  #include <asm/unistd.h>
>  #include <asm/errno.h>
>  #include <asm/page.h>
> +#include <asm/cache.h>
>  #include <asm/psw.h>
>  #include <asm/thread_info.h>
>  #include <asm/assembly.h>
> @@ -47,18 +48,17 @@ ENTRY(linux_gateway_page)
>  	KILL_INSN
>  	.endr
>  
> -	/* ADDRESS 0xb0 to 0xb4, lws uses 1 insns for entry */
> +	/* ADDRESS 0xb0 to 0xb8, lws uses two insns for entry */
>  	/* Light-weight-syscall entry must always be located at 0xb0 */
>  	/* WARNING: Keep this number updated with table size changes */
>  #define __NR_lws_entries (2)
>  
>  lws_entry:
> -	/* Unconditional branch to lws_start, located on the 
> -	   same gateway page */
> -	b,n	lws_start
> +	gate	lws_start, %r0		/* increase privilege */
> +	depi	3, 31, 2, %r31		/* Ensure we return into user mode. */
>  
> -	/* Fill from 0xb4 to 0xe0 */
> -	.rept 11
> +	/* Fill from 0xb8 to 0xe0 */
> +	.rept 10
>  	KILL_INSN
>  	.endr
>  
> @@ -423,9 +423,6 @@ tracesys_sigexit:
>  
>  	*********************************************************/
>  lws_start:
> -	/* Gate and ensure we return to userspace */
> -	gate	.+8, %r0
> -	depi	3, 31, 2, %r31	/* Ensure we return to userspace */
>  
>  #ifdef CONFIG_64BIT
>  	/* FIXME: If we are a 64-bit kernel just
> @@ -473,7 +470,7 @@ lws_exit:
>  	/* now reset the lowest bit of sp if it was set */
>  	xor	%r30,%r1,%r30
>  #endif
> -	be,n	0(%sr3, %r31)
> +	be,n	0(%sr7, %r31)
>  
>  
>  	
> @@ -530,18 +527,17 @@ lws_compare_and_swap32:
>  
>  lws_compare_and_swap:
>  #ifdef CONFIG_SMP
> -	/* Load start of lock table */
> -	ldil	L%lws_lock_start, %r20
> -	ldo	R%lws_lock_start(%r20), %r28
> +	/* Calculate lock table entry via ATOMIC_HASH(%r26) */
> +	ldil	L%__atomic_user_hash, %r20
> +	ldo	R%__atomic_user_hash(%r20), %r28
>  
> -	/* Extract four bits from r26 and hash lock (Bits 4-7) */
> -	extru  %r26, 27, 4, %r20
> +#ifdef CONFIG_64BIT
> +	extrd,u %r26, 63-L1_CACHE_SHIFT, ASM_ATOMIC_HASH_SIZE_SHIFT, %r20
> +#else
> +	extru	%r26, 31-L1_CACHE_SHIFT, ASM_ATOMIC_HASH_SIZE_SHIFT, %r20
> +#endif
> +	shladd,l %r20, ASM_ATOMIC_HASH_ENTRY_SHIFT, %r28, %r20
>  
> -	/* Find lock to use, the hash is either one of 0 to
> -	   15, multiplied by 16 (keep it 16-byte aligned)
> -	   and add to the lock table offset. */
> -	shlw	%r20, 4, %r20
> -	add	%r20, %r28, %r20
>  
>  # if ENABLE_LWS_DEBUG
>  	/*	
> @@ -672,31 +668,6 @@ ENTRY(sys_call_table64)
>  END(sys_call_table64)
>  #endif
>  
> -#ifdef CONFIG_SMP
> -	/*
> -		All light-weight-syscall atomic operations 
> -		will use this set of locks 
> -
> -		NOTE: The lws_lock_start symbol must be
> -		at least 16-byte aligned for safe use
> -		with ldcw.
> -	*/
> -	.section .data
> -	.align	PAGE_SIZE
> -ENTRY(lws_lock_start)
> -	/* lws locks */
> -	.rept 16
> -	/* Keep locks aligned at 16-bytes */
> -	.word 1
> -	.word 0 
> -	.word 0
> -	.word 0
> -	.endr
> -END(lws_lock_start)
> -	.previous
> -#endif
> -/* CONFIG_SMP for lws_lock_start */
> -
>  .end
>  
>  
> diff --git a/arch/parisc/lib/bitops.c b/arch/parisc/lib/bitops.c
> index 353963d..bae6a86 100644
> --- a/arch/parisc/lib/bitops.c
> +++ b/arch/parisc/lib/bitops.c
> @@ -15,6 +15,9 @@
>  arch_spinlock_t __atomic_hash[ATOMIC_HASH_SIZE] __lock_aligned = {
>  	[0 ... (ATOMIC_HASH_SIZE-1)]  = __ARCH_SPIN_LOCK_UNLOCKED
>  };
> +arch_spinlock_t __atomic_user_hash[ATOMIC_HASH_SIZE] __lock_aligned = {
> +	[0 ... (ATOMIC_HASH_SIZE-1)]  = __ARCH_SPIN_LOCK_UNLOCKED
> +};
>  #endif
>  
>  #ifdef CONFIG_64BIT
> diff --git a/kernel/fork.c b/kernel/fork.c
> index f88bd98..108b1ed 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -608,7 +608,10 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>  			 * We don't check the error code - if userspace has
>  			 * not set up a proper pointer then tough luck.
>  			 */
> +			unsigned long flags;
> +			_atomic_spin_lock_irqsave_user(tsk->clear_child_tid, flags);
>  			put_user(0, tsk->clear_child_tid);
> +			_atomic_spin_unlock_irqrestore_user(tsk->clear_child_tid, flags);
>  			sys_futex(tsk->clear_child_tid, FUTEX_WAKE,
>  					1, NULL, NULL, 0);
>  		}
> @@ -1432,8 +1435,12 @@ long do_fork(unsigned long clone_flags,
>  
>  		nr = task_pid_vnr(p);
>  
> -		if (clone_flags & CLONE_PARENT_SETTID)
> +		if (clone_flags & CLONE_PARENT_SETTID) {
> +			unsigned long flags;
> +			_atomic_spin_lock_irqsave_user(parent_tidptr, flags);
>  			put_user(nr, parent_tidptr);
> +			_atomic_spin_unlock_irqrestore_user(parent_tidptr, flags);
> +		}
>  
>  		if (clone_flags & CLONE_VFORK) {
>  			p->vfork_done = &vfork;


-- 
J. David Anglin                                  dave.anglin@xxxxxxxxxxxxxx
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux SoC]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux