On Tue, Mar 1, 2016 at 12:00 AM, Eryu Guan <eguan@xxxxxxxxxx> wrote: > On Mon, Feb 29, 2016 at 10:22:06AM -0800, Dan Williams wrote: >> On Sat, Feb 27, 2016 at 9:31 PM, Eryu Guan <eguan@xxxxxxxxxx> wrote: >> > On Sat, Feb 27, 2016 at 12:10:51PM -0800, Dan Williams wrote: >> >> On Sat, Feb 27, 2016 at 5:02 AM, Eryu Guan <eguan@xxxxxxxxxx> wrote: >> >> > Hi, >> >> > >> >> > Starting from 4.5-rc1 kernel, I sometimes see generic/320 triggers >> >> > "list_add attempted on force-poisoned entry" warnings on XFS, test hosts >> >> > are arm64/ppc64/ppc64le, haven't seen it on x86_64 hosts. >> >> >> >> Hmm, this triggers when a list_head has ->next or ->prev pointing at >> >> the address of force_poison which is only defined in lib/list_debug.c. >> >> The only call site that uses list_force_poison() is in >> >> devm_memremap_pages(). That currently depends on CONFIG_ZONE_DEVICE >> >> which in turn depends on X86_64. >> >> >> >> So, this appears to be a false positive and the address of >> >> force_poison is somehow ending up on the stack by accident as that is >> >> the random value being passed in from __down_common: >> >> >> >> struct semaphore_waiter waiter; >> >> >> >> list_add_tail(&waiter.list, &sem->wait_list); >> >> >> >> So, I think we need a more unique poison value that should never >> >> appear on the stack: >> > >> > Unfortunately I can still see the warning after applying this test patch. >> > >> > Then I added debug code to print the pointer value and re-ran the test. >> > All five failures printed the same pointer value, failed in the same >> > pattern: >> > >> > list_add attempted on force-poisoned entry(0000000000000500), new->next = c00000000136bc00, new->prev = 0000000000000500 >> > >> >> I think this means that no matter what we do the stack will pick up >> these poison values unless the list_head is explicitly initialized. >> Something like the following: > > Umm, it's still reproducible... but seems harder than before, it took me > 200+ iterations to hit (less than 10 iterations in previous runs) Similar fix, just in rwsem_down_read_failed() this time: diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index a4d4de05b2d1..68678a20da52 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -214,8 +214,10 @@ __visible struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem) { long count, adjustment = -RWSEM_ACTIVE_READ_BIAS; - struct rwsem_waiter waiter; struct task_struct *tsk = current; + struct rwsem_waiter waiter = { + .list = LIST_HEAD_INIT(waiter.list), + }; /* set up my own style of waitqueue */ waiter.task = tsk; _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs