> -----Original Message-----
> From: Ben Greear [mailto:greearb@xxxxxxxxxxxxxxx]
> Sent: Tuesday, July 12, 2011 1:15 PM
> To: Myklebust, Trond
> Cc: linux-nfs@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [RFC] sunrpc: Fix race between work-queue and rpc_killall_tasks.
>
> On 07/08/2011 03:14 PM, Myklebust, Trond wrote:
>
> >> [<ffffffff81105907>] print_trailer+0x131/0x13a
> >> [<ffffffff81105945>] object_err+0x35/0x3e
> >> [<ffffffff811077b3>] verify_mem_not_deleted+0x7a/0xb7
> >> [<ffffffffa02891e5>] rpcb_getport_done+0x23/0x126 [sunrpc]
> >> [<ffffffffa02810df>] rpc_exit_task+0x3f/0x6d [sunrpc]
> >> [<ffffffffa02814d8>] __rpc_execute+0x80/0x253 [sunrpc]
> >> [<ffffffffa02816ed>] ? rpc_execute+0x42/0x42 [sunrpc]
> >> [<ffffffffa02816fd>] rpc_async_schedule+0x10/0x12 [sunrpc]
> >> [<ffffffff81061343>] process_one_work+0x230/0x41d
> >> [<ffffffff8106128e>] ? process_one_work+0x17b/0x41d
> >> [<ffffffff8106379f>] worker_thread+0x133/0x217
> >> [<ffffffff8106366c>] ? manage_workers+0x191/0x191
> >> [<ffffffff81066f9c>] kthread+0x7d/0x85
> >> [<ffffffff81485ee4>] kernel_thread_helper+0x4/0x10
> >> [<ffffffff8147f0d8>] ? retint_restore_args+0x13/0x13
> >> [<ffffffff81066f1f>] ? __init_kthread_worker+0x56/0x56
> >> [<ffffffff81485ee0>] ? gs_change+0x13/0x13
> >
> > The calldata gets freed in the rpc_final_put_task() which shouldn't
> > ever be run while the task is still referenced in __rpc_execute
> >
> > IOW: it should be impossible to call rpc_exit_task() after
> > rpc_final_put_task
>
> I added lots of locking around the calldata, work-queue logic, and such,
> and still the problem persists w/out hitting any of the debug warnings or
> poisoned values I put in. It almost seems like tk_calldata is just
> assigned to two different tasks.
>
> While poking through the code, I noticed that 'map' is static in
> rpcb_getport_async.
>
> That would seem to cause problems if two threads called this method at
> the same time, possibly causing tk_calldata to be assigned to two
> different tasks???
>
> Any idea why it is static?

Doh! That is clearly a typo dating all the way back to when Chuck wrote
that function.

Yes, that would definitely explain your problem.

Cheers
  Trond
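
For anyone following the thread, here is a minimal userspace sketch of why a
static local pointer can end up handing the same calldata to two tasks. It is
not the sunrpc code: struct call_args, struct task, start_buggy(),
start_fixed(), window_hook and tasks_share_calldata() are all hypothetical
names, and the hook only simulates a second caller being scheduled in the
window between the allocation and the assignment.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the per-call arguments and the task. */
struct call_args { int id; };
struct task { struct call_args *calldata; };

static struct task *pending;        /* task set up by the simulated second caller */
static void (*window_hook)(void);   /* runs once inside the race window, if set */

static void run_window_hook(void)
{
    void (*hook)(void) = window_hook;

    window_hook = NULL;             /* fire only once, so the nested call does not recurse */
    if (hook)
        hook();
}

static void start_buggy(struct task *t, int id)
{
    static struct call_args *map;   /* BUG: one pointer shared by every caller */

    map = malloc(sizeof(*map));
    if (!map)
        return;
    map->id = id;
    run_window_hook();              /* a second caller may overwrite 'map' here */
    t->calldata = map;              /* may now be the other caller's allocation */
}

static void start_fixed(struct task *t, int id)
{
    struct call_args *map;          /* automatic: each call has its own pointer */

    map = malloc(sizeof(*map));
    if (!map)
        return;
    map->id = id;
    run_window_hook();
    t->calldata = map;
}

static void second_buggy(void) { start_buggy(pending, 2); }
static void second_fixed(void) { start_fixed(pending, 2); }

/* Returns 1 if both tasks ended up pointing at the same calldata. */
int tasks_share_calldata(int use_buggy)
{
    struct task a = { NULL }, b = { NULL };
    int shared;

    pending = &b;
    window_hook = use_buggy ? second_buggy : second_fixed;
    if (use_buggy)
        start_buggy(&a, 1);
    else
        start_fixed(&a, 1);

    shared = (a.calldata != NULL && a.calldata == b.calldata);
    /* In the buggy case the first allocation is leaked; free the rest once. */
    free(a.calldata);
    if (!shared)
        free(b.calldata);
    return shared;
}
```

In this sketch tasks_share_calldata(1) reports that both tasks share one
allocation (and the first allocation is lost), while tasks_share_calldata(0)
gives each task its own. With shared calldata, whichever task finishes first
frees memory the other still uses, which would be consistent with the
use-after-free in the trace above.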