On 07/12/2011 10:30 AM, Ben Greear wrote:
On 07/12/2011 10:25 AM, Myklebust, Trond wrote:
-----Original Message-----
From: Ben Greear [mailto:greearb@xxxxxxxxxxxxxxx]
Sent: Tuesday, July 12, 2011 1:15 PM
To: Myklebust, Trond
Cc: linux-nfs@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: [RFC] sunrpc: Fix race between work-queue and rpc_killall_tasks.
I added lots of locking around the calldata, work-queue logic, and
such, and the problem still persists without hitting any of the debug
warnings or poisoned values I put in. It almost seems like the same
tk_calldata is being assigned to two different tasks.
While poking through the code, I noticed that 'map' is static in
rpcb_getport_async. That would seem to cause problems if two threads
called this method at the same time, possibly causing tk_calldata to be
assigned to two different tasks???
Any idea why it is static?
Doh! That is clearly a typo dating all the way back to when Chuck
wrote that function.
Yes, that would definitely explain your problem.
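To make the failure mode concrete, here is a small user-space C sketch
of why a function-scope `static` breaks concurrent callers. It assumes
(this is not the actual sunrpc code) that `map` was a static pointer to
arguments freshly allocated on each call, with malloc standing in for
the kernel allocator. The names `lookup_args`, `start_lookup_buggy`,
`start_lookup_fixed`, `demonstrate` and the `preempt` callback are all
hypothetical. The callback runs between allocating the arguments and
handing them off, deterministically standing in for a second thread
being scheduled at exactly the wrong moment.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-lookup arguments that end up as an
 * RPC task's tk_calldata. Not the kernel's real structure. */
struct lookup_args {
    int owner;                      /* which caller allocated this object */
};

/* Invoked between "allocate" and "hand off" to simulate another thread
 * running at that point. */
typedef void (*preempt_fn)(void);

/* BUGGY: the pointer is static, so every caller shares ONE variable. */
static struct lookup_args *start_lookup_buggy(int owner, preempt_fn preempt)
{
    static struct lookup_args *map;
    map = malloc(sizeof(*map));
    if (map == NULL)
        return NULL;
    map->owner = owner;
    if (preempt != NULL)
        preempt();                  /* a second caller overwrites the shared map */
    return map;                     /* may now be the second caller's object */
}

/* FIXED: an automatic pointer is private to each call. */
static struct lookup_args *start_lookup_fixed(int owner, preempt_fn preempt)
{
    struct lookup_args *map = malloc(sizeof(*map));
    if (map == NULL)
        return NULL;
    map->owner = owner;
    if (preempt != NULL)
        preempt();                  /* cannot touch this call's map */
    return map;
}

static struct lookup_args *second_result;

static void second_caller_buggy(void) { second_result = start_lookup_buggy(2, NULL); }
static void second_caller_fixed(void) { second_result = start_lookup_fixed(2, NULL); }

/* Runs both variants with the same forced interleaving. Returns nonzero
 * if the buggy variant handed one object to both callers. */
static int demonstrate(void)
{
    struct lookup_args *first = start_lookup_buggy(1, second_caller_buggy);
    int shared = (first == second_result);  /* both "tasks" got the same calldata */
    /* caller 1's own allocation was lost when map was overwritten (leaked) */
    free(second_result);                    /* frees the single shared object */

    first = start_lookup_fixed(1, second_caller_fixed);
    assert(first != NULL && second_result != NULL);
    assert(first != second_result);
    assert(first->owner == 1 && second_result->owner == 2);
    free(first);
    free(second_result);
    return shared;
}
```

Assuming the allocations succeed, `demonstrate()` should return 1: in
the buggy variant both callers get back the same object (and the first
caller's allocation leaks), which matches the symptom of one
tk_calldata showing up in two tasks. Dropping `static` makes `map` an
ordinary per-call local, so the fixed variant is immune to the
interleaving.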
Ok, patch sent. I assume someone will propagate this to stable
as desired?
And assuming this fixes it, can I get some brownie points towards
review of the ip-addr binding patches? :)
Just to close this issue: we ran a clean 24+ hour test mounting and
unmounting 200 mounts every 30 seconds, and it ran with zero problems.
This was on 2.6.38.8+ with this fix applied.
3.0-rc7+ is still flaky in various other ways, but at least I see no
more NFS problems.
So, that was the problem I was hitting, and it appears to be the
last problem in this area.
Thanks,
Ben
--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com