Hi again Trond, I've been looking at a customer report of an oops in rpcauth_lookup_credcache. https://bugzilla.novell.com/show_bug.cgi?id=547137 It appears that one of the hash chains in the cache has become corrupted. In trying to guess how this might happen, the one possibility I have found is that it might be possible for hlist_del_rcu to be called on an rpc_cred that is not hashed. This certainly could cause list corruption. put_rpccred checks if the cred is hashed (RPCAUTH_CRED_HASHED) and if not, it takes a lock and then - if the cred it not uptodate - calls rpcauth_unhash_cred which will call hlist_del_rcu. If some other thread unhashed the cred while this thread was waiting for rpc_credcache_lock, we could get list corruption. The kernel in question did not have commit 5f707eb429 which claims to fix a race in much the same place, however I don't think that patch would fix this problem. I think the simplest fix would be to use hlist_del_init_rcu in place of hlist_del_rcu as shown below. That version protects again deleting the same element multiple times. Do you agree with this fix? Thanks, NeilBrown diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c index 54a4e04..323eff6 100644 --- a/net/sunrpc/auth.c +++ b/net/sunrpc/auth.c @@ -118,7 +118,7 @@ static DEFINE_SPINLOCK(rpc_credcache_lock); static void rpcauth_unhash_cred_locked(struct rpc_cred *cred) { - hlist_del_rcu(&cred->cr_hash); + hlist_del_init_rcu(&cred->cr_hash); smp_mb__before_clear_bit(); clear_bit(RPCAUTH_CRED_HASHED, &cred->cr_flags); } -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html