Re: NFS intr/nointr: SIGKILL may leave pages locked forever

Chuck Lever <chuck.lever@xxxxxxxxxx> · Wed, 29 Oct 2008 12:38:03 -0400

On Oct 24, 2008, at 11:11 AM, Chuck Lever wrote:
Hi Matthew-

We are still trying to pursue intr/nointr testing on 2.6.25+  
kernels.  Looks like this week's kernel version is 2.6.27-rc7, but I  
will need to confirm that.

Since 2.6.25, the problem is that the "sql shutdown abort" command, which  
is designed to trigger an immediate database shutdown, causes the  
database instance to hang.  It leaves database writer processes  
stuck in "D" state after it sends a SIGKILL.

The process backtraces suggest that these processes are waiting for  
the inode mutex before trying to invalidate the database file's  
cache (nfs_invalidate_mapping).  There is one process that owns the  
mutex and is stuck waiting for a page lock in  
invalidate_inode_pages2_range.  This suggests that the signal is  
causing some other code path to neglect to unlock that page.

It's a little out of my league.  Are there ways we can gather more  
information?

As a follow-up, we've found that we don't have this problem on UP NFS  
clients.  On single processor clients, SIGKILL works correctly and the  
database shuts down without corrupting its data files.  On SMP  
clients, the signal results in hung database writers.

We've confirmed this difference on 2.6.25-rc2 and 2.6.27-rc7.
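
For gathering more information from a hung SMP client, here is a minimal sketch that lists processes in "D" state along with the kernel symbol each is sleeping in. It assumes a Linux /proc with per-process "stat" and "wchan" files; wchan may be empty or "0" depending on kernel configuration. The helper names (parse_stat, read_text, find_blocked) are made up for illustration, and the scan covers only thread-group leaders, not individual threads.

```python
import os


def parse_stat(text):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen].strip())
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    """Return the stripped contents of path, or '' if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        # The process may have exited, or we may lack permission.
        return ""


def find_blocked(proc_root="/proc"):
    """Return [(pid, comm, wchan)] for processes in uninterruptible sleep."""
    blocked = []
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # not a process directory
        stat = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat:
            continue
        try:
            pid, comm, state = parse_stat(stat)
        except (ValueError, IndexError):
            continue  # malformed line; skip rather than abort the scan
        if state == "D":
            wchan = read_text(os.path.join(proc_root, entry, "wchan"))
            blocked.append((pid, comm, wchan))
    return blocked


if __name__ == "__main__":
    for pid, comm, wchan in find_blocked():
        print(f"{pid}\t{comm}\t{wchan or '?'}")
```

Running this a few times after the SIGKILL would show whether the stuck writers stay parked at the same wait points, which could be compared against the backtraces described above.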

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com