Re: GFS2 processes getting stuck in WCHAN=dlm_posix_lock

Hi Dave,

On 11/02/2009 12:11 PM, David Teigland wrote:
On Fri, Oct 30, 2009 at 07:27:23PM -0400, Allen Belletti wrote:
I'll notice the problem when the load average starts rising.  It's
always tied to "stuck" processes, and I believe always tied to IMAP
clients (I'm running Dovecot.)  It seems like a file belonging to user
"x" (in this case, "jforrest") will become locked in some way, such that
every IMAP process tied to that user will get stuck on the same thing.
Over time, as the user keeps trying to read that file, more and more
processes accumulate.  They're always in state "D" (uninterruptible
sleep), and always on "dlm_posix_lock" according to WCHAN.  The only way
I'm able to get out of this state is to reboot.  If I let it persist for
too long, I/O generally stops entirely.

Next time, try to collect all the following information as soon as you can
after the first process gets stuck:

- ps showing pid of stuck/"D" process(es) and WCHAN
- which file they are stuck trying to lock
   (and the inode number of it, you may need to wait until after the
    reboot to use ls -li on the file to get the inode number)
- group_tool dump plocks <fsname> from all the nodes
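
The first item in the list above could be scripted roughly as follows. This
is only a sketch, assuming a procps-style ps and Python 3.7 or later; the
function name find_stuck and the choice of ps columns are illustrative
assumptions and do not come from this thread.

```python
import subprocess

# Columns: pid, process state, kernel wait channel (widened so the
# function name is not truncated), and command name.
PS_CMD = ["ps", "-eo", "pid,stat,wchan:32,comm"]


def find_stuck(ps_output, wchan="dlm_posix_lock"):
    """Return (pid, stat, wchan, comm) for D-state processes waiting in `wchan`."""
    stuck = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # ignore blank or malformed rows
        pid, stat, chan, comm = fields
        # "D" is uninterruptible sleep; extra flag characters may follow it.
        if stat.startswith("D") and chan == wchan:
            stuck.append((int(pid), stat, chan, comm))
    return stuck


if __name__ == "__main__":
    out = subprocess.run(PS_CMD, capture_output=True, text=True, check=True).stdout
    for pid, stat, chan, comm in find_stuck(out):
        print(pid, stat, chan, comm)
```

Running it from cron or a watch loop on each node would record the pids
early, before more processes pile up behind the first one.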

I'm guessing that dovecot does some "unusual" combinations of locking,
closing, renaming, unlinking files.  Those combinations are especially
prone to races and bugs that cause posix lock state to get off.

I'll collect all of this as soon as I catch the problem in action again.
Do you know how I might go about determining which file is involved?  I
can find the user because it's associated with the particular "imap"
process, but haven't been able to figure out what's being locked.

Thanks,
Allen

--
Allen Belletti
allen@xxxxxxxxxxxxxxx                             404-894-6221 Phone
Industrial and Systems Engineering                404-385-2988 Fax
Georgia Institute of Technology
