Stuck processes on GFS partition?

We are seeing processes stuck in uninterruptible device wait (D state) on one GFS file system. The following errors are logged in /var/log/messages:

Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: stuck in gfs_releasepage()...
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: blkno = 12446334, bh->b_count = 9
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: bh->b_journal_head = !NULL
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: gl = (4, 12477424)
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: bd_new_le.le_trans = NULL
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: bd_incore_le.le_trans = NULL
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: bd_frozen = NULL
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: bd_pinned = 0
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: bd_ail_tr = NULL
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip = 12477424/12477424
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_count = 1, ip->i_vnode = !NULL
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[0] = NULL
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[1] = NULL
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[2] = NULL
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[3] = NULL
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[4] = NULL
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[5] = NULL
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[6] = NULL
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[7] = NULL
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[8] = NULL
Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[9] = NULL
Dec 12 10:09:17 imagine su(pam_unix)[5104]: session closed for user root
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: stuck in gfs_releasepage()...
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: blkno = 12446334, bh->b_count = 9
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: bh->b_journal_head = !NULL
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: gl = (4, 12477424)
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: bd_new_le.le_trans = NULL
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: bd_incore_le.le_trans = NULL
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: bd_frozen = NULL
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: bd_pinned = 0
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: bd_ail_tr = NULL
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip = 12477424/12477424
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_count = 1, ip->i_vnode = !NULL
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[0] = NULL
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[1] = NULL
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[2] = NULL
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[3] = NULL
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[4] = NULL
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[5] = NULL
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[6] = NULL
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[7] = NULL
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[8] = NULL
Dec 12 10:14:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: ip->i_arch.i_cache[9] = NULL
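
For anyone comparing these dumps, here is a minimal sketch (Python 3, not part of GFS) that groups the "stuck in gfs_releasepage()" messages into one record per dump, so it is easy to check whether the same block keeps coming back. The function names parse_dumps and recurring_blocks are made up for illustration, and the regular expression only assumes the syslog layout shown above.

```python
import re
from collections import Counter

# Matches syslog lines of the form shown above, e.g.
#   Dec 12 10:04:02 imagine kernel: GFS: fsid=CSM_ACN:admin01.5: blkno = 12446334, bh->b_count = 9
GFS_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:\s+GFS:\s+"
    r"fsid=(?P<fsid>\S+):\s+(?P<body>.*)$"
)

def parse_dumps(lines):
    """Group GFS debug lines into one dict per 'stuck in gfs_releasepage()' dump."""
    dumps = []
    for line in lines:
        m = GFS_LINE.match(line.strip())
        if not m:
            continue  # unrelated syslog entries, e.g. the su(pam_unix) line
        body = m.group("body")
        if body.startswith("stuck in gfs_releasepage()"):
            # A new dump starts here.
            dumps.append({"time": m.group("stamp"), "fsid": m.group("fsid")})
            continue
        if not dumps:
            continue  # field line seen before any dump header
        # Split "a = 1, b = 2" into fields, but leave "gl = (4, 12477424)" whole.
        for part in re.split(r",\s+(?=\S+\s+=\s)", body):
            key, sep, value = part.partition(" = ")
            if sep:
                dumps[-1][key.strip()] = value.strip()
    return dumps

def recurring_blocks(dumps):
    """Return {blkno: count} for block numbers reported in more than one dump."""
    counts = Counter(d["blkno"] for d in dumps if "blkno" in d)
    return {blk: n for blk, n in counts.items() if n > 1}
```

Feeding it the lines from /var/log/messages (for example parse_dumps(open("/var/log/messages"))) gives one dict per dump; in the excerpt above both dumps report the same blkno and bh->b_count.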

The file system in question appears to work fine on the other nodes, but I unmounted it to be on the safe side.

This is Red Hat Enterprise Linux 3.6, kernel 2.4.21-37.ELsmp, GFS 6.0.2.27-0. GFS was built from source.
There are 2 partitions in the admin pool; the second was added about a week ago.

I tried to unmount it, but the umount failed because of the processes that are stuck in device waits.
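
In case it helps with reproducing this, a rough sketch of how the stuck processes can be listed: it filters `ps -eo pid,stat,wchan,args` output for entries whose state starts with D (uninterruptible sleep). It assumes a Python 3.7+ interpreter, which a stock RHEL 3 install may well not have; parse_ps and list_stuck are illustrative names, not an existing tool.

```python
import subprocess

def parse_ps(text):
    """Return (pid, stat, wchan, command) for every process in state D.

    `text` is the output of `ps -eo pid,stat,wchan,args`, header included.
    """
    stuck = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)    # the command may itself contain spaces
        if len(fields) < 4:
            continue
        pid, stat, wchan, command = fields
        if stat.startswith("D"):
            stuck.append((int(pid), stat, wchan, command))
    return stuck

def list_stuck():
    """Run ps on the local node and return its D-state processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,wchan,args"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)
```

Calling list_stuck() on the affected node returns the PIDs and wait channels of the processes that are blocking the umount.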

Any ideas?

Thank you,

Matt
mbrookov@xxxxxxxxx


--

Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
