Re: Problems with GFS2 faulting.

Eric Renfro <erenfro@xxxxxxxxx> · Thu, 13 Jan 2011 10:41:44 -0500

It's not RHEL, as I stated in my post. It's Ubuntu 10.04.1 with the 
ubuntu-ha-maintainer PPA for pacemaker 1.0.8, open-iscsi 2.0.871, and 
gfs2-tools 3.0.7.

As I also stated, I have tried without multipathed iSCSI, using just a 
single iSCSI target for the nodes having problems, with the same 
result. The issue is strictly with GFS2, somehow triggered after locking 
and unlocking files. In fact, it can't be iSCSI at all, because the root 
filesystems of both nodes are iSCSI targets provided by kvm on the host 
OS, and they have shown no iSCSI-related problems. If it were caused by 
iSCSI blocking, it would happen to the root filesystem as well, I'm sure.
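
For anyone trying to reproduce the lock/unlock behaviour described above, here is a minimal sketch of a stress loop. It assumes POSIX advisory locks taken through fcntl are representative of the failing workload, which this thread does not confirm; the function name lock_unlock_cycles is made up for illustration.

```python
import fcntl
import os
import time


def lock_unlock_cycles(path, iterations=100):
    """Repeatedly take and release an exclusive POSIX lock on `path`.

    Returns the wall-clock duration, in seconds, of each
    lock / write / fsync / unlock cycle.
    """
    durations = []
    # "a+b" opens for appending and creates the file if it is missing;
    # an exclusive lockf() lock needs a descriptor opened for writing.
    with open(path, "a+b") as handle:
        for _ in range(iterations):
            start = time.monotonic()
            fcntl.lockf(handle, fcntl.LOCK_EX)  # blocks until the lock is granted
            try:
                handle.write(b"x")
                handle.flush()
                os.fsync(handle.fileno())  # push the write down to storage
            finally:
                fcntl.lockf(handle, fcntl.LOCK_UN)
            durations.append(time.monotonic() - start)
    return durations
```

Running lock_unlock_cycles() against a file on the GFS2 mount from both nodes at the same time, then against a file on the iSCSI-backed root filesystem, and comparing the slowest cycle in each case, is one way to check whether the stall follows GFS2 or the transport.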

Eric Renfro

On 1/13/2011 10:13 AM, Gordan Bobic wrote:
Eric Renfro wrote:

Here are the stack traces I'm getting when it faults:

Jan 13 03:31:27 cweb1 kernel: [1387920.160141] INFO: task flush-251:1:27497 blocked for more than 120 seconds.
Jan 13 03:31:27 cweb1 kernel: [1387920.160802] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

As the error says, this isn't an actual kernel oops; it means that 
something in your stack (likely the iSCSI implementation, since that is 
the one constant across everything you tested) is blocking somewhere.
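
To see which tasks are blocking and how often, the hung-task warnings can be pulled out of the kernel log. The following is a rough sketch that assumes each warning sits on a single unwrapped line in the format quoted above; the helper name find_hung_tasks is hypothetical.

```python
import re

# Matches e.g. "INFO: task flush-251:1:27497 blocked for more than 120 seconds."
# The task name may itself contain colons, so the greedy group backtracks
# to the last colon before the numeric PID.
HUNG_TASK = re.compile(
    r"INFO: task (?P<task>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return (task_name, pid, seconds) for each hung-task warning in log_text."""
    return [
        (m.group("task"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK.finditer(log_text)
    ]
```

Feeding it the contents of the kernel log and counting the task names shows whether the stalls are confined to the flush thread for one device or spread across several.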

What versions of RHEL, GFS2 and iSCSI are you using? My guess is that 
iSCSI might be hitting a race somewhere and locking up. Have you 
tried connecting both clients to just a single server via iSCSI to see 
if the problem goes away?

Gordan

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
