Re: GFS2 and D state HTTPD processes

Emilio Arjona <emilio.ah@xxxxxxxxx> · Tue, 27 Apr 2010 13:58:39 +0200

Thanks Ricardo,

We don't want to update the server yet because it's in production. We will plan a system update for the summer, when the system load is low.

In the most recent incidents a new process is involved: [delete_workqueu]. It now usually appears to be what triggers the D-state lockup of the other processes. I have looked for information about this process but couldn't find anything.
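
For anyone who wants to see which tasks are stuck at a given moment, here is a minimal sketch that lists the processes in state D by reading /proc/<pid>/stat, together with the kernel symbol reported in /proc/<pid>/wchan. It assumes a Linux /proc and Python 3 (which is not what RHEL 5 ships by default), and the function names are only illustrative. Note that the kernel stores at most 15 characters of a task name, which is probably why the name above ends in "workqueu".

```python
#!/usr/bin/env python3
"""List tasks in uninterruptible sleep (state D) by reading /proc."""
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the first '(' and the last ')'.
    """
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx])
    comm = line[open_idx + 1:close_idx]
    state = line[close_idx + 1:].split()[0]
    return pid, comm, state


def read_wchan(pid, proc_root="/proc"):
    """Best-effort read of the kernel symbol the task is sleeping in."""
    try:
        with open(os.path.join(proc_root, str(pid), "wchan")) as f:
            return f.read().strip() or "-"
    except OSError:
        return "?"


def d_state_tasks(proc_root="/proc"):
    """Yield (pid, comm, wchan) for every process currently in state D."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return  # no readable /proc here, nothing to report
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were scanning
        if state == "D":
            yield pid, comm, read_wchan(pid, proc_root)


if __name__ == "__main__":
    for pid, comm, wchan in d_state_tasks():
        print(f"{pid:>7}  {comm:<15}  {wchan}")
```

Running it from cron or in a loop on each node during an incident would show which task enters D state first and where it is sleeping. The wchan column may be empty or 0 without root privileges.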

Any idea?

Regards :)

2010/4/9 Ricardo Argüello <ricardo@xxxxxxxxxxxxxxxxx>

Looks like this bug:

GFS2 - probably lost glock call back

https://bugzilla.redhat.com/show_bug.cgi?id=498976

This is fixed in the kernel included in RHEL 5.5.

Do a "yum update" to fix it.

Ricardo Arguello

On Tue, Mar 2, 2010 at 6:10 AM, Emilio Arjona <emilio.ah@xxxxxxxxx> wrote:

> Thanks for your response, Steve.

>

> 2010/3/2 Steven Whitehouse <swhiteho@xxxxxxxxxx>:

>> Hi,

>>

>> On Fri, 2010-02-26 at 16:52 +0100, Emilio Arjona wrote:

>>> Hi,

>>>

>>> we are experiencing some problems that were discussed in an old thread:

>>>

>>> http://www.mail-archive.com/linux-cluster@xxxxxxxxxx/msg07091.html

>>>

>>> We have 3 clustered servers under Red Hat 5.4 accessing a GFS2 resource.

>>>

>>> fstab options:

>>> /dev/vg_cluster/lv_cluster /opt/datacluster gfs2 defaults,noatime,nodiratime,noquota 0 0

>>>

>>> GFS options:

>>> plock_rate_limit="0"

>>> plock_ownership=1

>>>

>>> httpd processes sometimes get stuck in D state and the only solution is

>>> to hard reset the affected server.

>>>

>>> Can anyone give me some hints to diagnose the problem?

>>>

>>> Thanks :)

>>>

>> Can you give me a rough idea of what the actual workload is and how it

>> is distributed among the director(y/ies)?

>

> We had problems with PHP sessions in the past but we fixed them by

> configuring php to store the sessions in the database instead of in

> the GFS filesystem. Now, we're having problems with files and

> directories in the "data" folder of Moodle LMS.

>

> "lsof -p" showed I/O operations on the same folder on 2 of the 3 nodes.

> We did a hard reset of those nodes, but some hours later the CPU load

> grew again, especially on the node that wasn't rebooted. We decided

> to reboot this node (via ssh); then the CPU load went down to normal

> values on all nodes.

>

> I don't think the system's load is high enough to produce concurrent

> access problems. It's more likely to be some misconfiguration, in

> fact, we changed some GFS2 options to non-default values to increase

> performance (http://www.linuxdynasty.org/howto-increase-gfs2-performance-in-a-cluster.html).

>

>>

>> This is often down to contention on glocks (one per inode) and maybe

>> because there is a process or processes writing a file or directory

>> which is in use (either read-only or writable) by other processes.

>>

>> If you are using php, then you might have to strace it to find out what

>> it is really doing.

>

> Ok, we will try to strace the D processes and post the results. Hope

> we find something!!

>

>>

>> Steve.

>>

>>> --

>>>

>>> Emilio Arjona.

>>>

>>> --

>>> Linux-cluster mailing list

>>> Linux-cluster@xxxxxxxxxx

>>> https://www.redhat.com/mailman/listinfo/linux-cluster

>>

>>


>>

>

>

>

> --

> Emilio Arjona.

>


>


-- 
*******************************************

Emilio Arjona Heredia
Centro de Enseñanzas Virtuales de la Universidad de Granada
C/ Real de Cartuja 36-38
http://cevug.ugr.es
Tlfno.: 958-241000 ext. 20206
*******************************************
