We don't want to update the server right now because it's in production. We plan to schedule a system update in the summer, when the system's load is low.
In the latest incidents a new process is involved: [delete_workqueu]. It is now usually the one that initiates the lockup of processes in D state. I have been looking for information about this process but couldn't find anything.
Any idea?
Regards :)
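For anyone who wants to capture the same information during an incident, here is a minimal sketch of how the D-state processes can be listed from /proc. It assumes Python 3 (not the system Python shipped with RHEL 5) and a kernel that exposes /proc/<pid>/wchan. The function names are my own and not part of any GFS2 tool. Note that the kernel truncates process names in /proc to 15 characters, which is why the name above appears cut off.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen].strip())
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    """Read a small /proc file; return '' if the process vanished or access is denied."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def d_state_processes(proc_root="/proc"):
    """List (pid, comm, wchan) for every process in uninterruptible sleep (state D)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        stat_text = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat_text:
            continue
        pid, comm, state = parse_stat(stat_text)
        if state == "D":
            # wchan names the kernel function the task is sleeping in, if known
            wchan = read_text(os.path.join(proc_root, entry, "wchan"))
            found.append((pid, comm, wchan))
    return sorted(found)
```

Calling d_state_processes() on an affected node returns one tuple per blocked process. Only the main thread of each process is inspected, so it is a starting point rather than a full diagnosis.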
2010/4/9 Ricardo Argüello <ricardo@xxxxxxxxxxxxxxxxx>
Looks like this bug:
GFS2 - probably lost glock call back
https://bugzilla.redhat.com/show_bug.cgi?id=498976
This is fixed in the kernel included in RHEL 5.5.
Do a "yum update" to fix it.
Ricardo Arguello
On Tue, Mar 2, 2010 at 6:10 AM, Emilio Arjona <emilio.ah@xxxxxxxxx> wrote:
> Thanks for your response, Steve.
>
> 2010/3/2 Steven Whitehouse <swhiteho@xxxxxxxxxx>:
>> Hi,
>>
>> On Fri, 2010-02-26 at 16:52 +0100, Emilio Arjona wrote:
>>> Hi,
>>>
>>> we are experiencing some problems discussed in an old thread:
>>>
>>> http://www.mail-archive.com/linux-cluster@xxxxxxxxxx/msg07091.html
>>>
>>> We have 3 clustered servers under Red Hat 5.4 accessing a GFS2 resource.
>>>
>>> fstab options:
>>> /dev/vg_cluster/lv_cluster /opt/datacluster gfs2 defaults,noatime,nodiratime,noquota 0 0
>>>
>>> GFS options:
>>> plock_rate_limit="0"
>>> plock_ownership=1
>>>
>>> httpd processes sometimes get stuck in D state, and the only solution is
>>> to hard-reset the affected server.
>>>
>>> Can anyone give me some hints to diagnose the problem?
>>>
>>> Thanks :)
>>>
>> Can you give me a rough idea of what the actual workload is and how it
>> is distributed among the director(y/ies)?
>
> We had problems with php sessions in the past, but we fixed them by
> configuring php to store the sessions in the database instead of in
> the GFS filesystem. Now we're having problems with files and
> directories in the "data" folder of Moodle LMS.
>
> "lsof -p" showed an I/O operation on the same folder on 2 of the 3 nodes.
> We did a hard reset of these nodes, but some hours later the CPU load
> rose again, especially on the node that wasn't rebooted. We decided
> to reboot this node (via ssh), and then the CPU load went down to normal
> values on all nodes.
>
> I don't think the system's load is high enough to produce concurrent
> access problems. It's more likely to be some misconfiguration; in
> fact, we changed some GFS2 options to non-default values to increase
> performance (http://www.linuxdynasty.org/howto-increase-gfs2-performance-in-a-cluster.html).
>
>>
>> This is often down to contention on glocks (one per inode), and may be
>> because there is a process or processes writing to a file or directory
>> which is in use (either read-only or writable) by other processes.
>>
>> If you are using php, then you might have to strace it to find out what
>> it is really doing,
>
> Ok, we will try to strace the D processes and post the results. Hope
> we find something!!
>
>>
>> Steve.
>>
>>> --
>>>
>>> Emilio Arjona.
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster@xxxxxxxxxx
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>
> --
> Emilio Arjona.
--
*******************************************
Emilio Arjona Heredia
Centro de Enseñanzas Virtuales de la Universidad de Granada
C/ Real de Cartuja 36-38
http://cevug.ugr.es
Tlfno.: 958-241000 ext. 20206
*******************************************