In a webserver cluster, httpd sometimes goes into D state. I have to restart the node, or even the whole cluster if more than one node is locked. I'm using Red Hat 5.4 and HP hardware.
Regards,
--
*******************************************
Emilio Arjona Heredia
Centro de Enseñanzas Virtuales de la Universidad de Granada
C/ Real de Cartuja 36-38
http://cevug.ugr.es
Tlfno.: 958-241000 ext. 20206
*******************************************
2011/1/4 Paras pradhan <pradhanparas@xxxxxxxxx>
I had the same problem. It locked the whole GFS cluster and I had to
reboot the node. After the reboot all is fine, but I am still trying to
find out what caused it.
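
For anyone else chasing this: the `ps axl | grep D` listing quoted below also matches the header line and the grep process itself, because both contain a capital D. Here is a minimal sketch, assuming the `ps axl` column layout shown in the quoted output (STAT is the tenth column), that keeps only rows whose state starts with D. The function name `d_state_rows` is hypothetical and is not part of any GFS2 or cluster tooling.

```python
def d_state_rows(ps_axl_output):
    """Return (pid, stat, command) for rows whose STAT field starts with 'D'.

    Assumes the `ps axl` layout:
    F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
    """
    rows = []
    for line in ps_axl_output.splitlines():
        # Split into at most 13 fields so the command keeps its arguments.
        fields = line.split(None, 12)
        if len(fields) < 13 or not fields[2].isdigit():
            continue  # header line, blank line, or truncated row
        stat = fields[9]
        if stat.startswith("D"):
            rows.append((int(fields[2]), stat, fields[12]))
    return rows
```

Feeding it the text returned by `subprocess.check_output(["ps", "axl"], text=True)` should list only the tasks in uninterruptible sleep, which makes it easier to watch for new ones while the problem is being reproduced.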
Paras
On Monday, January 3, 2011, InterNetworX | Hostmaster
<hostmaster@xxxxxxx> wrote:
> Hello,
>
> we are using GFS2 but sometimes there are processes hanging in D state:
>
> # ps axl | grep D
> F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
> 0 0 14220 14219 20 0 19624 1916 - Ds ? 0:00 /usr/lib/postfix/master -t
> 0 0 14555 14498 20 0 16608 1716 - D+ /mnt/storage/openvz/root/129/dev/pts/0 0:00 apt-get install less
> 0 0 15068 15067 19 -1 36844 2156 - D<s ? 0:00 /usr/lib/postfix/master -t
> 0 0 16603 16602 19 -1 36844 2156 - D<s ? 0:00 /usr/lib/postfix/master -t
> 4 101 19534 13238 19 -1 33132 2984 - D< ? 0:00 smtpd -n smtp -t inet -u -c
> 4 101 19542 13238 19 -1 33116 2976 - D< ? 0:00 smtpd -n smtp -t inet -u -c
> 0 0 19735 13068 20 0 7548 880 - S+ pts/0 0:00 grep D
>
> dmesg shows this message many times:
>
> [11142.334229] INFO: task master:14220 blocked for more than 120 seconds.
> [11142.334266] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [11142.334310] master D ffff88032b644800 0 14220 14219 0x00000000
> [11142.334315] ffff88062dd40000 0000000000000086 0000000000000000 ffffffffa02628d9
> [11142.334318] ffff88017a517ef8 000000000000fa40 ffff88017a517fd8 0000000000016940
> [11142.334322] 0000000000016940 ffff88032b644800 ffff88032b644af8 0000000b7a517cd8
> [11142.334325] Call Trace:
> [11142.334340] [<ffffffffa02628d9>] ? gfs2_glock_put+0xf9/0x118 [gfs2]
> [11142.334347] [<ffffffffa0261db0>] ? gfs2_glock_holder_wait+0x0/0xd [gfs2]
> [11142.334353] [<ffffffffa0261db9>] ? gfs2_glock_holder_wait+0x9/0xd [gfs2]
> [11142.334358] [<ffffffff812e9897>] ? __wait_on_bit+0x41/0x70
> [11142.334363] [<ffffffffa0261db0>] ? gfs2_glock_holder_wait+0x0/0xd [gfs2]
> [11142.334367] [<ffffffff812e9931>] ? out_of_line_wait_on_bit+0x6b/0x77
> [11142.334370] [<ffffffff81066808>] ? wake_bit_function+0x0/0x23
> [11142.334376] [<ffffffffa0261d9e>] ? gfs2_glock_wait+0x23/0x28 [gfs2]
> [11142.334383] [<ffffffffa026b2b0>] ? gfs2_flock+0x17c/0x1f9 [gfs2]
> [11142.334386] [<ffffffff810e735d>] ? virt_to_head_page+0x9/0x2a
> [11142.334389] [<ffffffff810e743e>] ? ub_slab_ptr+0x22/0x65
> [11142.334393] [<ffffffff8112221b>] ? sys_flock+0xff/0x12a
> [11142.334396] [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
>
> Any idea what is going wrong? Do you need any more information?
>
> Mario
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
>