Hi,

I observed much the same problem some time ago. Here is the Bugzilla entry
I opened:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228916

Mark

On Wednesday 04 April 2007 15:18:18 Peter Sopko wrote:
> Hi,
>
> thanks for your reply Bryn.
>
> The output of the ps command you suggested (I've omitted the standard
> system processes):
>
> [root@mail1 subsys]# ps ax -ocomm,pid,state,wchan |more
> COMMAND           PID S WCHAN
> ccsd             2258 S -
> cman_comms       2310 S cluster_kthread
> cman_serviced    2312 S serviced
> cman_memb        2311 S membership_kthread
> cman_hbeat       2315 S hello_kthread
> fenced           2336 S rt_sigsuspend
> dlm_astd         2354 S dlm_astd
> dlm_recvd        2355 S dlm_recvd
> dlm_sendd        2356 S dlm_sendd
> lock_dlm1        2358 S dlm_async
> lock_dlm2        2359 S dlm_async
> gfs_scand        2360 S -
> gfs_glockd       2361 S gfs_glockd
> gfs_recoverd     2362 S -
> gfs_logd         2363 S -
> gfs_quotad       2364 D glock_wait_internal
> gfs_inoded       2365 D dlm_lock_sync
> syslogd          2374 S -
> klogd            2394 S syslog
> heartbeat        2503 S -
> courierlogger    2526 S pipe_wait
> authdaemond      2527 S -
> authdaemond      2551 S -
> authdaemond      2552 S -
> authdaemond      2553 S -
> authdaemond      2554 S -
> authdaemond      2555 S -
> heartbeat        2586 S pipe_wait
> heartbeat        2587 S -
> heartbeat        2588 S -
> heartbeat        2589 S -
> heartbeat        2590 S -
> acpid            2595 S -
> ipfail           2608 S -
> nod32d           2609 S -
> nod32smtp        2618 S -
> sshd             2627 S -
> ntpd             2642 S -
> courierlogger    2654 S pipe_wait
> couriertcpd      2655 S -
> courierlogger    2661 S pipe_wait
> couriertcpd      2662 S -
> courierlogger    2667 S pipe_wait
> couriertcpd      2668 S wait
> courierlogger    2673 S pipe_wait
> couriertcpd      2674 S -
> master           2815 S -
> master           3024 S -
> httpd            3039 S -
> crond            3048 S -
> rhnsd            3067 S -
> mingetty         3074 S -
> mingetty         3075 S -
> mingetty         3076 S -
> mingetty         3077 S -
> mingetty         3078 S -
> mingetty         3079 S -
> ntpd             3888 S rt_sigsuspend
> tlsmgr           4544 S -
> tlsmgr           1585 S -
> anvil            1699 S -
> spamd           29941 S -
> httpd           15674 D glock_wait_internal
> httpd           15675 D glock_wait_internal
> httpd           15676 D glock_wait_internal
> httpd           15677 D glock_wait_internal
> httpd           15678 D glock_wait_internal
> httpd           15679 D glock_wait_internal
> httpd           15680 D glock_wait_internal
> httpd           15681 D glock_wait_internal
> httpd           30808 D glock_wait_internal
> httpd           30809 D glock_wait_internal
> httpd           30810 D glock_wait_internal
> httpd           30825 D glock_wait_internal
> httpd           30827 D glock_wait_internal
> httpd           30828 D glock_wait_internal
> httpd           30829 D glock_wait_internal
> httpd           30830 D glock_wait_internal
> httpd           30831 D glock_wait_internal
> httpd           30832 D glock_wait_internal
> httpd           30835 D glock_wait_internal
> httpd           30840 D glock_wait_internal
> spamd           17341 S -
> proxymap        24868 S -
> proxymap        27542 S -
> mysqld_safe     30617 S wait
> mysqld          30650 S -
> trivial-rewrite 30735 S -
> proxymap        30742 S -
> sshd              517 S -
> sshd              519 S -
> bash              520 S wait
> su                740 S wait
> bash              741 S -
> imapd           15018 D lock_on_glock
> virtual         15699 D lock_on_glock
> trivial-rewrite 15918 S -
> proxymap        15922 S -
> virtual         15943 D lock_on_glock
> virtual         15952 D lock_on_glock
> virtual         15965 D lock_on_glock
> pop3d           15966 D lock_on_glock
> pop3d           15967 D lock_on_glock
> virtual         15968 D lock_on_glock
> pop3d           15971 D lock_on_glock
> pop3d           15983 D lock_on_glock
> virtual         16046 D lock_on_glock
> pop3d           16049 D lock_on_glock
> pop3d           16053 D lock_on_glock
> pop3d           16068 D glock_wait_internal
> pop3d           16074 D glock_wait_internal
> virtual         16077 D lock_on_glock
> spamd           16112 S -
> virtual         16129 D lock_on_glock
> virtual         16133 D lock_on_glock
> pop3d           16143 D glock_wait_internal
> virtual         16153 D lock_on_glock
> virtual         16160 D glock_wait_internal
> virtual         16163 D lock_on_glock
> pop3d           16164 D glock_wait_internal
> virtual         16179 D lock_on_glock
> pop3d           16183 D glock_wait_internal
> pop3d           16186 D glock_wait_internal
> pop3d           16187 D glock_wait_internal
> virtual         16191 D lock_on_glock
> pop3d           16192 D lock_on_glock
> virtual         16194 D lock_on_glock
> pop3d           16202 D glock_wait_internal
> virtual         16207 D lock_on_glock
> virtual         16217 D lock_on_glock
> virtual         16222 D lock_on_glock
> ....
> smtp            21150 S -
> smtp            21162 S flock_lock_file_wait
> cleanup         21181 S flock_lock_file_wait
> smtpd           21213 S -
> spamfilter.sh   21224 S wait
> cat             21225 S pipe_wait
> spamfilter.sh   21226 D -
> spamfilter.sh   21229 S wait
> pipe            21230 S -
> cat             21231 S pipe_wait
> spamfilter.sh   21232 D -
> spamfilter.sh   21235 S wait
> cat             21236 S pipe_wait
> spamfilter.sh   21237 D -
> spamfilter.sh   21239 S wait
> spamfilter.sh   21240 S wait
> cat             21242 S pipe_wait
> spamfilter.sh   21243 D -
> virtual         21244 D lock_on_glock
> cat             21245 S pipe_wait
> spamfilter.sh   21246 D -
> spamfilter.sh   21249 S wait
> cat             21250 S pipe_wait
> spamfilter.sh   21251 D -
> spamfilter.sh   21252 S wait
> cat             21253 S pipe_wait
> spamfilter.sh   21254 D -
> spamfilter.sh   21257 S wait
> cat             21258 S pipe_wait
> spamfilter.sh   21259 D -
> spamfilter.sh   21261 S wait
> spamfilter.sh   21262 S wait
> spamfilter.sh   21263 S wait
> cat             21264 S pipe_wait
> spamfilter.sh   21265 D -
> spamfilter.sh   21267 D -
> cat             21268 S pipe_wait
> spamfilter.sh   21269 D -
> spamfilter.sh   21273 S wait
> ...
> etc....
>
>
> The sysrq-t output can be found at this URL:
> http://www.backbone.sk/sysrq.tar. It's 400k in size, so I have chosen not
> to attach it here. There are two files in this .tar - one was taken at
> 15:04 and the other at 15:08.
>
> Again, I will be very thankful for any help.
>
> Peter Sopko, IT Security Consultant
> Tempest a.s.
>
>
> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx
> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bryn M. Reeves
> Sent: Wednesday, April 04, 2007 2:45 PM
> To: linux clustering
> Subject: Re: problem with deadlocked processes (D)
>
> Peter Sopko wrote:
> > Hi,
> >
> > today a strange thing occurred - on both of our cluster nodes a lot of
> > processes suddenly started to become locked in the D state (I/O lock).
> > This has already happened once before (six months ago), but a simple
> > reboot helped to solve the issue. But as it has appeared again, I don't
> > want to solve it this way again; I would like to find the reason why
> > this is happening, but have no idea where to start. In
> > /var/log/messages there is nothing unusual; the only symptoms are that
> > some directories are unremovable and a lot of processes are locked.
>
> For problems where processes are getting stuck in D state it's usually
> helpful to get sysrq-t data to see where the threads are stuck. Grab two
> sets of data a few seconds apart so that you can see if things are
> really stuck or just making slow progress.
>
> You can also get some information from the wchan data exposed in /proc -
> it's easiest to view with ps:
>
> $ ps ax -ocomm,pid,state,wchan
> COMMAND           PID S WCHAN
> vim             22322 S -
> bash            22471 S -
> man             22817 S wait
> sh              22820 S wait
> sh              22821 S wait
> less            22826 S -
> bash            22839 S wait
> screen          23435 S pause
> [...]
>
> Regards,
> Bryn.
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
Gruss / Regards,

Dipl.-Ing. Mark Hlawatschek
http://www.atix.de/
http://www.open-sharedroot.org/

**
ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany
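
As an illustration of Bryn's suggestion to look at the wchan data, here is a
minimal sketch that tallies the D-state processes by wait channel, which
makes a long listing like Peter's easier to read. The function name
summarize_d_state is made up for this example and is not part of any
cluster tooling; it assumes input in the four-column
"ps ax -ocomm,pid,state,wchan" layout shown above.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): count processes in
# uninterruptible sleep (state D) per wait channel.
# Reads `ps ax -ocomm,pid,state,wchan` output on stdin.
summarize_d_state() {
    # State and wchan are taken from the last two fields, so a command
    # name containing spaces does not shift the columns. The NF guard
    # skips blank lines and markers such as "[...]".
    awk 'NF >= 4 && $(NF-1) == "D" { count[$NF]++ }
         END { for (w in count) printf "%d %s\n", count[w], w }' |
        sort -k1,1nr -k2,2   # highest count first, then by wchan name
}

# Usage on a live system:
#   ps ax -ocomm,pid,state,wchan | summarize_d_state
```

Running it twice a few seconds apart, in the same spirit as the two sysrq-t
captures, may help show whether the same wait channels stay populated or
whether the processes are just making slow progress.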