Hi Bryn, thanks for your reply.

Here is the output of the ps command you suggested (I've omitted the standard system processes):

[root@mail1 subsys]# ps ax -ocomm,pid,state,wchan |more
COMMAND           PID S WCHAN
ccsd             2258 S -
cman_comms       2310 S cluster_kthread
cman_serviced    2312 S serviced
cman_memb        2311 S membership_kthread
cman_hbeat       2315 S hello_kthread
fenced           2336 S rt_sigsuspend
dlm_astd         2354 S dlm_astd
dlm_recvd        2355 S dlm_recvd
dlm_sendd        2356 S dlm_sendd
lock_dlm1        2358 S dlm_async
lock_dlm2        2359 S dlm_async
gfs_scand        2360 S -
gfs_glockd       2361 S gfs_glockd
gfs_recoverd     2362 S -
gfs_logd         2363 S -
gfs_quotad       2364 D glock_wait_internal
gfs_inoded       2365 D dlm_lock_sync
syslogd          2374 S -
klogd            2394 S syslog
heartbeat        2503 S -
courierlogger    2526 S pipe_wait
authdaemond      2527 S -
authdaemond      2551 S -
authdaemond      2552 S -
authdaemond      2553 S -
authdaemond      2554 S -
authdaemond      2555 S -
heartbeat        2586 S pipe_wait
heartbeat        2587 S -
heartbeat        2588 S -
heartbeat        2589 S -
heartbeat        2590 S -
acpid            2595 S -
ipfail           2608 S -
nod32d           2609 S -
nod32smtp        2618 S -
sshd             2627 S -
ntpd             2642 S -
courierlogger    2654 S pipe_wait
couriertcpd      2655 S -
courierlogger    2661 S pipe_wait
couriertcpd      2662 S -
courierlogger    2667 S pipe_wait
couriertcpd      2668 S wait
courierlogger    2673 S pipe_wait
couriertcpd      2674 S -
master           2815 S -
master           3024 S -
httpd            3039 S -
crond            3048 S -
rhnsd            3067 S -
mingetty         3074 S -
mingetty         3075 S -
mingetty         3076 S -
mingetty         3077 S -
mingetty         3078 S -
mingetty         3079 S -
ntpd             3888 S rt_sigsuspend
tlsmgr           4544 S -
tlsmgr           1585 S -
anvil            1699 S -
spamd           29941 S -
httpd           15674 D glock_wait_internal
httpd           15675 D glock_wait_internal
httpd           15676 D glock_wait_internal
httpd           15677 D glock_wait_internal
httpd           15678 D glock_wait_internal
httpd           15679 D glock_wait_internal
httpd           15680 D glock_wait_internal
httpd           15681 D glock_wait_internal
httpd           30808 D glock_wait_internal
httpd           30809 D glock_wait_internal
httpd           30810 D glock_wait_internal
httpd           30825 D glock_wait_internal
httpd           30827 D glock_wait_internal
httpd           30828 D glock_wait_internal
httpd           30829 D glock_wait_internal
httpd           30830 D glock_wait_internal
httpd           30831 D glock_wait_internal
httpd           30832 D glock_wait_internal
httpd           30835 D glock_wait_internal
httpd           30840 D glock_wait_internal
spamd           17341 S -
proxymap        24868 S -
proxymap        27542 S -
mysqld_safe     30617 S wait
mysqld          30650 S -
trivial-rewrite 30735 S -
proxymap        30742 S -
sshd              517 S -
sshd              519 S -
bash              520 S wait
su                740 S wait
bash              741 S -
imapd           15018 D lock_on_glock
virtual         15699 D lock_on_glock
trivial-rewrite 15918 S -
proxymap        15922 S -
virtual         15943 D lock_on_glock
virtual         15952 D lock_on_glock
virtual         15965 D lock_on_glock
pop3d           15966 D lock_on_glock
pop3d           15967 D lock_on_glock
virtual         15968 D lock_on_glock
pop3d           15971 D lock_on_glock
pop3d           15983 D lock_on_glock
virtual         16046 D lock_on_glock
pop3d           16049 D lock_on_glock
pop3d           16053 D lock_on_glock
pop3d           16068 D glock_wait_internal
pop3d           16074 D glock_wait_internal
virtual         16077 D lock_on_glock
spamd           16112 S -
virtual         16129 D lock_on_glock
virtual         16133 D lock_on_glock
pop3d           16143 D glock_wait_internal
virtual         16153 D lock_on_glock
virtual         16160 D glock_wait_internal
virtual         16163 D lock_on_glock
pop3d           16164 D glock_wait_internal
virtual         16179 D lock_on_glock
pop3d           16183 D glock_wait_internal
pop3d           16186 D glock_wait_internal
pop3d           16187 D glock_wait_internal
virtual         16191 D lock_on_glock
pop3d           16192 D lock_on_glock
virtual         16194 D lock_on_glock
pop3d           16202 D glock_wait_internal
virtual         16207 D lock_on_glock
virtual         16217 D lock_on_glock
virtual         16222 D lock_on_glock
....
smtp            21150 S -
smtp            21162 S flock_lock_file_wait
cleanup         21181 S flock_lock_file_wait
smtpd           21213 S -
spamfilter.sh   21224 S wait
cat             21225 S pipe_wait
spamfilter.sh   21226 D -
spamfilter.sh   21229 S wait
pipe            21230 S -
cat             21231 S pipe_wait
spamfilter.sh   21232 D -
spamfilter.sh   21235 S wait
cat             21236 S pipe_wait
spamfilter.sh   21237 D -
spamfilter.sh   21239 S wait
spamfilter.sh   21240 S wait
cat             21242 S pipe_wait
spamfilter.sh   21243 D -
virtual         21244 D lock_on_glock
cat             21245 S pipe_wait
spamfilter.sh   21246 D -
spamfilter.sh   21249 S wait
cat             21250 S pipe_wait
spamfilter.sh   21251 D -
spamfilter.sh   21252 S wait
cat             21253 S pipe_wait
spamfilter.sh   21254 D -
spamfilter.sh   21257 S wait
cat             21258 S pipe_wait
spamfilter.sh   21259 D -
spamfilter.sh   21261 S wait
spamfilter.sh   21262 S wait
spamfilter.sh   21263 S wait
cat             21264 S pipe_wait
spamfilter.sh   21265 D -
spamfilter.sh   21267 D -
cat             21268 S pipe_wait
spamfilter.sh   21269 D -
spamfilter.sh   21273 S wait
... etc....

The sysrq-t output is available at http://www.backbone.sk/sysrq.tar. It's 400k in size, so I have chosen not to attach it here. The .tar contains two files: one was taken at 15:04 and the other at 15:08.

Again, I will be very thankful for any help.

Peter Sopko, IT Security Consultant
Tempest a.s.

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bryn M. Reeves
Sent: Wednesday, April 04, 2007 2:45 PM
To: linux clustering
Subject: Re: problem with deadlocked processes (D)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Peter Sopko wrote:
> Hi,
>
> today a strange thing occurred - on both of our cluster nodes a lot of
> processes suddenly started to become locked in the D state (i/o lock). This
> thing has already happened once before (six months ago), but a simple reboot
> helped to solve this issue.
> But as it appeared again, I don't want to solve
> it this way again, I would like to find the reason why this is happening,
> but have no idea where to start. In /var/log/messages there is nothing
> unusual, the only thing is that some directories are unremoveable and a lot
> of processes locked.

For problems where processes are getting stuck in D state it's usually helpful to get sysrq-t data to see where the threads are stuck. Grab two sets of data a few seconds apart so that you can see if things are really stuck or just making slow progress.

You can also get some information from the wchan data exposed in /proc - it's easiest to view with ps:

$ ps ax -ocomm,pid,state,wchan
COMMAND           PID S WCHAN
vim             22322 S -
bash            22471 S -
man             22817 S wait
sh              22820 S wait
sh              22821 S wait
less            22826 S -
bash            22839 S wait
screen          23435 S pause
[...]

Regards,
Bryn.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFGE5226YSQoMYUY94RAgm0AKDdPg/mcTHilSwMpd6+Meno2zBLtACgt+/j
TT3MsBrg6/gpdBdPDYMEp5Q=
=ADyt
-----END PGP SIGNATURE-----

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
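
With a listing as long as the one in this thread, it can help to reduce the ps output to a count of D-state processes per wait channel and compare the counts between two snapshots. The following is a minimal sketch and not something posted in the original thread: the function name summarize_blocked is hypothetical, and it assumes the four-column comm,pid,state,wchan layout used above, with the state and wait channel as the last two fields on each line.

```shell
# Hypothetical helper (not from the thread): reads the output of
#   ps ax -ocomm,pid,state,wchan
# on stdin and prints "<count> <wchan>" for processes in state D,
# highest count first.
summarize_blocked() {
    # Index fields from the end of the line so that a command name
    # containing spaces does not shift the state/wchan columns.
    # The header line is skipped naturally: its state field is "S".
    awk 'NF >= 4 && $(NF-1) == "D" { count[$NF]++ }
         END { for (w in count) print count[w], w }' |
        LC_ALL=C sort -k1,1nr -k2,2
}
```

It would be fed live data with `ps ax -ocomm,pid,state,wchan | summarize_blocked`. Running it twice a few seconds apart gives a rough view of whether the blocked processes are moving between wait channels or staying put, in the same spirit as taking two sysrq-t dumps.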