Hi,

today a strange thing occurred: on both of our cluster nodes a lot of processes suddenly became stuck in the D state (uninterruptible sleep, usually waiting on I/O). This has already happened once before (six months ago), and a simple reboot solved it then. Now that it has appeared again, I don't want to just reboot; I would like to find out why it is happening, but I have no idea where to start.

There is nothing unusual in /var/log/messages. The only symptoms are that some directories cannot be removed and that a lot of processes are stuck.

Some details about our configuration:

- The storage array is an MSA 1000.
- There are two cluster nodes, connected to each other and both connected to the storage array.
- The cluster is used only as a web/mail server: apache, courier-imap, postfix, spamassassin, clamav, mysql, ...

The last time this happened only courier-imap processes (imapd, pop3d) were stuck; today it is also Apache's httpd. Currently 261 processes out of 472 are in the D state.

Here are some examples taken using ps afx (the XXXXX are just filtered e-mail addresses):

 3024 ?  Ss  7:14 /usr/libexec/postfix/master
 4544 ?  S   0:17  \_ tlsmgr -l -t unix -u
15922 ?  S   0:00  \_ proxymap -t unix -u
15943 ?  D   0:00  \_ virtual -t unix
16129 ?  D   0:00  \_ virtual -t unix
16153 ?  D   0:00  \_ virtual -t unix
16261 ?  S   0:00  \_ proxymap -t unix -u
16262 ?  D   0:00  \_ virtual -t unix
16269 ?  D   0:00  \_ virtual -t unix
16271 ?  D   0:00  \_ virtual -t unix
16424 ?  D   0:00  \_ virtual -t unix
19138 ?  D   0:00  \_ virtual -t unix
19147 ?  D   0:00  \_ virtual -t unix
19153 ?  D   0:00  \_ virtual -t unix
19205 ?  D   0:00  \_ virtual -t unix
19835 ?  S   0:00  \_ pickup -l -t fifo -u
19919 ?  S   0:00  \_ qmgr -l -t fifo -u
19920 ?  S   0:00  \_ pipe -n filter -t unix flags=Rq user=filter argv=/data/spam/spamfilter.sh -f ${sender} --{recipient}
20346 ?  Ss  0:00  | \_ /bin/sh /data/spam/spamfilter.sh -f XXXXX -- XXXXX
20353 ?  S   0:00  | \_ cat
20354 ?  D   0:00  | \_ /bin/sh /data/spam/spamfilter.sh -f XXXXX -- XXXXX

 3039 ?  Ss  0:03 /usr/sbin/httpd
15674 ?  D   0:37  \_ /usr/sbin/httpd
15675 ?  D   0:32  \_ /usr/sbin/httpd
15676 ?  D   0:30  \_ /usr/sbin/httpd
15677 ?  D   0:34  \_ /usr/sbin/httpd
15678 ?  D   0:31  \_ /usr/sbin/httpd
15679 ?  S   0:33  \_ /usr/sbin/httpd
15680 ?  D   0:34  \_ /usr/sbin/httpd
15681 ?  D   0:34  \_ /usr/sbin/httpd
30808 ?  D   0:15  \_ /usr/sbin/httpd
30809 ?  D   0:15  \_ /usr/sbin/httpd
30810 ?  D   0:13  \_ /usr/sbin/httpd
30825 ?  D   0:16  \_ /usr/sbin/httpd
30827 ?  D   0:15  \_ /usr/sbin/httpd
30828 ?  D   0:17  \_ /usr/sbin/httpd
30829 ?  D   0:14  \_ /usr/sbin/httpd
30830 ?  D   0:14  \_ /usr/sbin/httpd
30831 ?  D   0:17  \_ /usr/sbin/httpd
30832 ?  D   0:12  \_ /usr/sbin/httpd
30835 ?  D   0:15  \_ /usr/sbin/httpd
30840 ?  D   0:12  \_ /usr/sbin/httpd
20441 ?  D   0:00  \_ /usr/sbin/httpd
20500 ?  D   0:00  \_ /usr/sbin/httpd
20501 ?  S   0:00  \_ /usr/sbin/httpd

Any idea where to start with debugging and looking for the reason this is happening? I find it quite weird that it was fine for more than 6 months and now all of a sudden it starts doing this.

Any help would be greatly appreciated.

Thanks

Peter Sopko, IT Security Consultant
Tempest a.s.
Slovak Republic

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster