German Staltari wrote:
Hi, we have a 6-node cluster with FC4, kernel 2.6.16 and the CVS
STABLE branch of the cluster software. Sometimes some processes
(courier imap) hang in D state. When I execute "ls -la" in the "tmp"
directory of the mailbox the process is trying to access (it is always
the same directory, in the same mailbox), the response is really
slow and this is the output:
?--------- ? ? ? ? ? 1151074448.M345358P6861_courierlock.qmail-be-04
?--------- ? ? ? ? ? 1151074497.M326691P7647_courierlock.qmail-be-04
?--------- ? ? ? ? ? 1151074534.M524707P2198_courierlock.qmail-be-05
-rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:07 1151074538.M785749P13408_courierlock.qmail-be-03
-rw-r--r-- 1 mailuser mailuser 16 Jun 23 12:09 1151074588.M917441P3132_courierlock.qmail-be-05
-rw-r--r-- 1 mailuser mailuser 16 Jun 23 12:09 1151074593.M62901P3189_courierlock.qmail-be-05
?--------- ? ? ? ? ? 1151074649.M845223P5214_courierlock.qmail-be-02
-rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:09 1151074656.M448306P28724_courierlock.qmail-be-06
-rw-r--r-- 1 mailuser mailuser 16 Jun 23 12:07 1151074657.M188653P5302_courierlock.qmail-be-02
?--------- ? ? ? ? ? 1151074679.M821433P4979_courierlock.qmail-be-05
-rw-r--r-- 1 mailuser mailuser 16 Jun 23 12:07 1151074690.M360083P5741_courierlock.qmail-be-02
-rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:07 1151074701.M709923P29422_courierlock.qmail-be-06
-rw-r--r-- 1 mailuser mailuser 16 Jun 23 12:07 1151074716.M544858P6016_courierlock.qmail-be-02
-rw-r--r-- 1 mailuser mailuser 16 Jun 23 12:07 1151074731.M21587P6179_courierlock.qmail-be-02
?--------- ? ? ? ? ? 1151074804.M241436P7410_courierlock.qmail-be-02
?--------- ? ? ? ? ? 1151074831.M678238P17302_courierlock.qmail-be-03
?--------- ? ? ? ? ? 1151074917.M42708P8494_courierlock.qmail-be-05
-rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:08 1151074918.M541477P14716_courierlock.qmail-be-04
-rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:08 1151074946.M520653P15248_courierlock.qmail-be-04
?--------- ? ? ? ? ? 1151075037.M234721P11020_courierlock.qmail-be-02
?--------- ? ? ? ? ? 1151075065.M951224P8598_courierlock.qmail-be-01
-rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:09 1151075082.M788480P11712_courierlock.qmail-be-02
-rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:09 1151075186.M911867P18565_courierlock.qmail-be-04
-rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:08 1151075210.M366861P13891_courierlock.qmail-be-02
-rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:09 1151075217.M850817P13366_courierlock.qmail-be-05
?--------- ? ? ? ? ? 1151075252.M599978P32483_imapuid_4.qmail-be-05
It seems like a lock problem, but I'm not sure. Is there any other tool
that I can use to debug this?
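A process in D state is in uninterruptible sleep inside the kernel, and /proc/<pid>/wchan usually names the kernel function it is waiting in, which can hint at whether it is stuck in filesystem or locking code. ps can show the same thing through its stat and wchan output fields; the Python sketch below only makes the check scriptable. It assumes the standard Linux /proc layout, and the function name d_state_processes is hypothetical.

```python
import os


def d_state_processes(proc_root="/proc"):
    """Return (pid, comm, wchan) for every process in uninterruptible sleep ('D')."""
    result = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # comm is wrapped in parentheses and may contain spaces, so locate
        # the last ')' and read the state field that follows it.
        lparen = stat.find("(")
        rparen = stat.rfind(")")
        comm = stat[lparen + 1:rparen]
        fields = stat[rparen + 2:].split()
        if not fields or fields[0] != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip() or "?"
        except OSError:
            wchan = "?"
        result.append((int(entry), comm, wchan))
    return result


if __name__ == "__main__":
    for pid, comm, wchan in d_state_processes():
        print(f"{pid:>7} {comm:<16} {wchan}")
```

Running it on each node while the imap processes are hung shows which ones are stuck and roughly where in the kernel they are waiting.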
Thanks
German
--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
Hi German,
I suspect you are right: the question marks in the ls -l output lead me to
believe there might be a problem somewhere with the locking of the files.
My theory is this:
ls -l calls a kernel stat function to get file statistics. The stat
tries to acquire an internal lock (glock) but can't, so ls displays what
you see instead of valid values.
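One way to check this from user space is to repeat the two steps ls -l performs: read the names from the directory, then lstat() each one, noting which calls fail or take unusually long. Below is a minimal Python sketch of that; the function name probe_entries and the one-second "slow" threshold are assumptions. It prints each result as soon as it is known, so if one lstat() blocks in D state the script stops after the last healthy entry, and the next name in sorted order is the suspect.

```python
import errno
import os
import sys
import time


def probe_entries(directory, names=None, slow_after=1.0):
    """lstat() each directory entry the way `ls -l` does and yield the outcome.

    Yields (name, status, seconds); status is "ok", "slow", or the errno
    name (e.g. "ENOENT", "EIO") of a failed lstat().
    """
    if names is None:
        names = sorted(os.listdir(directory))  # the readdir() step
    for name in names:
        start = time.monotonic()
        try:
            os.lstat(os.path.join(directory, name))  # the stat step
            status = "ok"
        except OSError as exc:
            status = errno.errorcode.get(exc.errno, str(exc.errno))
        elapsed = time.monotonic() - start
        if status == "ok" and elapsed >= slow_after:
            status = "slow"
        yield name, status, elapsed


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, status, secs in probe_entries(target):
        print(f"{status:<8} {secs:8.3f}s  {name}", flush=True)
```

For example, run it against the mailbox's tmp directory. An ENOENT result can also simply mean the file was removed between the directory read and the lstat(), which is normal for short-lived lock files, so repeated runs are more telling than a single one.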
Perhaps courier imap is locking files and then hanging, so the process is
hanging around with the lock still held; or else it was killed abnormally
and the lock was never released.
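To test the "lock held by a hung or dead process" idea on a single node, /proc/locks lists the POSIX (fcntl) and flock locks the local kernel knows about, with the owning PID and the device:inode of the locked file. The sketch below flags entries whose PID no longer exists under /proc, plus waiters marked with "->". It is only a partial check: it does not show GFS glocks, and I am not sure how locks held from other cluster nodes appear there. The function name lock_holders and its dictionary keys are hypothetical.

```python
import os


def lock_holders(locks_path="/proc/locks", proc_root="/proc"):
    """Parse /proc/locks; report each lock's owner PID and whether it still exists."""
    entries = []
    with open(locks_path) as f:
        for line in f:
            tokens = line.split()
            blocked = "->" in tokens  # "->" marks a process waiting for the lock
            tokens = [t for t in tokens if t != "->"]
            if len(tokens) < 6:
                continue
            try:
                pid = int(tokens[4])
            except ValueError:
                continue
            kind, dev_inode = tokens[1], tokens[5]
            alive = pid > 0 and os.path.isdir(os.path.join(proc_root, str(pid)))
            entries.append({
                "kind": kind,                             # POSIX, FLOCK, ...
                "pid": pid,
                "inode": dev_inode.rsplit(":", 1)[-1],  # last field is the inode
                "blocked": blocked,
                "alive": alive,
            })
    return entries


if __name__ == "__main__":
    for e in lock_holders():
        if e["blocked"] or not e["alive"]:
            state = "waiting" if e["blocked"] else "owner gone"
            print(f"{e['kind']:<6} pid {e['pid']:>7} inode {e['inode']:>10} {state}")
```

The inode number can then be compared with ls -i output for the mailbox directory to see whether any of the courier lock files are involved.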
Do you have any suggestions on how we can recreate this problem in our lab?
Regards,
Bob Peterson
Red Hat Cluster Suite