Sebastien,

On Tuesday 07 March 2006 12:35, Sébastien DIDIER wrote:
> 2006/3/7, Marc Grimme <grimme@xxxxxxx>:
> > Hi,
> > to debug you could use strace. E.g. executing strace -p 14970 will
> > probably show you that the process is waiting for a lock, as the ps
> > output already does. My first guess would be that you use apache with
> > php and sessions.
>
> Thanks. But strace doesn't output anything and became Ctrl-C immune. It
> needs a SIGKILL to exit, and the traced process stays in T state. It
> seems that it doesn't manage to get the last system call while the
> process is in D state.
Hmm, sounds like I've heard that already. If you trace the root httpd with
-f and -t and look out for large gaps between timestamps, you'll probably
find processes waiting for locks. The D state is a good indicator
(ps ax | grep " D " and look at the pids). Do the pids of the D processes
change from time to time, or do they stay the same?
>
> > If so, the phplib uses flocks for locking the session-ids. Normally
> > one process locks a session. If another process comes along to get an
> > flock on that session, it has to wait until the earlier flock is
> > released. It very often happens that the other process gets that flock
> > when the client and session are not available any more. Then the flock
> > is held until the apache process times out.
>
> I don't think it is session related because I store session files
> outside the GFS mount point (in /tmp) and I run a load balancer based
> upon the source address (to always send requests to the same server and
> so keep sessions).
Yes, I agree. Sessions get lost if the node fails, right?
>
> But we are using mysql query caching (with some libraries like AdoDb)
> inside the GFS mount point. Do you think it could be the cache files
> which are dead-locked?
It depends on how those files are locked and how and when the locks are
set and released.
If a lock is set at apache-child fork time and released at process
termination, then yes, that could happen. If only the accesses to the data
in those files are protected with flocks, then it should perform quite well.
Is that query caching part of ADOdb or did you implement it yourselves?
Have a look and play with strace, and watch out for long delays and the
syscalls involved. I would expect you to end up with flock timeouts.
Hope that helps,
regards Marc.
>
> > We have made a patch for better locking with php which you can find on
> > http://www.open-sharedroot.org in the downloads section.
> > Hope that helps
> > Regards Marc.
> >
> > On Tuesday 07 March 2006 11:50, Sébastien DIDIER wrote:
> > > Hi,
> > >
> > > I'm running a two-node GFS cluster which hosts web sites. The GFS
> > > partition is on an iSCSI device and, for now, I'm using manual fencing.
> > >
> > > Today, I got 5 httpd processes on both nodes stuck in IO
> > > blocking state. I suspected a GFS filesystem corruption but I haven't
> > > got any output from the kernel. I ran a fsck two days ago after a
> > > power failure.
> > >
> > > Here's the wait state of the processes (same for the other node):
> > >
> > > # ps -o pid,tt,user,fname,wchan -C apache
> > >   PID TT USER     COMMAND WCHAN
> > >  4426 ?  root     apache  -
> > > 14970 ?  www-data apache  glock_wait_internal
> > > 15103 ?  www-data apache  glock_wait_internal
> > > 16780 ?  www-data apache  glock_wait_internal
> > > 16959 ?  www-data apache  glock_wait_internal
> > > 14936 ?  www-data apache  finish_stop
> > > 12859 ?  www-data apache  -
> > > 13005 ?  www-data apache  -
> > > 13311 ?  www-data apache  semtimedop
> > > 13390 ?  www-data apache  semtimedop
> > >
> > > How can I debug this problem further? And how can I bring back
> > > my httpd processes without a reboot?
> > >
> > > Many thanks for your help.
> > >
> > > Regards,
> > > Sébastien DIDIER
> > >
> > > --
> > >
> > > Linux-cluster@xxxxxxxxxx
> > > https://www.redhat.com/mailman/listinfo/linux-cluster

--
Gruss / Regards,

Marc Grimme
Phone: +49-89 121 409-54
http://www.atix.de/ http://www.open-sharedroot.org/

**
ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany
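
To illustrate the point about lock scope in the reply above, here is a minimal Python sketch of taking an exclusive flock only around the access to a cache file, with a timeout instead of an indefinite wait. It is not the ADOdb or phplib code, and it is not the patch mentioned above. The names locked and demo, the timeout values and the temporary file are all assumptions made for the illustration, and it runs against a local file, so GFS locking behaviour may differ.

```python
import fcntl
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def locked(path, timeout=1.0, poll=0.05):
    """Hold an exclusive flock on path only for the duration of the block.

    Hypothetical helper: yields the open file, or raises TimeoutError if
    the lock cannot be obtained within `timeout` seconds.
    """
    f = open(path, "a+")
    deadline = time.monotonic() + timeout
    try:
        while True:
            try:
                # LOCK_NB makes flock fail at once instead of blocking.
                fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"could not lock {path}")
                time.sleep(poll)
        yield f
    finally:
        # Closing the only descriptor for this open also drops the flock.
        f.close()


def demo():
    """Return (contended, reacquired) for two independent opens of a file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with locked(path) as first:
            first.write("cached result\n")
            # A second open of the same file is a separate lock owner,
            # so it has to wait while the first lock is still held.
            try:
                with locked(path, timeout=0.2):
                    contended = False
            except TimeoutError:
                contended = True
        # The first lock was released on leaving the block above.
        with locked(path, timeout=0.2):
            reacquired = True
        return contended, reacquired
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

A lock taken this way is held only while the block runs. A lock taken at child fork time and dropped at process exit would instead keep every other opener waiting for the whole lifetime of the process, which is the case the reply warns about.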