On Tue, Sep 08, 2009 at 05:37:09AM -0700, Anand Avati wrote:
> >> > I doubt that this can be a real solution. My guess is that glusterfsd runs
> >> > into some race condition where it locks itself up completely.
> >> > It is not funny to debug something the like on a production setup. Best would
> >> > be to have debugging output sent from the servers' glusterfsd directly to a
> >> > client to save the logs. I would not count on syslog in this case, if it
> >> > survives one could use a serial console for syslog output though.
>
> I'm going to iterate through this yet again at the risk of frustrating
> you. glusterfsd (on the server side) is yet another process running
> only system calls. If glusterfsd has a race condition and locks itself
> up, then it locks _only its own process_ up. What you are having is a
> frozen system. There is no way glusterfsd can lock up your system
> through just VFS system calls, even if it wanted to, intentionally. It
> is a pure user space process and has no power to lock up the system.
> The worst glusterfsd can do to your system is deadlock its own process
> resulting in a glusterfs fuse mountpoint hang, or segfault and result
> in a core dump.

It appears OP has no core dump.
It appears OP has no gluster logs.
It appears OP cannot log in or ssh to observe results, but instead must
cold boot.

Debugging opportunities are getting slim.

Are there kernel instrumentation utils that OP can use to determine one
or more of:
- file descriptors running out
- a thread deadlock condition occurring
- some other kernel-level subsystem failure, e.g. networking, fs,
  scheduler/memory
???

I have been watching closely. I am a potential gluster user monitoring
this situation. Thanks to all parties for the ongoing analysis and
patience in this case.

Gluster appears to be a new technology with excellent potential.

Regards
Zenaan

-- 
Homepage: www.SoulSound.net -- Free Australia: www.UPMART.org
Please respect the confidentiality of this email as sensibly warranted.