>> > I doubt that this can be a real solution. My guess is that glusterfsd runs
>> > into some race condition where it locks itself up completely.
>> > It is not funny to debug something the like on a production setup. Best would
>> > be to have debugging output sent from the servers' glusterfsd directly to a
>> > client to save the logs. I would not count on syslog in this case, if it
>> > survives one could use a serial console for syslog output though.

I'm going to iterate through this yet again at the risk of frustrating you.
glusterfsd (on the server side) is yet another process running only system
calls. If glusterfsd has a race condition and locks itself up, then it locks
_only its own process_ up. What you are having is a frozen system. There is
no way glusterfsd can lock up your system through just VFS system calls, even
if it wanted to, intentionally. It is a pure user space process and has no
power to lock up the system. The worst glusterfsd can do to your system is
deadlock its own process resulting in a glusterfs fuse mountpoint hang, or
segfault and result in a core dump.

Please consult system/kernel programmers you trust. Or ask on the
kernel-devel mailing list. The system freeze you are facing is not something
which can be caused by _any_ user space application. The correlation you see
that the freeze happens only when glusterfsd is running does NOT make
glusterfsd _responsible_ for it. I'm not sure if you understand how user
processes and kernels work and interact with each other.

Think of this almost-perfect analogy. If you have an ftp daemon on a system
and your system ends up freezing in the way you describe, you blame the
kernel, not the ftp daemon. glusterfsd is no different from an ftp daemon in
terms of how potentially disastrous it can be. glusterfs has other bugs, we
admit it, but what you are describing here is really a problem in the kernel.

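To make the distinction concrete, here is a minimal sketch. It is not glusterfsd code; the names DEADLOCK_SNIPPET and run_demo are made up for illustration. It starts a child process that deadlocks itself by taking a non-reentrant lock twice, and shows that the parent (and the rest of the system) keeps running and that the kernel can still kill the stuck process.

```python
import subprocess
import sys

# Hypothetical stand-in for a buggy daemon: acquire a non-reentrant lock twice.
# The second acquire() blocks forever, so the child process hangs.
DEADLOCK_SNIPPET = "import threading; l = threading.Lock(); l.acquire(); l.acquire()"


def run_demo(timeout=1.0):
    """Start a self-deadlocking child, wait briefly, then kill it.

    Returns (hung, returncode): whether the child was still stuck after
    `timeout` seconds, and its exit status after being killed.
    """
    child = subprocess.Popen([sys.executable, "-c", DEADLOCK_SNIPPET])
    try:
        child.wait(timeout=timeout)
        hung = False
    except subprocess.TimeoutExpired:
        hung = True  # the child is stuck, but this process is unaffected
    child.kill()  # the kernel can still terminate the deadlocked process
    returncode = child.wait()
    return hung, returncode


if __name__ == "__main__":
    hung, returncode = run_demo()
    print("child hung:", hung, "| exit status after kill:", returncode)
```

The hang here is confined to the child: the parent still gets scheduled, notices the timeout, and cleans up. A frozen machine, where nothing responds at all, is a different class of failure.
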
I say this confidently because glusterfsd CANNOT freeze a system, even if it
wanted to, intentionally. It is a user-space process. If glusterfs has bugs,
then it segfaults, or the process hangs. That is fundamentally very different
from a system lock up.

As far as your problem is concerned, we can point you to the right place if
you can report with kernel/dmesg logs. Please understand that even if we
wanted to somehow solve your server lock-up problem by that hypothetical fix
in glusterfs, it is just not possible, even theoretically. The fix you need
is not in glusterfs. It is not a userspace application you fix for system
lock ups.

> The system acts as pure server for both glusterfs and nfs. It has no fuse nor
> nfs client mount points.

However, if you are facing hangs on the glusterfs fuse mountpoint, then it is
very likely that it is a glusterfs bug. We are very much interested to hear
about those issues.

Avati