> Although it is clear that the bug itself is a kernel bug, it's also
> clear that glusterfs is triggering that bug. The same system under
> the same load but using nfs instead of gluster does not have this
> problem. This problem also does not happen when copying lots of data
> using scp. Also, I have never seen hangs like this in more than
> 10 years of using unix boxes. But the stranger thing is that this
> is a bug that can make glusterfs totally unusable, and the developers
> do not seem to care even about finding out what exactly is causing
> the problem.

I would like to politely disagree with your final statement. In a previous thread we did promise that we will fix the timeout techniques to take into account the situation where the backend fs is hanging, so that the entire glusterfs volume does not become unusable.

As far as debugging the system hang is concerned, you need to be looking at kernel logs and dmesg output. You really are wasting your time trying to debug a kernel fs hang by looking for logs from a user application. The kernel oops backtrace shows you exactly where the kernel is locking up. Take the backtrace to the kernel developers and they will tell you the next step. It is for this very reason that the kernel supports serial console logging, to extract hints when the system cannot log to files.

It is not that we do not want to help, but there is only so much we can do as a user application. We issue system calls and process the result. The effort needed to programmatically figure out which system call is hanging (with weird and awkwardly implemented ad-hoc timeouts in the code) is large, and the hints you would get from it are worth far less than going directly to the heart of the problem: get the kernel backtrace from a serial console and you will be just one step from your solution.

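To make the "look at the kernel logs" step a little more concrete, here is a minimal sketch that pulls the interesting blocks out of a saved kernel log (for example, dmesg output or a serial console capture written to a file). It is not part of glusterfs; the function name `extract_reports`, the marker patterns, and the `max_lines` cutoff are all assumptions made for illustration, and the exact wording of kernel messages varies between kernel versions.

```python
import re

# Strings that commonly introduce a kernel problem report.
# This list is an assumption; adjust it for the kernel you are running.
MARKERS = re.compile(r"Oops|BUG:|blocked for more than \d+ seconds|Call Trace:")


def extract_reports(log_text, max_lines=40):
    """Return a list of text blocks, each starting at a marker line.

    Each block holds the marker line plus the lines that follow it,
    up to max_lines in total. Scanning resumes after the block, so a
    "Call Trace:" line inside a block does not start a second block.
    """
    lines = log_text.splitlines()
    reports = []
    i = 0
    while i < len(lines):
        if MARKERS.search(lines[i]):
            block = lines[i:i + max_lines]
            reports.append("\n".join(block))
            i += len(block)
        else:
            i += 1
    return reports
```

Feeding it the contents of a saved log file returns the blocks worth pasting into a mail to the kernel developers. It only filters text; it cannot recover anything the kernel did not manage to log, which is why the serial console matters.
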
If you can also post back a link to the thread on the appropriate ML where you post your kernel backtrace, we would be interested in keeping a watch on it, or in providing more (specific) info if those developers find it necessary. The kernel backtrace is almost always sufficient. That is the correct first step for debugging this problem.

Avati