On 09/10/2009 09:38 AM, David Saez Padros wrote:
>
>> In particular, if you read about the intent of FUSE - the technology
>> being used to create a file system, I think you will find that what
>> Anand is saying is the *exact* purpose for this project.
>
> the lockups are on server side not in client side and fuse is
> not used on the server side

I think there is Stephan's problem and your problem, and I'm losing track of which one is being discussed. Sorry. :-)

Server side, pure user space, with hardware locking up, or the kernel not being able to use a hardware resource - is a kernel problem. Yes, user space can trigger it - for example, by opening so many sockets and other such kernel resources as to fill low memory - but as we found out recently, this is where the kernel is supposed to come in and kick the user program out with the out of memory killer, or not grant the resources in the first place.

As it is - do we have evidence that GlusterFS is using up a large number of file descriptors, sockets, processes, virtual memory, or other kernel resources? It seems to me that the failure in the case with the logs was the kernel finding the CPU not waking up for a long period of time?

I'm not saying ignore GlusterFS in your evaluation - but I am saying that if you truly want a resolution, you really should consider trying the Linux developers and seeing what they think. If they say this is a GlusterFS specific problem, I'm sure Anand and gluster.com would take a very serious second look at it.

Until then - they gave it a shot, and don't have the ability to diagnose your problem or fix your problem. You could say they are incompetent and uncaring about their users - but a more accurate statement would probably be that this is entirely out of their domain, and they are unable to help you, and their professional recommendation and mine is to contact Red Hat if you have a subscription, or if you do not, try the Linux developers.
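On the question of evidence raised above: one way to gather it would be to sample the per-process counters that Linux exposes under /proc for the server process. What follows is only a minimal sketch in Python, assuming a Linux /proc filesystem. The helper names (parse_proc_status, kib, snapshot) are made up for illustration and are not part of GlusterFS or any other tool mentioned in this thread.

```python
import os


def parse_proc_status(text):
    """Parse the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def kib(fields, key):
    """Return a 'NNN kB' field as an integer, or None if the field is absent."""
    value = fields.get(key)
    if value is None:
        return None
    return int(value.split()[0])


def snapshot(pid):
    """Collect a few resource counters for one process (Linux only).

    Reading /proc/<pid>/fd for a process owned by another user
    normally requires root.
    """
    with open(f"/proc/{pid}/status") as fh:
        fields = parse_proc_status(fh.read())
    return {
        "name": fields.get("Name"),
        "open_fds": len(os.listdir(f"/proc/{pid}/fd")),
        "threads": int(fields.get("Threads", "0")),
        "vm_size_kib": kib(fields, "VmSize"),
        "vm_rss_kib": kib(fields, "VmRSS"),
    }
```

Calling snapshot() on the server's pid at intervals and comparing the results would show whether descriptors, threads, or virtual memory grow without bound before a lockup. If they stay flat, that points away from user-space resource exhaustion as the trigger.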
I have no doubt at all that user space programs can hurt the kernel - but in every situation I can think of, the problem is really a *kernel* problem. The user space is just discovering the problem - which is unfortunate - but honestly, shit happens.

We recently dealt with load builds failing due to the out of memory issue I reference above, as the 32-bit Linux kernel doesn't work very well with 32 Gbytes of RAM. Another problem we dealt with was Subversion mod_dav_fs quickly consuming all virtual memory in the machine, eventually leading to machine failure.

For the Subversion issue - mod_dav_fs or something it uses should not be continually consuming more memory - so they have a bug - but the kernel *also* has a bug, because it should not allow httpd to bring the machine to a halt due to exhausted virtual memory. In the Subversion case, it's low on our priority list to solve, since we can work around it by having Apache recycle the process space more frequently and avoid the symptoms - but we should be taking this to both the Subversion developers at Collab.net *and* the Linux kernel developers. (I know what the Linux kernel developers will say though - the 32-bit kernel was not designed for 32 Gbytes of RAM, so upgrade to a 64-bit kernel - but we have a RHEL subscription, so perhaps we could take it that route...)

Cheers,
mark

-- 
Mark Mielke <mark at mielke.cc>