On 11/26/12 4:46 AM, ZHANG Cheng wrote:
> Early this morning our 2 bricks replicated cluster had an outage. The
> disk space for one of the brick server (brick02) was used up. When we
> responded to the disk full alert, the issue already lasted for a few
> hours. We reclaimed some disk space, and reboot the brick02 server,
> expecting once it come back it will go self healing.
>
> It did go self healing, but just after couple minutes, access to
> gluster filesystem freeze. Tons of "nfs: server brick not responding,
> still trying" popped up in dmesg. The load average on app server went
> up to 200 something from usual 0.10. We had to shutdown brick02 server
> or stop gluster server process on it, to get the gluster cluster back
> working.

Have you checked the glustershd logs (should be in /var/log/glusterfs)
on the bricks? If there's nothing useful there, a statedump would also
be useful. See the "gluster volume statedump" instructions on your
friendly local admin guide (section 10.4 for GlusterFS 3.3).

Most helpful of all would be a bug report with any of this information
plus a description of your configuration. You can either create a new
one or attach the info to an existing bug if one seems to fit. The
following seems like it might be related, even though it's for virtual
machines.

https://bugzilla.redhat.com/show_bug.cgi?id=881685
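
For a first pass over the glustershd log, a small filter like the one
below might help pull out the warning and error lines before attaching
them to a bug. It is only a sketch: it assumes the usual GlusterFS log
layout, in which a bracketed timestamp is followed by a one-letter
severity (E, W, C and so on), and the function name, the regular
expression and the exact log file name are illustrative assumptions,
not something taken from this thread.

```python
import re

# Assumed line layout (an assumption, not confirmed in this thread):
#   [2012-11-26 04:46:01.123456] E [source.c:123:function] 0-volume: message
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+(?P<rest>.*)$")


def filter_log(lines, levels=("E", "W", "C")):
    """Yield (timestamp, level, rest) for lines whose severity is in levels.

    Lines that do not match the assumed layout are skipped.
    """
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            yield match.group("ts"), match.group("level"), match.group("rest")


if __name__ == "__main__":
    # The directory comes from the reply above; the file name is an assumption.
    with open("/var/log/glusterfs/glustershd.log", errors="replace") as fh:
        for ts, level, rest in filter_log(fh):
            print(ts, level, rest)
```

Running it on each brick and comparing the timestamps against the time
brick02 came back up should narrow down which messages are worth
including in the report.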