Are you running out of memory? How much memory are the gluster daemons
using?

On Tue, 2014-05-20 at 11:16 -0700, Doug Schouten wrote:
> Hello,
>
> I have a rather simple Gluster configuration that consists of 85TB
> distributed across six nodes. There is one particular node that seems to
> fail on a roughly weekly basis, and I can't figure out why.
>
> I have attached my Gluster configuration and a recent log file from the
> problematic node. For a user, when the failure occurs, the symptom is
> that any attempt to access the Gluster volume from the problematic node
> fails with a "transport endpoint not connected" error.
>
> Restarting the Gluster daemons and remounting the volume on the failed
> node always fixes the problem. But usually by that point some number of
> jobs in our batch queue have already failed because of this issue, and
> it's becoming a headache.
>
> It could be a fuse issue, since I see many related error messages in the
> Gluster log, but I can't disentangle the various errors. The relevant
> line in my /etc/fstab file is
>
> server:global /global glusterfs defaults,direct-io-mode=disable,log-level=WARNING,log-file=/var/log/gluster.log 0 0
>
> Any ideas on the source of the problem? Could it be a hardware (network)
> glitch? The fact that it only happens on one node that is identically
> configured (with same hardware) as other nodes points to something like
> that.
>
> thanks! Doug
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
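
To get a number for the memory question, something like the following might help. It is only a minimal sketch, assuming a Linux node where /proc is available and that the Gluster processes have names beginning with "gluster" (glusterd, glusterfsd, glusterfs). The function names parse_status and gluster_memory are made up for this illustration and are not part of Gluster.

```python
#!/usr/bin/env python3
"""Report resident memory (VmRSS) of running Gluster processes via /proc."""
import os


def parse_status(text):
    """Return (name, rss_kb) parsed from the contents of /proc/<pid>/status."""
    name, rss_kb = None, 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     12345 kB"
            rss_kb = int(line.split()[1])
    return name, rss_kb


def gluster_memory(proc_root="/proc"):
    """Yield (pid, name, rss_kb) for processes whose name starts with 'gluster'.

    Assumption: matching on the 'gluster' name prefix is enough to find
    the daemons and the fuse client on this node.
    """
    if not os.path.isdir(proc_root):
        return  # not a Linux-style /proc; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                name, rss_kb = parse_status(fh.read())
        except OSError:
            continue  # process exited or is unreadable; skip it
        if name and name.startswith("gluster"):
            yield int(entry), name, rss_kb


if __name__ == "__main__":
    for pid, name, rss_kb in gluster_memory():
        print(f"{pid:>7}  {name:<12} {rss_kb / 1024:8.1f} MiB")
```

Running it from cron every few minutes on the problematic node and on one healthy node, and appending the output to a file, should show whether resident memory grows steadily in the days before a failure.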