single problematic node (brick)

Hello,

I have a fairly simple Gluster configuration: 85TB distributed across six nodes. One particular node seems to fail roughly once a week, and I can't figure out why.

I have attached my Gluster configuration and a recent log file from the problematic node. From a user's perspective, the symptom of the failure is that any attempt to access the Gluster volume from the problematic node fails with a "transport endpoint not connected" error.

Restarting the Gluster daemons and remounting the volume on the failed node always fixes the problem. But by that point some number of jobs in our batch queue have usually already failed because of this issue, and it's becoming a headache.
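To catch the failure before more batch jobs land on the node, a small pre-job check might help. The following is only a sketch, not something I have deployed: it assumes the error surfaces as errno ENOTCONN when calling stat() on the mount point, and the function name `mount_is_healthy` is made up for illustration.

```python
import errno
import os


def mount_is_healthy(path):
    """Return False if stat() on path fails with ENOTCONN, True if it succeeds.

    ENOTCONN is the errno behind the "Transport endpoint is not connected"
    message. Any other error (for example, a missing path) is re-raised so
    that it is not mistaken for this particular failure.
    """
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            return False
        raise
    return True
```

A batch prologue script could call `mount_is_healthy("/global")` and take the node out of the queue when it returns False, instead of letting jobs start and fail.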

It could be a FUSE issue, since I see many FUSE-related error messages in the Gluster log, but I can't disentangle the various errors. The relevant line in my /etc/fstab file is:

server:global /global glusterfs defaults,direct-io-mode=disable,log-level=WARNING,log-file=/var/log/gluster.log 0 0

Any ideas on the source of the problem? Could it be a hardware (network) glitch? The fact that it happens on only one node, which is configured identically to the other nodes (with the same hardware), points to something like that.

thanks! Doug

Attachment: gluster.log.gz
Description: application/gzip

Attachment: gluster.cfg.gz
Description: application/gzip

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
