I'm using gluster for a virt-store with 3x2 distributed/replicated servers for 16 qemu/kvm/libvirt virtual machines using image files stored in gluster and accessed via libgfapi. Eight of these disk images are standalone, while the other eight are qcow2 images which all share a single backing file.

For the most part, this is all working very well. However, one of the gluster servers (azathoth) causes three of the standalone VMs and all 8 of the shared-backing-image VMs to fail if it goes down. Any of the other gluster servers can go down with no problems; only azathoth causes issues.

In addition, the kvm hosts have the gluster volume fuse mounted and one of them (out of five) detects an error on the gluster volume and puts the fuse mount into read-only mode if azathoth goes down. libgfapi connections to the VM images continue to work normally from this host despite this and the other four kvm hosts are unaffected.

It initially seemed relevant that I have the libgfapi URIs specified as gluster://azathoth/..., but I've tried changing them to make the initial connection via other gluster hosts and it had no effect on the problem. Losing azathoth still took them out.

In addition to changing the mount URI, I've also manually run a heal and rebalance on the volume, enabled the bitrot daemons (then turned them back off a week later, since they reported no activity in that time), and copied one of the standalone images to a new file in case it was a problem with the file itself. As far as I can tell, none of these attempts changed anything.

So I'm at a loss. Is this a known type of problem? If so, how do I fix it? If not, what's the next step to troubleshoot it?
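As a point of reference for the 3x2 layout described above, here is a minimal sketch of how replica pairs would be derived from the brick order. It assumes that consecutive bricks in the volume's brick list form one replica set, and that the order shown in the volume status output matches the order the bricks were defined in; both should be confirmed against `gluster volume info`. The helper names (`replica_sets`, `partners`) are hypothetical and not part of gluster.

```python
# Hypothetical helpers, not part of gluster. They only illustrate how an
# ordered brick list is assumed to be grouped into replica sets.

def replica_sets(bricks, replica_count):
    """Group an ordered brick list into consecutive replica sets."""
    if replica_count < 1 or len(bricks) % replica_count != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [
        bricks[i:i + replica_count]
        for i in range(0, len(bricks), replica_count)
    ]


def partners(bricks, replica_count, host):
    """Return the other bricks in the replica set that contains host."""
    for group in replica_sets(bricks, replica_count):
        if host in group:
            return [b for b in group if b != host]
    raise KeyError(host)


# Brick hosts in the order they appear in the volume status output
# (assumed here to be the volume's definition order).
BRICKS = ["saruman", "gandalf", "azathoth",
          "yog-sothoth", "cthulhu", "mordiggian"]

if __name__ == "__main__":
    for group in replica_sets(BRICKS, 2):
        print(" <-> ".join(group))
    print("azathoth's partner:", partners(BRICKS, 2, "azathoth"))
```

If that assumption holds, the brick expected to serve azathoth's files while it is down is the one it is paired with, which narrows down where to look for missing or unhealed copies.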

# gluster --version
glusterfs 3.8.8 built on Jan 11 2017 14:07:11
Repository revision: git://git.gluster.com/glusterfs.git

# gluster volume status
Status of volume: palantir
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick saruman:/var/local/brick0/data        49154     0          Y       10690
Brick gandalf:/var/local/brick0/data        49155     0          Y       18732
Brick azathoth:/var/local/brick0/data       49155     0          Y       9507
Brick yog-sothoth:/var/local/brick0/data    49153     0          Y       39559
Brick cthulhu:/var/local/brick0/data        49152     0          Y       2682
Brick mordiggian:/var/local/brick0/data     49152     0          Y       39479
Self-heal Daemon on localhost               N/A       N/A        Y       9614
Self-heal Daemon on saruman.lub.lu.se       N/A       N/A        Y       15016
Self-heal Daemon on cthulhu.lub.lu.se       N/A       N/A        Y       9756
Self-heal Daemon on gandalf.lub.lu.se       N/A       N/A        Y       5962
Self-heal Daemon on mordiggian.lub.lu.se    N/A       N/A        Y       8295
Self-heal Daemon on yog-sothoth.lub.lu.se   N/A       N/A        Y       7588

Task Status of Volume palantir
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : c38e11fe-fe1b-464d-b9f5-1398441cc229
Status               : completed

--
Dave Sherohman