>> I can easily reproduce the problem on this cluster. It appears that
>> there is a "primary" replica and a "secondary" replica.
>>
>> If I reboot or kill the glusterfs process there are no problems on the
>> running VM.
>
> Good. That is as expected.

Sorry, I was not clear enough. I meant that if I reboot the "secondary"
replica, there are no problems.

>> If I reboot or "killall -KILL glusterfsd" the primary replica (so I
>> don't let it terminate properly), I can block the VM each time.
>
> Have you followed my blog advice to prevent the vm from remounting the
> image filesystem read-only and waited ping-timeout seconds (42 by
> default)?

I have not followed your advice, but there is a difference: I get I/O
errors *reading* from the disk. Once the problem kicks in, I cannot issue
commands (like ls) because they cannot be read.

There is a problem with that setup: it cannot be implemented on Windows
machines (which are more vulnerable), and it also cannot be implemented
on machines I have no control over (customers').

>> If I "reset" the VM it will not find the boot disk.
>
> Somewhat expected if within the ping-timeout.

The issue persists beyond the ping-timeout. The KVM process needs to be
reinitialized; I guess libgfapi needs to reconnect from scratch.

>> If I power down and power up the VM, then it will boot but will find
>> corruption on disk during the boot that requires fixing.
>
> Expected since the vm doesn't use the image filesystem synchronously.
> You can change that with mount options at the cost of performance.

OK, I understand this point.

> Unless you wait for ping-timeout and then continue writing, the replica
> is actually still in sync. It's only out of sync if you write to one
> replica but not the other.
>
> You can shorten the ping timeout. There is a cost to reconnection if you
> do. Be sure to test a scenario with servers under production loads and
> see what the performance degradation during a reconnect is. Balance your
> needs appropriately.

Could you please elaborate on the cost of reconnection?

I will try to run with a very short ping timeout (2 seconds) and see
whether the problem lies in the ping-timeout or not.
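For reference, this is roughly what I plan to try. A minimal sketch: the
volume name "pool" comes from the info output quoted below, and the
guest-side remount is only my guess at the kind of mount-option change you
mean, not something taken from your blog:

    # shorten the ping timeout on the volume for the test
    gluster volume set pool network.ping-timeout 2

    # confirm the new value under "Options Reconfigured"
    gluster vol info pool

    # inside a test guest, force synchronous writes on the image filesystem
    # (my assumption of the mount-option change; I expect a performance hit)
    mount -o remount,sync /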
Paul

2014-04-06 17:52 GMT+02:00 Paul Penev <ppquant@xxxxxxxxx>:
> Hello,
>
> I'm having an issue with rebooting bricks holding images for live KVM
> machines (using libgfapi).
>
> I have a replicated+distributed setup of 4 bricks (2x2). The cluster
> contains images for a couple of KVM virtual machines.
>
> My problem is that when I reboot a brick containing an image of a VM,
> the VM will start throwing disk errors and eventually die.
>
> The gluster volume is made like this:
>
> # gluster vol info pool
>
> Volume Name: pool
> Type: Distributed-Replicate
> Volume ID: xxxxxxxxxxxxxxxxxxxx
> Status: Started
> Number of Bricks: 2 x 2 = 4
> Transport-type: tcp
> Bricks:
> Brick1: srv10g:/data/gluster/brick
> Brick2: srv11g:/data/gluster/brick
> Brick3: srv12g:/data/gluster/brick
> Brick4: srv13g:/data/gluster/brick
> Options Reconfigured:
> network.ping-timeout: 10
> cluster.server-quorum-type: server
> diagnostics.client-log-level: WARNING
> auth.allow: 192.168.0.*,127.*
> nfs.disable: on
>
> The KVM instances run on the same gluster bricks, with disks mounted
> as: file=gluster://localhost/pool/images/vm-xxx-disk-1.raw,.......,cache=writethrough,aio=native
>
> My self-heal backlog is not always 0. It looks like some writes are
> not going to all bricks at the same time (?).
>
> gluster vol heal pool info
>
> sometimes shows the images needing sync on one brick, the other, or both.
>
> There are no network problems or errors on the wire.
>
> Any ideas what could be causing this?
>
> Thanks.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users