Re: Virt-store use case - HA failure issue - suggestions needed

Vince Loschiavo <vloschiavo@xxxxxxxxx> · Thu, 31 Jul 2014 13:37:01 -0700

On Thu, Jul 31, 2014 at 12:02 PM, Humble Devassy Chirammal <humble.devassy@xxxxxxxxx> wrote:

I second Jason: either disable quorum-type=auto, or add one more server to the trusted pool and see whether that resolves it.

--Humble

On Fri, Aug 1, 2014 at 12:22 AM, Jason Brooks <jbrooks@xxxxxxxxxx> wrote:

----- Original Message -----

> From: "Vince Loschiavo" <vloschiavo@xxxxxxxxx>

> To: gluster-users@xxxxxxxxxxx

> Sent: Thursday, July 31, 2014 9:22:16 AM

> Subject: Virt-store use case - HA failure issue - suggestions needed

>

> I'm currently testing Gluster 3.5.1 in a two server QEMU/KVM environment.

> Centos 6.5:

> Two servers (KVM07 & KVM08), two-brick (one brick per server) replicated

> volume

>

> I've tuned the volume per the documentation here:

> http://gluster.org/documentation/use_cases/Virt-store-usecase/

>

> I have the gluster volume fuse mounted on KVM07 and KVM08 and am using it

> to store raw disk images.

>

> KVM is using the fuse mounted volume as a "dir: Filesystem Directory"

> storage pool.

>

> With setting dynamic_ownership = 0 in /etc/libvirt/qemu.conf and chown-ing

> the files to qemu:qemu, live migration works great.

>

> Problem:

> If I need to take down one of these servers for maintenance, I live migrate

> the VMs to the other server.

> service gluster stop

> then kill all the remaining gluster and brick processes.

The guide says that quorum-type=auto sets a rule such that at least half

of the bricks in the replica group should be UP and running. If not,

the replica group becomes read-only. I think the rule is actually 51%,

so bringing down one of the two servers makes your volume read-only.

If you want two servers, you need to unset this rule. Better to add a

third server and a third replica, though.

Regards, Jason
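
To make the rule above concrete, here is a minimal Python sketch of how client quorum is commonly described for cluster.quorum-type=auto: writes are allowed when more than half of the bricks in the replica set are up, and when exactly half are up only if the first brick is among them. That tie-break is my understanding of the documented behavior rather than something stated in this thread, and the function name is made up for illustration.

```python
def client_quorum_met(bricks_up, quorum_type="auto"):
    """Return True if writes would be allowed under this simplified model.

    bricks_up: list of booleans, one per brick of a single replica set,
    in the order the bricks were listed when the volume was created.
    quorum_type: "auto" or "none" (the only values this sketch models).
    """
    total = len(bricks_up)
    if total == 0:
        raise ValueError("a replica set needs at least one brick")
    up = sum(1 for brick in bricks_up if brick)

    if quorum_type == "none":
        # No client quorum enforced: any surviving brick accepts writes.
        return up >= 1
    if quorum_type != "auto":
        raise ValueError("this sketch only models 'auto' and 'none'")

    if 2 * up > total:
        # Strict majority of bricks is reachable.
        return True
    if 2 * up == total:
        # Exactly half: assumed tie-break, the first brick must be up.
        return bool(bricks_up[0])
    return False
```

Under this model a two-brick replica stays writable only while the first brick is up, so one of the two servers can never be taken down without the volume going read-only. With three bricks, any single brick can be down.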

>

> At this point, the VMs die.  The Fuse mount recovers and remains attached

> to the volume via the other server, but the VIRT disk images are not fully

> synced.

>

> This causes the VMs to go into a read-only file system state, then kernel

> panic.  Reboots/restarts of the VMs just cause kernel panics.  This

> effectively brings down the two node cluster.

>

> Bringing the gluster node, bricks, etc. back up prompts a self-heal.  Once

> self-heal is completed, the VMs can boot normally.
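
Since the images are only usable again once self-heal has completed, one cautious habit is to confirm that no heals are pending before taking the other node down. The Python sketch below sums the pending entries reported by `gluster volume heal <VOLNAME> info`. It assumes the output contains one `Number of entries: N` line per brick; the function name and the brick paths in the usage note are hypothetical.

```python
import re

# Matches lines such as "Number of entries: 3" (assumed output format).
_ENTRY_RE = re.compile(r"^Number of entries:\s*(\d+)\s*$", re.MULTILINE)


def pending_heal_entries(heal_info_output):
    """Sum the per-brick pending-heal counts found in the given text.

    heal_info_output: captured text of `gluster volume heal <VOLNAME> info`.
    Raises ValueError if no count lines are found, so that an unexpected
    output format is not mistaken for "nothing to heal".
    """
    counts = [int(n) for n in _ENTRY_RE.findall(heal_info_output)]
    if not counts:
        raise ValueError("no 'Number of entries:' lines found")
    return sum(counts)
```

For example, text reporting 0 entries for `kvm07:/bricks/vmstore` and 3 for `kvm08:/bricks/vmstore` yields 3, which suggests waiting before stopping either node.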

>

> Question: is there a better way to accomplish HA with live/running Virt

> images?  The goal is to be able to bring down any one server in the pair

> and perform maintenance without interrupting the VMs.

>

> I assume my shutdown process is flawed but haven't been able to find a

> better process.

>

> Any suggestions are welcome.

>

>

> --

> -Vince Loschiavo

>

> _______________________________________________

> Gluster-users mailing list

> Gluster-users@xxxxxxxxxxx

> http://supercolony.gluster.org/mailman/listinfo/gluster-users

That was it.  Thank you.  I'm somewhat space constrained in my lab, so I chose to disable quorum and set server-quorum to 50%.  I assume that was redundant, but it works for me.
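
For anyone finding this later: the exact commands are not recorded in this thread, so the following Python sketch is an assumption about what they may have looked like. It takes "disable quorum" to mean setting cluster.quorum-type to none, and "server-quorum to 50%" to mean setting cluster.server-quorum-ratio, which is set cluster-wide via the special volume name `all`. The volume name `vmstore` and the helper names are hypothetical. The sketch only builds the command lines and does not run them.

```python
import shlex


def quorum_commands(volume, client_quorum="none", server_ratio="50%"):
    """Build gluster CLI invocations as argv lists, without running them."""
    return [
        # Client-side (replica) quorum for this one volume.
        ["gluster", "volume", "set", volume,
         "cluster.quorum-type", client_quorum],
        # Server-side quorum ratio applies to the whole trusted pool.
        ["gluster", "volume", "set", "all",
         "cluster.server-quorum-ratio", server_ratio],
    ]


def render(argv):
    """Render one argv list as a shell-safe command line for review."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    for argv in quorum_commands("vmstore"):
        print(render(argv))
```

Printing the commands first allows a review before running them by hand or passing each list to `subprocess.run`.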

-- 
-Vince Loschiavo
