Re: proper way to temporarily remove brick server from replica cluster to avoid kvm guest disruption

An odd number of nodes prevents split-brain situations. For example, imagine that vh1-4 see each other but can't see vh5-8, and the same happens to vh5-8 - they can't see vh1-4.
How should the cluster react? It has to pick one partition that keeps working, and for a partition to win that vote it usually needs 50% + 1 of the nodes.
In your case you don't need a hypervisor to be part of the Gluster TSP, so vh5-8 are not needed.
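
If you want to see which quorum policy is actually in effect on the volume, a quick check looks like this (vol1 from your output; these should be the standard option names on Gluster 9, but double-check on your version):

# server-side quorum: glusterd stops the local bricks when the pool loses quorum
gluster volume get vol1 cluster.server-quorum-type
# client-side quorum: the mount refuses writes when too few bricks are reachable
gluster volume get vol1 cluster.quorum-type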

You need all nodes to be up in order to make changes in the volume (like shrinking/adding bricks, setting options), so it is a best practice to bring vh5 up or remove it forcefully.
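
If vh5 turns out to be unrecoverable, the forced removal is a one-liner (hostname taken from your pool list):

gluster peer detach vh5 force
gluster pool list    # verify that vh5 is gone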

If you have a Red Hat developer account (which is free), you can take a look at https://access.redhat.com/articles/1433123

Also, it's worth mentioning that you are not using the 'virt' group of volume settings, which is optimized for VMs and live migration of VMs between nodes. Take a look at it in /var/lib/glusterd/groups/virt and consider applying it. Some oVirt users report that using libgfapi brings better performance, so consider that too.
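
Applying the group is a single command; here is a sketch using your volume name (review the file first so you know exactly which options it will change):

cat /var/lib/glusterd/groups/virt     # see the options the group will set
gluster volume set vol1 group virt    # apply the whole group in one step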

WARNING: The virt group enables sharding, and once this option is enabled you should never disable it!
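
You can confirm the sharding state before and after applying the group, for example:

gluster volume get vol1 features.shard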

Best Regards,
Strahil Nikolov
On Sun, Mar 6, 2022 at 23:31, Todd Pfaff
<pfaff@xxxxxxxxxxxxxxxxx> wrote:
On Sun, 6 Mar 2022, Strahil Nikolov wrote:

> It seems that only vh1-4 provide bricks, so vh5,6,7,8 can be removed.


Right, that was the point of my question: how to properly shut down any one
of vh1-4 for maintenance without disrupting any VMs that may be running on
any of vh1-8.

When I did a test of taking vh1 offline several days ago, all of the
VMs on vh4 went root-fs-read-only, which surprised me.  I suppose it's
possible that there was something else at play that I haven't realized,
and that taking the vh1 gluster peer offline was not the root cause of the
vh4 VM failure.  I haven't tried another such test yet - I was holding
off until I'd gotten some advice here first.
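
When I do retry the test, I plan to watch the volume from one of the other
nodes while vh1 is down, along the lines of:

gluster volume status vol1       # confirm which bricks and self-heal daemons are still up
gluster volume heal vol1 info    # see what will need healing when vh1 comes back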


> First check why vh5 is offline. Changes to all nodes are propagated and in
> this case vh5 is down and won't receive the peer detach commands.


Ok, interesting, but I have to admit that I don't understand that
requirement.  I knew that vh5 was offline but I didn't know that I'd have
to bring it back online in order to properly shut down one of vh1-4.  Are
you certain about that?  That is, if vh5 stays offline and I take vh4
offline, and then I bring vh5 online, will the quorum of peers not set vh5
straight?


>
> Once you fix vh5, you can safely 'gluster peer detach' any of the nodes that
> is not in the volume.


Ok, I'll try peer detach to take any of vh1-4 offline in a controlled
manner.

I take this to mean that if any one of the vh1-4 replica members were to
go offline in an uncontrolled manner, gluster peers may have a problem
which could lead to the sort of VM behaviour that I experienced.  Frankly
this surprises me - I expected that my setup was more resilient in the face
of losing gluster replica members as long as there was still a quorum of
members operating normally.
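
Perhaps the client-side quorum and timeout settings explain it.  My "Options
Reconfigured" list doesn't mention them, so I assume the defaults apply; I'll
check with something like:

gluster volume get vol1 cluster.quorum-type     # client quorum policy for the replica set
gluster volume get vol1 network.ping-timeout    # how long clients wait on an unresponsive brick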


>
> Keep in mind that it's always best practice to have odd number of nodes in
> the TSP (3,5,7,9,etc).


Do you know why that's the case?  I understand that 3 or more are
recommended (could be 2 plus an arbiter) but why an odd number?  What
benefit does 3 provide that 4 does not?

Thanks,
Todd



>
> Best Regards,
> Strahil Nikolov
>
>      On Sun, Mar 6, 2022 at 4:06, Todd Pfaff
> <pfaff@xxxxxxxxxxxxxxxxx> wrote:
> [root@vh1 ~]# gluster volume info vol1
>
> Volume Name: vol1
> Type: Replicate
> Volume ID: dfd681bb-5b68-4831-9863-e13f9f027620
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: vh1:/pool/gluster/brick1/data
> Brick2: vh2:/pool/gluster/brick1/data
> Brick3: vh3:/pool/gluster/brick1/data
> Brick4: vh4:/pool/gluster/brick1/data
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
>
>
> [root@vh1 ~]# gluster pool list
> UUID                                    Hostname        State
> 75fc4258-fabd-47c9-8198-bbe6e6a906fb    vh2            Connected
> 00697e28-96c0-4534-a314-e878070b653d    vh3            Connected
> 2a9b891b-35d0-496c-bb06-f5dab4feb6bf    vh4            Connected
> 8ba6fb80-3b13-4379-94cf-22662cbb48a2    vh5            Disconnected
> 1298d334-3500-4b40-a8bd-cc781f7349d0    vh6            Connected
> 79a533ac-3d89-44b9-b0ce-823cfec8cf75    vh7            Connected
> 4141cd74-9c13-404c-a02c-f553fa19bc22    vh8            Connected
>
>
> On Sat, 5 Mar 2022, Strahil Nikolov wrote:
>
> > Hey Todd,
> >
> > can you provide 'gluster volume info <VOLUME>' ?
> >
> > Best Regards,
> > Strahil Nikolov
> >
> >      On Sat, Mar 5, 2022 at 18:17, Todd Pfaff
> > <pfaff@xxxxxxxxxxxxxxxxx> wrote:
> > I have a replica volume created as:
> >
> > gluster volume create vol1 replica 4 \
> >   host{1,2,3,4}:/mnt/gluster/brick1/data \
> >   force
> >
> >
> > All hosts host{1,2,3,4} mount this volume as:
> >
> > localhost:/vol1 /mnt/gluster/vol1 glusterfs defaults
> >
> >
> > Some other hosts are trusted peers but do not contribute bricks, and they
> > also mount vol1 in the same way:
> >
> > localhost:/vol1 /mnt/gluster/vol1 glusterfs defaults
> >
> >
> > All hosts run CentOS 7.9, and all are running glusterfs 9.4 or 9.5 from
> > centos-release-gluster9-1.0-1.el7.noarch.
> >
> >
> > All hosts run kvm guests that use qcow2 files for root filesystems that
> > are stored on gluster volume vol1.
> >
> >
> > This is all working well, as long as none of host{1,2,3,4} go offline.
> >
> >
> > I want to take one of host{1,2,3,4} offline temporarily for maintenance.
> > I'll refer to this as hostX.
> >
> > I understand that hostX will need to be healed when it comes back online.
> >
> > I would, of course, migrate guests from hostX to another host, in which
> > case hostX would then only be participating as a gluster replica brick
> > provider and serving gluster client requests.
> >
> > What I've experienced is that if I take one of host{1,2,3,4} offline,
> > this can disrupt some of the VM guests on various other hosts such that
> > their root filesystems go to read-only.
> >
> > What I'm looking for here are suggestions as to how to properly take
> > one of host{1,2,3,4} offline to avoid such disruption or how to tune the
> > libvirt kvm hosts and guests to be sufficiently resilient in the face
> > of taking one gluster replica node offline.
> >
> > Thanks,
> > Todd
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
