Re: Experience reducing size 3 to 2 on production cluster?

I haven't tested this on Nautilus 14.2.22 (or any Nautilus release), but
on Luminous and older, when you went from a larger size to a smaller one,
there was either a bug or a "feature-not-bug" that kept the OSDs from
automatically purging the now-redundant PG copies. I did this for a size=5
to size=3 change on a 1000+ OSD cluster, and also just recently on a test
Luminous cluster (size=3 to size=2). For the purge to actually happen, I
had to restart every OSD (one at a time for safety, or just run
ceph-ansible site.yml with the OSD handler health check set to true).
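
For reference, the rough shape of what I did was something like the
following (untested sketch; "cephfs_data" is just an example pool name,
and the restart has to be run on the host that owns each OSD):

    # Drop the replica count on the pool (example pool name).
    ceph osd pool set cephfs_data size 2

    # Restart each OSD one at a time, waiting for HEALTH_OK in between.
    for id in $(ceph osd ls); do
        ceph osd ok-to-stop "$id" || break      # sanity check before stopping
        systemctl restart "ceph-osd@$id"        # run on the OSD's own host
        while ! ceph health | grep -q HEALTH_OK; do sleep 30; done
    done

You can watch the space actually being released with "ceph osd df" as the
restarts work their way through the cluster.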

On Wed, Dec 15, 2021 at 8:58 AM Marco Pizzolo <marcopizzolo@xxxxxxxxx>
wrote:

> Hi Martin,
>
> Agreed on the min_size of 2.  I have no intention of worrying about uptime
> in the event of a host failure.  Once size 2 takes effect (and I'm unsure
> how long that will take), our intention is to evacuate all OSDs in one of
> the 4 hosts in order to migrate that host to the new cluster, where its
> OSDs will then be added in.  Once added and balanced, we will complete the
> copies (<3 days) and then migrate one more host, allowing us to bring size
> up to 3.  Once balanced, we will collapse the last 2 nodes into the new
> cluster.  I am hoping that, inclusive of rebalancing, the whole project
> will only take 3 weeks, but time will tell.
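>
> For the host evacuation step, I am picturing something along these lines
> (untested sketch; the OSD ids 10-13 are hypothetical placeholders for the
> OSDs on the host being drained):
>
>     # Mark the host's OSDs out so their PGs get rebuilt elsewhere.
>     for id in 10 11 12 13; do ceph osd out "$id"; done
>
>     # Watch recovery/backfill until the cluster is clean again.
>     ceph -s
>
>     # Once clean, remove the drained OSDs from the cluster entirely.
>     for id in 10 11 12 13; do ceph osd purge "$id" --yes-i-really-mean-it; done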
>
> Has anyone asked Ceph to reduce hundreds of millions, if not billions, of
> files from size 3 to size 2, and if so, were you successful?  I know it
> *should* be able to do this, but sometimes theory and practice don't
> perfectly overlap.
>
> Thanks,
> Marco
>
> On Sat, Dec 11, 2021 at 4:37 AM Martin Verges <martin.verges@xxxxxxxx>
> wrote:
>
> > Hello,
> >
> > avoid size 2 whenever you can. As long as you know that you might lose
> > data, it can be an acceptable risk while migrating the cluster. We have
> > done that multiple times in the past, and in our opinion it is a valid
> > use case. However, make sure to monitor the state and recover as fast
> > as possible. Leave min_size at 2 as well and accept the potential
> > downtime!
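> >
> > Concretely, that is just something like the following (example pool
> > name; untested here):
> >
> >     ceph osd pool set cephfs_data min_size 2
> >
> > and then keep an eye on "ceph health detail" or "ceph -w" the whole
> > time the pool runs at size 2, since any single OSD failure will block
> > I/O on the affected PGs until they recover.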
> >
> > --
> > Martin Verges
> > Managing director
> >
> > Mobile: +49 174 9335695  | Chat: https://t.me/MartinVerges
> >
> > croit GmbH, Freseniusstr. 31h, 81247 Munich
> > CEO: Martin Verges - VAT-ID: DE310638492
> > Com. register: Amtsgericht Munich HRB 231263
> > Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> >
> >
> > On Fri, 10 Dec 2021 at 18:05, Marco Pizzolo <marcopizzolo@xxxxxxxxx>
> > wrote:
> >
> >> Hello,
> >>
> >> As part of a migration process in which we will be swinging Ceph hosts
> >> from one cluster to another, we need to reduce the size from 3 to 2 in
> >> order to shrink the footprint sufficiently to allow safe removal of an
> >> OSD/Mon node.
> >>
> >> The cluster has about 500M objects per the dashboard and is about 1.5PB
> >> in size, consisting solely of small files served through CephFS to
> >> Samba.
> >>
> >> Has anyone encountered a similar situation?  What (if any) problems did
> >> you face?
> >>
> >> Ceph 14.2.22, bare-metal deployment on CentOS.
> >>
> >> Thanks in advance.
> >>
> >> Marco
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


