Thanks Linh Vu, so it sounds like I should be prepared to bounce the OSDs
and/or hosts, but I haven't heard anyone say yet that it won't work, so I
guess there's that...

On Tue, Dec 14, 2021 at 7:48 PM Linh Vu <linh.vu@xxxxxxxxxxxxxxxxx> wrote:

> I haven't tested this in Nautilus 14.2.22 (or any Nautilus), but in
> Luminous or older, if you go from a bigger size to a smaller size, there
> was either a bug or a "feature-not-bug" that didn't allow the OSDs to
> automatically purge the redundant PGs with data copies. I did this in a
> size=5 to size=3 situation on a 1000+ OSD cluster, and also just recently
> in a test Luminous cluster (size=3 to size=2). For the purge to actually
> happen, I had to restart every OSD (one at a time for safety, or just run
> ceph-ansible site.yml with the osd handler health check = true).
>
> On Wed, Dec 15, 2021 at 8:58 AM Marco Pizzolo <marcopizzolo@xxxxxxxxx>
> wrote:
>
>> Hi Martin,
>>
>> Agreed on the min_size of 2. I have no intention of worrying about
>> uptime in the event of a host failure. Once size 2 takes effect (and I'm
>> unsure how long that will take), we intend to evacuate all OSDs on one of
>> the 4 hosts in order to migrate that host to the new cluster, where its
>> OSDs will then be added in. Once added and balanced, we will complete the
>> copies (<3 days) and then migrate one more host, allowing us to bring
>> size to 3. Once balanced, we will collapse the last 2 nodes into the new
>> cluster. I am hoping that, inclusive of rebalancing, the whole project
>> will only take 3 weeks, but time will tell.
>>
>> Has anyone asked Ceph to reduce hundreds of millions, if not billions,
>> of files from size 3 to size 2, and if so, were you successful? I know it
>> *should* be able to do this, but sometimes theory and practice don't
>> perfectly overlap.
>>
>> Thanks,
>> Marco
>>
>> On Sat, Dec 11, 2021 at 4:37 AM Martin Verges <martin.verges@xxxxxxxx>
>> wrote:
>>
>> > Hello,
>> >
>> > Avoid size 2 whenever you can. As long as you know that you might lose
>> > data, it can be an acceptable risk while migrating the cluster. We had
>> > that multiple times in the past, and it is a valid use case in our
>> > opinion. However, make sure to monitor the state and recover as fast
>> > as possible. Leave min_size at 2 as well and accept the potential
>> > downtime!
>> >
>> > --
>> > Martin Verges
>> > Managing director
>> >
>> > Mobile: +49 174 9335695 | Chat: https://t.me/MartinVerges
>> >
>> > croit GmbH, Freseniusstr. 31h, 81247 Munich
>> > CEO: Martin Verges - VAT-ID: DE310638492
>> > Com. register: Amtsgericht Munich HRB 231263
>> > Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
>> >
>> >
>> > On Fri, 10 Dec 2021 at 18:05, Marco Pizzolo <marcopizzolo@xxxxxxxxx>
>> > wrote:
>> >
>> >> Hello,
>> >>
>> >> As part of a migration process where we will be swinging Ceph hosts
>> >> from one cluster to another, we need to reduce the size from 3 to 2
>> >> in order to shrink the footprint sufficiently to allow safe removal
>> >> of an OSD/Mon node.
>> >>
>> >> The cluster has about 500M objects as per the dashboard and is about
>> >> 1.5 PB in size, comprised solely of small files served through CephFS
>> >> to Samba.
>> >>
>> >> Has anyone encountered a similar situation? What (if any) problems
>> >> did you face?
>> >>
>> >> Ceph 14.2.22, bare-metal deployment on CentOS.
>> >>
>> >> Thanks in advance.
>> >>
>> >> Marco
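
Putting the advice in this thread together, a rough sketch of the
procedure on a bare-metal Nautilus deployment might look like the
following. The pool name "cephfs_data" and the OSD id are placeholders
(check "ceph osd lspools" for your own pools), min_size stays at 2 per
Martin's suggestion, and the per-OSD restarts are only needed if, as Linh
Vu describes, the cluster does not purge the redundant replicas on its
own:

    # Drop the replication factor on the pool to be shrunk (example pool
    # name; repeat for each pool you intend to reduce).
    ceph osd pool set cephfs_data size 2
    ceph osd pool set cephfs_data min_size 2

    # Watch for the extra replicas being trimmed and space being reclaimed.
    ceph -s
    ceph df

    # If space is not reclaimed once all PGs are active+clean, bounce the
    # OSDs one at a time, confirming each is safe to stop and waiting for
    # health to settle between restarts.
    ceph osd ok-to-stop <id>
    systemctl restart ceph-osd@<id>
    ceph -s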