Re: How to add 100 new OSDs...

As a counterpoint, adding large amounts of new hardware gradually (or, more specifically, in a few steps) has a few benefits IMO.

 

- Being able to pause the operation and confirm the new hardware (and cluster) is operating as expected. You can spot hardware problems while the OSDs are at 10% weight that would be much harder to notice during a full backfill, and that could cause performance issues for the cluster once those OSDs held their full complement of PGs.

 

- Breaking up long backfills. For a full cluster with large OSDs, backfills can take weeks. I find that letting the mon stores compact and getting the cluster back to HEALTH_OK is good for my sanity, and it gives a good stopping point to work on other cluster issues. This obviously depends on the cluster fullness and OSD size.

 

I still aim for the smallest number of steps and the least work, but an initial CRUSH weighting of 10-25% of the final weight is a good sanity check of the new hardware, and gives a good indication of how to approach the rest of the backfill.
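For illustration, a rough sketch of that staged weighting with the standard CLI (the OSD id and the weights below are made-up examples; the final weight normally corresponds to the drive size in TiB):

    # bring a new OSD in at roughly 10% of its final CRUSH weight
    ceph osd crush reweight osd.100 0.9
    # check how the new hardware behaves while it only holds a small share of PGs
    ceph osd df tree
    ceph -s
    # once it looks healthy, step it up to (or towards) its full weight
    ceph osd crush reweight osd.100 9.09569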

 

Cheers,

Tom

 

From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> On Behalf Of Paul Emmerich
Sent: 24 July 2019 20:06
To: Reed Dier <reed.dier@xxxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] How to add 100 new OSDs...

 

+1 on adding them all at the same time.

 

All these methods that gradually increase the weight aren't really necessary in newer releases of Ceph.

 

Paul

 

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

 

 

On Wed, Jul 24, 2019 at 8:59 PM Reed Dier <reed.dier@xxxxxxxxxxx> wrote:

Just chiming in to say that this too has been my preferred method for adding [large numbers of] OSDs.

 

Set the norebalance nobackfill flags.

Create all the OSDs, and verify everything looks good.

Make sure osd_max_backfills and osd_recovery_max_active are set as expected.

Make sure everything has peered.

Unset flags and let it run.

 

One CRUSH map change, one data movement.
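A minimal CLI sketch of those steps, assuming a release with the central config database (Mimic or later); the backfill/recovery values are examples, not recommendations:

    # stop data movement while the new OSDs are created and peer
    ceph osd set norebalance
    ceph osd set nobackfill

    # ... create the new OSDs with your usual tooling (ceph-volume etc.) ...

    # confirm the backfill/recovery throttles are what you expect
    ceph config set osd osd_max_backfills 1
    ceph config set osd osd_recovery_max_active 1

    # verify everything has peered, then let it run
    ceph pg stat
    ceph osd unset nobackfill
    ceph osd unset norebalance

(On older releases the same throttles can be pushed out with 'ceph tell osd.* injectargs' instead of 'ceph config set'.)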

 

Reed




That works, but with newer releases I've been doing this:

- Make sure cluster is HEALTH_OK
- Set the 'norebalance' flag (and usually nobackfill)
- Add all the OSDs
- Wait for the PGs to peer. I usually wait a few minutes
- Remove the norebalance and nobackfill flag
- Wait for HEALTH_OK
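A quick sketch of how the peering and HEALTH_OK checks in those steps can be done from the CLI (nothing here is specific to any particular cluster):

    # before removing the flags: no PGs should be stuck peering or activating
    ceph pg stat
    ceph -s

    # after removing the flags: follow backfill until HEALTH_OK comes back
    watch -n 10 ceph -s
    ceph health detail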

Wido


 


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
