Re: How to add 100 new OSDs...

As a counterpoint, adding large amounts of new hardware gradually (or, more specifically, in a few steps) has a few benefits IMO.

 

- Being able to pause the operation and confirm the new hardware (and cluster) is operating as expected. You can spot hardware problems while the OSDs are at 10% weight that would be much harder to notice during a full backfill, and that could cause performance issues for the cluster once those OSDs held their full complement of PGs.

 

- Breaking up long backfills. For a full cluster with large OSDs, backfills can take weeks. I find that letting the mon stores compact and getting the cluster back to HEALTH_OK is good for my sanity, and it gives a good stopping point to work on other cluster issues. This obviously depends on the cluster fullness and OSD size.

 

I still aim for the smallest number of steps and the least work, but an initial CRUSH weighting of 10-25% of the final weight is a good sanity check of the new hardware, and gives a good indication of how to approach the rest of the backfill.
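For illustration, a rough sketch of that staged weighting with the standard CLI (the OSD id and the weights below are made-up examples; the final weight normally corresponds to the drive size in TiB):

    # bring a new OSD in at roughly 10% of its final CRUSH weight
    ceph osd crush reweight osd.100 0.9
    # check how the new hardware behaves while it only holds a small share of PGs
    ceph osd df tree
    ceph -s
    # once it looks healthy, step it up to (or towards) its full weight
    ceph osd crush reweight osd.100 9.09569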

 

Cheers,

Tom

 

From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> On Behalf Of Paul Emmerich
Sent: 24 July 2019 20:06
To: Reed Dier <reed.dier@xxxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] How to add 100 new OSDs...

 

+1 on adding them all at the same time.

 

All these methods that gradually increase the weight aren't really necessary in newer releases of Ceph.

 

Paul

 

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

 

 

On Wed, Jul 24, 2019 at 8:59 PM Reed Dier <reed.dier@xxxxxxxxxxx> wrote:

Just chiming in to say that this too has been my preferred method for adding [large numbers of] OSDs.

 

Set the norebalance nobackfill flags.

Create all the OSDs, and verify everything looks good.

Make sure osd_max_backfills and osd_recovery_max_active are set as expected.

Make sure everything has peered.

Unset flags and let it run.

 

One CRUSH map change, one data movement.
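A minimal CLI sketch of those steps, assuming a release with the central config database (Mimic or later); the backfill/recovery values are examples, not recommendations:

    # stop data movement while the new OSDs are created and peer
    ceph osd set norebalance
    ceph osd set nobackfill

    # ... create the new OSDs with your usual tooling (ceph-volume etc.) ...

    # confirm the backfill/recovery throttles are what you expect
    ceph config set osd osd_max_backfills 1
    ceph config set osd osd_recovery_max_active 1

    # verify everything has peered, then let it run
    ceph pg stat
    ceph osd unset nobackfill
    ceph osd unset norebalance

(On older releases the same throttles can be pushed out with 'ceph tell osd.* injectargs' instead of 'ceph config set'.)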

 

Reed




That works, but with newer releases I've been doing this:

- Make sure cluster is HEALTH_OK
- Set the 'norebalance' flag (and usually nobackfill)
- Add all the OSDs
- Wait for the PGs to peer. I usually wait a few minutes
- Remove the norebalance and nobackfill flag
- Wait for HEALTH_OK
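A quick sketch of how the peering and HEALTH_OK checks in those steps can be done from the CLI (nothing here is specific to any particular cluster):

    # before removing the flags: no PGs should be stuck peering or activating
    ceph pg stat
    ceph -s

    # after removing the flags: follow backfill until HEALTH_OK comes back
    watch -n 10 ceph -s
    ceph health detail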

Wido


 


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
