I'll throw my $.02 in from when I was growing our cluster. My method ended up being to script the LVM creation so the LVM names reflect the OSD/journal serial numbers for easy physical location later, then "ceph-volume prepare" the whole node to get it ready for insertion, followed by "ceph-volume activate".

I typically see more of a performance impact from peering than from rebalancing. If I'm adding a whole node, I make sure the node's weight is set to 0 and slowly walk it up in chunks. If it's anything less than a node, I just let it fly as-is.

My workloads didn't seem to mind the increased latency during a huge rebalance, but another admin hosts some latency-sensitive VMs, and by moving the weight up slowly I could easily wait for things to settle if he saw the numbers get too high. It's a simple knob twist that keeps another admin happy during storage changes, so I do it.

--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfmeec@xxxxxxx

CONFIDENTIALITY NOTE: The information transmitted, including attachments, is intended only for the person(s) or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and destroy any copies of this information.

________________________________________
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Anthony D'Atri <aad@xxxxxxxxxxxxxx>
Sent: Sunday, July 28, 2019 4:09 AM
To: ceph-users
Subject: Re: How to add 100 new OSDs...

Paul Emmerich wrote:

> +1 on adding them all at the same time.
>
> All these methods that gradually increase the weight aren't really
> necessary in newer releases of Ceph.

Because the default backfill/recovery values are lower than they were in, say, Dumpling? Doubling (or more) the size of a cluster in one swoop still means a lot of peering and a lot of recovery I/O; I've seen a cluster's data rate go to or near 0 for a brief but nonzero length of time.

If something goes wrong with the network (cough cough subtle jumbo frame lossage cough), or if one has fat-fingered something along the way, going in increments means that a ^C lets the cluster stabilize before very long. Then you get to troubleshoot with HEALTH_OK instead of HEALTH_WARN or HEALTH_ERR.

Having experienced a cluster be DoS'd for hours when its size was tripled in one go, I'm once bitten, twice shy. Yes, that was Dumpling, but even with SSDs on Jewel and Luminous I've seen significant client performance impact from en-masse topology changes.

— aad

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
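
A minimal sketch of the per-node workflow Paul describes — LVs named after drive serial numbers, prepare the whole node, then activate. This is not his actual script: the device list, serial lookup, and VG/LV naming scheme are illustrative, and it's data-only for brevity (a separate journal/DB device would get the same treatment). Note the full subcommands are "ceph-volume lvm prepare" and "ceph-volume lvm activate":

    #!/bin/bash
    # Create one LV per data disk, named after the drive's serial
    # number, so a failed OSD can be traced back to a physical slot.
    set -euo pipefail

    for dev in /dev/sd{b..m}; do       # hypothetical device list
        serial=$(udevadm info --query=property --name="$dev" \
                 | awk -F= '/^ID_SERIAL_SHORT=/{print $2}')
        vg="ceph-${serial}"
        lv="osd-${serial}"

        pvcreate "$dev"
        vgcreate "$vg" "$dev"
        lvcreate -l 100%FREE -n "$lv" "$vg"

        # Prepare the OSD but don't start it yet.
        ceph-volume lvm prepare --data "${vg}/${lv}"
    done

    # Once the whole node is prepared, bring every OSD up at once:
    ceph-volume lvm activate --all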
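
And a sketch of the weight-0 insertion with a gradual walk-up. The osd_crush_initial_weight option and the reweight-subtree command are real Ceph knobs; the host name, step size, and target weight here are made up:

    # Have new OSDs join the CRUSH map at weight 0 (set this before
    # activating them), e.g. in ceph.conf on the new node:
    #   [osd]
    #   osd_crush_initial_weight = 0

    host=newnode01     # hypothetical new host bucket
    target=1.82        # hypothetical per-OSD target CRUSH weight

    # Walk every OSD under the new host up in 0.2 chunks, waiting
    # for the cluster to settle before each bump.
    for w in $(seq 0.2 0.2 "$target") "$target"; do
        ceph osd crush reweight-subtree "$host" "$w"
        until ceph health | grep -q HEALTH_OK; do
            sleep 60
        done
    done

Stepping the whole host bucket at once keeps the data movement proportional across all the new drives, and the HEALTH_OK gate is exactly the "wait for things to settle" pause Paul mentions.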
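
For the backfill/recovery defaults Anthony refers to, these are the usual throttles. The values shown are the conservative end, not recommendations, and "ceph config set" assumes Mimic or later:

    # Throttle recovery/backfill during a big topology change:
    ceph config set osd osd_max_backfills 1
    ceph config set osd osd_recovery_max_active 1
    ceph config set osd osd_recovery_sleep 0.1

    # On clusters without the config database (pre-Mimic):
    ceph tell 'osd.*' injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'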