Re: RAM recommendation with large OSDs?

Thanks for the reply, Anthony.

Those are all considerations I am very much aware of. I'm very curious about this, though:

> mon_osd_down_out_subtree_limit.  There are cases where it doesn’t kick in and a whole node will attempt to rebalance

In what cases is the limit ignored? Do these exceptions also apply to mon_osd_min_in_ratio? Is this in the docs somewhere?
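
For anyone following along, this is roughly how I've been checking the effective values on our end (standard ceph CLI; the defaults noted in the comments are what I recall from recent releases, so treat them as approximate):

    ceph config get mon mon_osd_down_out_subtree_limit   # default is "rack", if I recall correctly
    ceph config get mon mon_osd_min_in_ratio             # default is 0.75 on recent releases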


-----Original Message-----
From: Anthony D'Atri <aad@xxxxxxxxxxxxxx> 
Sent: Wednesday, October 02, 2019 7:46 PM
To: Darrell Enns <darrelle@xxxxxxxxxxxx>
Cc: Paul Emmerich <paul.emmerich@xxxxxxxx>; ceph-users@xxxxxxx
Subject: Re:  Re: RAM recommendation with large OSDs?

This is in part a question of *how many* of those dense OSD nodes you have.  If you have a hundred of them, then most likely they’re spread across a decent number of racks and the loss of one or two is a tolerable *fraction* of the whole cluster.

If you have a cluster of just, say, 3-4 of these dense nodes, component failure, network glitches, and even maintenance become problematic.

You can *mostly* forestall whole-node rebalancing by careful alignment of fault domains with the value of mon_osd_down_out_subtree_limit.  There are cases where it doesn’t kick in and a whole node will attempt to rebalance, which — assuming the CRUSH rules and topology are fault-tolerant — may cause surviving OSDs to reach full or backfillfull states, potentially resulting in an outage.
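To sketch what I mean by careful alignment (standard ceph CLI; whether "host" is the right value depends entirely on your CRUSH topology and failure domains):

    # If host is your failure domain, telling the mons not to automatically
    # mark out anything at host scope or larger stops a whole-node rebalance
    # from starting on its own:
    ceph config set mon mon_osd_down_out_subtree_limit host

    # Worth checking your headroom at the same time; these are the thresholds
    # the surviving OSDs would run into during a large backfill:
    ceph osd dump | grep -E 'full_ratio|backfillfull_ratio|nearfull_ratio'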

If the limit does kick in, you’ll have reduced or no redundancy until you either bring the host/OSDs back up, or manually cause the recovery to proceed.
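By "manually cause the recovery to proceed" I mean something along these lines; the OSD IDs below are placeholders for whatever lives on the dead host:

    # Once you've decided the host isn't coming back soon, marking its OSDs
    # out by hand lets recovery/backfill start onto the surviving nodes:
    ceph osd out 12 13 14 15

    ceph osd tree    # confirm which OSDs are down and now marked out
    ceph -s          # watch recovery/backfill progress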

As was already mentioned as well, having a small number of fault domains also limits the EC strategies you can safely use.
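
To make that concrete: with a per-host failure domain and only four hosts, you're limited to profiles where k+m <= 4, so something like 2+2 is about the best you can do. A rough sketch (the profile and pool names and the PG count are just examples):

    ceph osd erasure-code-profile set ec-2-2 k=2 m=2 crush-failure-domain=host
    ceph osd pool create archive 128 128 erasure ec-2-2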

> Thanks Paul. I was speaking more about total OSDs and RAM, rather than a single node. However, I am considering building a cluster with a large OSD/node count. This would be for archival use, with reduced performance and availability requirements. What issues would you anticipate with a large OSD/node count? Is the concern just the large rebalance if a node fails and takes out a large portion of the OSDs at once?

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



