Hi Sage,

That answered my uncertainty precisely, thanks!

Regarding docs: this summary would fit in here:
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#adding-osds
That is, logically: ceph.com -> Ceph Storage Cluster -> Operations ->
Adding/Removing OSDs; it could be added as a section under "Adding OSDs",
called something like "Adding multiple OSDs at once".

Ugis

2018-01-08 16:28 GMT+02:00 Sage Weil <sage@xxxxxxxxxxxx>:
> Hi Ugis,
>
> On Mon, 8 Jan 2018, Ugis wrote:
>> Hi,
>>
>> Suggestion first: the ceph.com site could have some best-practice rules
>> for adding new OSDs. Googling this topic reveals that people have
>> questions like:
>> - may I add several OSDs at once?
>
> Yes.
>
>> - may I completely change the CRUSH map online so that PGs get
>>   completely relocated?
>
> Yes.
>
>> - which config parameters help to reduce backfill load?
>
> osd_max_backfills (default: 1) controls how many backfill (or recovery)
> operations an OSD will work on concurrently. (It is actually 2x this
> value, since we track recoveries for which the OSD is primary separately
> from those for which it is a replica participant; this is to avoid
> deadlock in our relatively simplistic approach to reservation.)
>
> Suggestions for where this type of summary info would fit into the docs
> structure would be helpful!
>
>> Until then, I still have this theoretical question about the CRUSH
>> algorithm. We have a Ceph cluster with 5 OSD hosts, and the CRUSH rule
>> orders Ceph to put one replica copy per host.
>>
>> If we add 2 OSDs simultaneously in different hosts, how does CRUSH
>> guarantee that some existing PG that should now be located on those 2
>> new OSDs does not become unavailable? It should be something to do with
>> epochs, I suppose?
>>
>> I have found a thread mentioning that people have tested completely
>> remapping PGs:
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-July/019577.html
>> Still, it is not clear what the theoretical constraints on adding
>> batches of OSDs are (apart from backfill load). For example, if a PG
>> gets relocated several times in a row (when OSDs are added without
>> waiting for degradation to resolve), how long can that chain of
>> previously allocated PGs become?
>
> The key thing to keep in mind here is that CRUSH only tells us where
> things "should" be as of a given point in time. RADOS is responsible for
> keeping track of where things are and have been recently, and for making
> a safe migration to the desired location. Generally speaking, the amount
> of history it will remember is unbounded--you could feed the cluster a
> million CRUSH map changes faster than it can move data and it won't stop
> you. In theory, the amount of state that has to be tracked is bounded by
> the size of the cluster... in the truly degenerate case it will think
> that every PG existed at some point on every other OSD. In practice (as
> of Luminous) the amount of state needed is very small thanks to the
> recent PastIntervals work (see this blog post for some more background
> if you're interested:
> http://ceph.com/community/new-luminous-pg-overdose-protection/)
>
> Hope that helps!
> sage
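
To make the osd_max_backfills point above concrete, here is a rough sketch
(only an illustration, not an official Ceph tool) of keeping backfill load
low while a batch of new OSDs fills up, then waiting for the PGs to settle.
It assumes a Luminous-or-later cluster with an admin `ceph` CLI on the PATH;
the helper function names are made up for this example.

    #!/usr/bin/env python3
    """Illustrative sketch only: throttle backfill while several OSDs are
    added at once, then wait for PGs to settle. Assumes an admin `ceph`
    CLI on the PATH; helper names are invented for this example."""

    import json
    import subprocess
    import time


    def ceph(*args):
        # Run a ceph CLI command and return its stdout as text.
        return subprocess.run(
            ["ceph", *args], check=True, capture_output=True, text=True
        ).stdout


    def set_max_backfills(n):
        # osd_max_backfills (default 1) caps concurrent backfills per OSD;
        # injectargs changes it at runtime on all OSDs without a restart.
        ceph("tell", "osd.*", "injectargs", f"--osd-max-backfills={n}")


    def wait_until_clean(poll_seconds=30):
        # Poll `ceph status` until no PG reports a backfill/recovery state.
        while True:
            status = json.loads(ceph("status", "--format=json"))
            states = {s["state_name"]
                      for s in status["pgmap"].get("pgs_by_state", [])}
            busy = sorted(s for s in states
                          if "backfill" in s or "recover" in s)
            if not busy:
                return
            print("still migrating:", ", ".join(busy))
            time.sleep(poll_seconds)


    if __name__ == "__main__":
        set_max_backfills(1)  # keep backfill concurrency at the low default
        # ... create and start the new OSDs here (ceph-volume etc.) ...
        wait_until_clean()

The same runtime change can be made by hand with
`ceph tell osd.* injectargs '--osd-max-backfills 1'`; the sketch just wraps
that and polls `ceph status` until no PG reports a backfill or recovery
state.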