On Mon, Feb 12, 2018 at 8:51 AM, Simon Ironside <sironside@xxxxxxxxxxxxx> wrote:
> On 09/02/18 09:05, Janne Johansson wrote:
>>
>> 2018-02-08 23:38 GMT+01:00 Simon Ironside <sironside@xxxxxxxxxxxxx>:
>>
>>     Hi Everyone,
>>     I recently added an OSD to an active+clean Jewel (10.2.3) cluster
>>     and was surprised to see a peak of 23% objects degraded. Surely this
>>     should be at or near zero and the objects should show as misplaced?
>>     I've searched and found Chad William Seys' thread from 2015 but
>>     didn't see any conclusion that explains this:
>>
>>     http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-July/003355.html
>>
>> I agree. I've always viewed it like this: you have three copies of a PG,
>> you add a new OSD, and CRUSH decides one of those copies should now live
>> on the new OSD instead of on one of the three older ones. The cluster
>> simply stops caring about the old copy and creates a new, empty PG on the
>> new OSD. While data syncs towards that new PG it is "behind" in what it
>> contains, but it (and the two remaining copies) are correctly placed for
>> the new CRUSH map. Misplaced would probably be a more natural way of
>> reporting it, at least if the now-abandoned copy were still being updated
>> while the sync runs, but I don't think it is; it gets orphaned rather
>> quickly once the new OSD kicks in.
>>
>> I guess this design choice boils down to "being able to handle someone
>> adding more OSDs to a cluster that is close to getting full", at the
>> expense of "discarding one or more of the old copies and scaring the
>> admin as if there were a huge issue when they're just adding one or more
>> shiny new OSDs".
>
> It certainly does scare me, especially as this particular cluster is
> size=2, min_size=1.
>
> My worry is that I could experience a disk failure while adding a new OSD
> and potentially lose data,

You've already indicated you are willing to accept data loss by
configuring size=2, min_size=1. Search the list archives for
"2x replication: A BIG warning". (See the P.S. below for a rough sketch
of moving to 3x.)

> whereas if the same disk failed while the cluster was active+clean I
> wouldn't. That doesn't seem like a very safe design choice, but perhaps
> the real answer is to use size=3.
>
> Reweighting an active OSD to 0 does the same thing on my cluster: it
> causes the objects to show as degraded when I'd expect them to show as
> misplaced.
>
> Thanks,
> Simon.

--
Cheers,
Brad
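
P.S. If you do decide to move to size=3, the pool settings can be changed
on a live cluster. This is only a rough sketch, not tested against your
setup ("rbd" is just an example pool name and osd.12 an example OSD id,
substitute your own):

    # see what the pool is set to now (list pool names with: ceph osd lspools)
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

    # keep 3 copies and require 2 of them before serving client I/O
    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2

    # watch the extra copies backfill
    ceph -s

Expect a fair amount of backfill traffic while the third copies are
created, so it may be worth doing this outside peak hours.

On the reweighting point, note there are two different knobs: "ceph osd
reweight 12 0" sets a temporary 0.0-1.0 override on osd.12 (much like
marking it out), while "ceph osd crush reweight osd.12 0" changes the
CRUSH weight in the map itself, so the two can move data around in
different ways.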