> On 23 March 2016 at 23:18, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
>
> On Wed, 23 Mar 2016, wido@xxxxxxxx wrote:
> > Hi,
> >
> > This week I got a call to recover a Ceph cluster where somebody ran
> > 'ceph osd rm X' for OSDs which were still holding PGs.
> >
> > He removed multiple OSDs and together they were all the replicas for a
> > certain PG.
> >
> > This raised the question: Should we refuse a rm for an OSD which is still
> > up or acting for a PG?
> >
> > If not, what would the use-case be for removing an OSD from the OSDMap
> > when it is still up or acting?
> >
> > I would say that recovery/backfill has to be finished before we allow an
> > OSD to be removed.
>
> This seems reasonable, as long as there is a --yes-i-really-mean-it flag
> to force it.
>
> There are several options, though:
>
> 1- OSD must not be up. Probably doesn't protect from much.
> 2- OSD must not be in the up set for any OSD. This will prevent you from
>    removing just one replica of a PG.

I think you mean: "OSD must not be in the up set for any PG"?

> 3- OSD must not be the only up OSD (or, must not bring up set to <
>    min_size).
>
> None of these really tells you which OSDs the PG is stored on, though.

Sure, but it protects you from doing stupid things, like what happened in
this case.

I had to use ceph-objectstore-tool to recover the PG and inject it again.

> The mon doesn't actually know that--only the primary does. Either we can
> try to cram that info into pg_stat_t, or we can accept that we can't make
> a precise condition and instead just settle on something simple. Like 1 &
> 2?

Doesn't the mon know? The PG map contains almost all the information. An
iteration through the PG map where you check if the OSD is in the "up" set
for any of the PGs would be enough, right?

In this case the replication level was set to 2 where the user thought it
was set to 3. So he thought it was safe to remove 2 machines at once and
remove the OSDs as well.

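To make the up-set check concrete, here is a minimal sketch in Python. It is not Ceph's monitor code: `pgs_blocking_removal`, `pg_up_sets` and the sample PG ids are hypothetical names for illustration only, and it assumes the up set of every PG is available as a plain mapping. As noted above, the up set does not say where the data is actually stored, so this guards against mistakes rather than giving a precise condition.

```python
from typing import Dict, List


def pgs_blocking_removal(osd_id: int,
                         pg_up_sets: Dict[str, List[int]],
                         min_size: int = 0) -> List[str]:
    """Return the PG ids that would block removing osd_id.

    min_size == 0: option 2, any PG with osd_id in its up set blocks.
    min_size > 0:  option 3, a PG blocks only if its up set would drop
                   below min_size once osd_id is gone.
    All names here are hypothetical; this is an illustration, not Ceph code.
    """
    blocking = []
    for pgid, up in pg_up_sets.items():
        if osd_id not in up:
            continue  # this PG does not depend on the OSD at all
        remaining = len(up) - 1
        if min_size == 0 or remaining < min_size:
            blocking.append(pgid)
    return sorted(blocking)


# Made-up up sets for a pool with two replicas per PG.
pg_up_sets = {"1.0": [3, 7], "1.1": [3, 5], "1.2": [5, 9]}

print(pgs_blocking_removal(3, pg_up_sets))              # ['1.0', '1.1']
print(pgs_blocking_removal(9, pg_up_sets, min_size=2))  # ['1.2']
```

With an empty result the rm would go through; otherwise it would be refused unless the --yes-i-really-mean-it flag is given.
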
Again, user error, but let's protect users from destroying their data as
much as possible.

Wido

>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html