Re: Refuse OSD removal if still up or acting for PG

On Wed, 23 Mar 2016, wido@xxxxxxxx wrote:
> Hi,
> 
> This week I got a call to recover a Ceph cluster where somebody ran 
> 'ceph osd rm X' for OSDs which were still holding PGs.
> 
> He removed multiple OSDs and together they were all the replicas for a 
> certain PG.
> 
> This raised the question: Should we refuse an rm for an OSD which is 
> still up or acting for a PG?
> 
> If not, what would the use-case be for removing an OSD from the OSDMap 
> when it is still up or acting?
> 
> I would say that recovery/backfill has to be finished before we allow an 
> OSD to be removed.

This seems reasonable, as long as there is a --yes-i-really-mean-it flag 
to force it.

There are several options, though:

1- OSD must not be up.  Probably doesn't protect from much.
2- OSD must not be in the up set for any PG.  This will prevent you from 
removing just one replica of a PG.
3- OSD must not be the only up OSD (or, must not bring up set to < 
min_size).

None of these really tells you which OSDs the PG is stored on, though.  
The mon doesn't actually know that--only the primary does.  Either we can 
try to cram that info into pg_stat_t, or we can accept that we can't make 
a precise condition and instead just settle on something simple.  Like 1 & 
2?
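
To make the options concrete, here is a rough sketch in Python of what
each check would look at.  It is not mon code: the function names, the
pg_up mapping (PG id -> up set) and the single min_size value are all
made up for illustration (in practice min_size is a per-pool setting),
and like the options above it only sees up sets, not where the data
actually lives.

```python
from typing import Dict, List, Set

# Hypothetical inputs (not real Ceph structures):
#   up_osds: ids of OSDs currently marked up
#   pg_up:   PG id -> list of OSD ids in that PG's up set


def pgs_with_osd_in_up_set(osd: int, pg_up: Dict[str, List[int]]) -> List[str]:
    """Option 2: PGs whose up set still contains this OSD."""
    return sorted(pg for pg, up in pg_up.items() if osd in up)


def pgs_dropping_below_min_size(
    osd: int, pg_up: Dict[str, List[int]], min_size: int
) -> List[str]:
    """Option 3: PGs whose up set would shrink below min_size without this OSD."""
    return sorted(
        pg for pg, up in pg_up.items() if osd in up and len(up) - 1 < min_size
    )


def removal_blockers(
    osd: int,
    up_osds: Set[int],
    pg_up: Dict[str, List[int]],
    force: bool = False,
) -> List[str]:
    """Options 1 & 2 combined; an empty list means the removal may proceed.

    force stands in for a --yes-i-really-mean-it style override.
    """
    if force:
        return []
    reasons = []
    if osd in up_osds:  # option 1
        reasons.append(f"osd.{osd} is still up")
    pgs = pgs_with_osd_in_up_set(osd, pg_up)  # option 2
    if pgs:
        reasons.append(f"osd.{osd} is in the up set of {len(pgs)} PG(s)")
    return reasons
```

With this shape, the refusal message can list every reason at once, and
the force flag skips the checks entirely rather than changing what they
report.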

sage


