Re: Nodown/Noout by OSD_ID?

John Spray <jspray@xxxxxxxxxx> · Wed, 20 Jan 2016 13:55:08 +0000

On Wed, Jan 20, 2016 at 1:32 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Wed, 20 Jan 2016, Xiaoxi Chen wrote:
>> Hi,
>>
>>      In many cases we need to tag some OSDs with NODOWN/NOOUT/NOUP/NOIN,
>> but we don't want to set these flags cluster-wide, as they may stop other
>> OSDs from self-healing. As an example, when a recovered OSD needs to
>> catch up with the OSDMap, we set NODOWN/NOOUT/NOUP to prevent flapping,
>> but if another OSD fails with a disk error, that failure will be hidden
>> and we are at risk of losing data.
>>
>>      Would it be reasonable to have these flags work at OSD granularity,
>> say ceph osd nodown osd.xxx?
>>      From a quick look at the code, NODOWN/NOUP seem easier, as we could
>> add new status bits to the OSDMap:
>>      /* status bits */
>> #define CEPH_OSD_EXISTS  (1<<0)
>> #define CEPH_OSD_UP      (1<<1)
>> #define CEPH_OSD_AUTOOUT (1<<2)  /* osd was automatically marked out */
>> #define CEPH_OSD_NEW     (1<<3)  /* osd is new, never marked in */
>>
>> #define CEPH_OSD_NOUP    (1<<4)  /* osd cannot be marked up */
>> #define CEPH_OSD_NODOWN  (1<<5)  /* osd cannot be marked down */
>>
>>      But NOIN/NOOUT seem trickier, as IN/OUT depends on weight.
>> Any suggestions?
>
> This looks reasonable if we can sort out a good interface and suitable
> health warnings.  For example, ceph health and ceph -s should say "N osds
> have noin set", and 'ceph health detail' should tell you which ones.
>
> Maybe something like
>
>  ceph osd set-osd osd.123 noin
>
> ?  I don't particularly like that but we can't do 'ceph osd set ...' since
> that does global osdmap flags.

I think we should make this operate on arbitrary named CRUSH nodes
rather than just OSDs, so that someone can mark a whole host/rack.
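
To make the idea concrete, here is a minimal sketch, not actual Ceph code. It assumes the two proposed status bits from Xiaoxi's mail and a toy parent-pointer tree standing in for the CRUSH hierarchy. The `toy_*` names and the tree layout are hypothetical and exist only for illustration. Setting a flag on a host or rack just sets the per-OSD bit on every OSD beneath that node, and the same per-OSD state can be counted for a "N osds have nodown set" style health summary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Existing status bits, as quoted in the thread. */
#define CEPH_OSD_EXISTS  (1<<0)
#define CEPH_OSD_UP      (1<<1)
#define CEPH_OSD_AUTOOUT (1<<2)  /* osd was automatically marked out */
#define CEPH_OSD_NEW     (1<<3)  /* osd is new, never marked in */
/* Proposed per-OSD bits from the thread (a proposal, not an existing API). */
#define CEPH_OSD_NOUP    (1<<4)  /* osd cannot be marked up */
#define CEPH_OSD_NODOWN  (1<<5)  /* osd cannot be marked down */

/* Hypothetical stand-in for a CRUSH node: parent is an index into the
 * node array (-1 for the root); osd_id is -1 for buckets (host, rack). */
struct toy_node {
    int parent;
    int osd_id;
};

/* Return 1 if node n is `ancestor` itself or lies beneath it. */
int toy_in_subtree(const struct toy_node *nodes, int n, int ancestor)
{
    while (n != -1) {
        if (n == ancestor)
            return 1;
        n = nodes[n].parent;
    }
    return 0;
}

/* Set `flag` on every OSD under node `target`; return how many were hit. */
int toy_set_flag(const struct toy_node *nodes, size_t n_nodes,
                 uint32_t *osd_state, size_t n_osds,
                 int target, uint32_t flag)
{
    int count = 0;
    for (size_t i = 0; i < n_nodes; i++) {
        int osd = nodes[i].osd_id;
        if (osd < 0 || !toy_in_subtree(nodes, (int)i, target))
            continue;
        assert((size_t)osd < n_osds);  /* leaf must map to a known OSD */
        osd_state[osd] |= flag;
        count++;
    }
    return count;
}

/* Count OSDs carrying `flag`, e.g. for a health summary line. */
int toy_count_flag(const uint32_t *osd_state, size_t n_osds, uint32_t flag)
{
    int count = 0;
    for (size_t i = 0; i < n_osds; i++)
        if (osd_state[i] & flag)
            count++;
    return count;
}

/* An up OSD may be marked down only if its NODOWN bit is clear. */
int toy_can_mark_down(uint32_t state)
{
    return (state & CEPH_OSD_UP) && !(state & CEPH_OSD_NODOWN);
}
```

With a root, two hosts, and three OSDs (two under the first host), calling `toy_set_flag` on the first host would flag only its two OSDs, so a disk failure on the third OSD would still be reported as usual. This sketch does not address the NOIN/NOOUT weight question raised above.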

John