Re: ceph osd down doesn't seem to work

And unless you *need* a given ailing OSD to be up because it holds the only copy of some data, you may get better recovery/backfill results by stopping the service for that OSD entirely, so that the recovery reads all go to healthier OSDs.
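
A minimal sketch of that, assuming a systemd-managed (non-containerized)
deployment; "12" and <osd-host> are placeholders for the ailing OSD and the
node it lives on:

    ceph osd set noout              # already set on your cluster, per the status below
    ssh <osd-host> systemctl stop ceph-osd@12

With noout left set, the stopped OSD is marked down but not out, so nothing
extra gets remapped and the recovery reads fall to the healthier copies.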

> On Oct 3, 2023, at 12:21, Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx> wrote:
> 
> Hi Simon,
> 
> If the OSD is actually up, 'ceph osd down' will just make it flap: it gets
> marked down but comes back up immediately. To prevent that, you would want
> to run 'ceph osd set noup' first. However, I don't think this is what you
> actually want:
> 
>> I'm thinking (but perhaps incorrectly?) that it would be good to keep the OSD down+in, to try to read from it as long as possible
> 
> In this case, you actually want it up+out ('ceph osd out XXX'), though
> if it's replicated then marking it out will switch primaries around so
> that it's not actually read from anymore. It doesn't look like you
> have that much recovery/backfill left, so hopefully you'll be in a
> clean state soon, though you'll have to deal with those 'inconsistent'
> and 'recovery_unfound' PGs.
> 
> Josh
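
To spell out the commands Josh describes above as a rough sketch ("12" again
stands in for the failing OSD's id):

    # to keep it down while it stays in: set the flag, then mark it down
    ceph osd set noup
    ceph osd down 12
    # (remember to 'ceph osd unset noup' afterwards, or no OSD can come back up)

    # or, as Josh suggests, leave it up but mark it out:
    ceph osd out 12
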
> 
> On Tue, Oct 3, 2023 at 10:14 AM Simon Oosthoek <s.oosthoek@xxxxxxxxxxxxx> wrote:
>> 
>> Hi
>> 
>> I'm trying to mark one OSD as down, so we can clean it out and replace
>> it. It keeps getting medium read errors, so it's bound to fail sooner
>> rather than later. When I issue the command from the mon to mark the OSD
>> down, it doesn't actually happen. When the service on the OSD stops, it
>> is also marked out, and I'm thinking (but perhaps incorrectly?) that it
>> would be good to keep the OSD down+in, so it can still be read from for as
>> long as possible. Why doesn't it get marked down and stay that way when I
>> command it?
>> 
>> Context: our cluster is in a less than optimal state (see below); this is
>> after one of our OSD nodes failed and took a week to get back up (long
>> story). Because our OSDs are filled very unevenly, we kept having to
>> reweight OSDs to stay below the 85% threshold. Several disks are starting
>> to fail now (they're 4+ years old, so failures are expected to occur more
>> frequently).
>> 
>> I'm open to suggestions to help get us back to health_ok more quickly,
>> but I think we'll get there eventually anyway...
>> 
>> Cheers
>> 
>> /Simon
>> 
>> ----
>> 
>> # ceph -s
>>   cluster:
>>     health: HEALTH_ERR
>>             1 clients failing to respond to cache pressure
>>             1/843763422 objects unfound (0.000%)
>>             noout flag(s) set
>>             14 scrub errors
>>             Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent
>>             Degraded data redundancy: 13795525/7095598195 objects
>> degraded (0.194%), 13 pgs degraded, 12 pgs undersized
>>             70 pgs not deep-scrubbed in time
>>             65 pgs not scrubbed in time
>> 
>>   services:
>>     mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 11h)
>>     mgr: cephmon3(active, since 35h), standbys: cephmon1
>>     mds: 1/1 daemons up, 1 standby
>>     osd: 264 osds: 264 up (since 2m), 264 in (since 75m); 227 remapped pgs
>>          flags noout
>>     rgw: 8 daemons active (4 hosts, 1 zones)
>> 
>>   data:
>>     volumes: 1/1 healthy
>>     pools:   15 pools, 3681 pgs
>>     objects: 843.76M objects, 1.2 PiB
>>     usage:   2.0 PiB used, 847 TiB / 2.8 PiB avail
>>     pgs:     13795525/7095598195 objects degraded (0.194%)
>>              54839263/7095598195 objects misplaced (0.773%)
>>              1/843763422 objects unfound (0.000%)
>>              3374 active+clean
>>              195  active+remapped+backfill_wait
>>              65   active+clean+scrubbing+deep
>>              20   active+remapped+backfilling
>>              11   active+clean+snaptrim
>>              10   active+undersized+degraded+remapped+backfill_wait
>>              2    active+undersized+degraded+remapped+backfilling
>>              2    active+clean+scrubbing
>>              1    active+recovery_unfound+degraded
>>              1    active+clean+inconsistent
>> 
>>   progress:
>>     Global Recovery Event (8h)
>>       [==========================..] (remaining: 2h)
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx