Hi Simon,

If the OSD is actually up, 'ceph osd down' will just make it flap: it gets
marked down and immediately comes back up. To prevent that you would want
'ceph osd set noup'. However, I don't think this is what you actually want:

> I'm thinking (but perhaps incorrectly?) that it would be good to keep
> the OSD down+in, to try to read from it as long as possible

In that case you actually want it up+out ('ceph osd out XXX'), though if the
pool is replicated, marking it out will switch primaries around so that the
OSD isn't actually read from anymore. It doesn't look like you have that much
recovery/backfill left, so hopefully you'll be in a clean state soon, though
you'll still have to deal with those 'inconsistent' and 'recovery_unfound' PGs.
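Very roughly, and untested (the OSD id and PG ids below are placeholders;
substitute the real ones from 'ceph osd tree' and 'ceph health detail'), I'd
expect the sequence to look something like:

    # mark the failing OSD out but leave the daemon running (up+in -> up+out),
    # so its data can still be read while backfill moves PGs elsewhere
    ceph osd out 123

    # watch the data drain off it
    ceph osd df tree | grep 'osd\.123'
    ceph -s

    # once 'ceph health detail' names the damaged PGs (placeholder ids here):
    ceph health detail
    ceph pg repair 1.2b3             # scrub-repair the inconsistent PG
    ceph pg 1.4c5 list_unfound       # show the unfound object and probed OSDs

    # last resort only, this gives up on the unfound object:
    # ceph pg 1.4c5 mark_unfound_lost revert

I'd hold off on the mark_unfound_lost step until you're sure none of the
remaining or replaced OSDs can still supply that object, since it's
irreversible.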
Josh

On Tue, Oct 3, 2023 at 10:14 AM Simon Oosthoek <s.oosthoek@xxxxxxxxxxxxx> wrote:
>
> Hi
>
> I'm trying to mark one OSD as down, so we can clean it out and replace
> it. It keeps getting medium read errors, so it's bound to fail sooner
> rather than later. When I command ceph from the mon to mark the osd
> down, it doesn't actually do it. When the service on the osd stops, it
> is also marked out, and I'm thinking (but perhaps incorrectly?) that it
> would be good to keep the OSD down+in, to try to read from it as long
> as possible. Why doesn't it get marked down and stay that way when I
> command it?
>
> Context: our cluster is in a bit of a less than optimal state (see
> below). This is after one of our OSD nodes failed and took a week to
> get back up (long story). Due to seriously unbalanced filling of our
> OSDs, we kept having to reweight OSDs to stay below the 85% threshold.
> Several disks are starting to fail now (they're 4+ years old, so
> failures are expected to occur more frequently).
>
> I'm open to suggestions to help get us back to health_ok more quickly,
> but I think we'll get there eventually anyway...
>
> Cheers
>
> /Simon
>
> ----
>
> # ceph -s
>   cluster:
>     health: HEALTH_ERR
>             1 clients failing to respond to cache pressure
>             1/843763422 objects unfound (0.000%)
>             noout flag(s) set
>             14 scrub errors
>             Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent
>             Degraded data redundancy: 13795525/7095598195 objects
>             degraded (0.194%), 13 pgs degraded, 12 pgs undersized
>             70 pgs not deep-scrubbed in time
>             65 pgs not scrubbed in time
>
>   services:
>     mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 11h)
>     mgr: cephmon3(active, since 35h), standbys: cephmon1
>     mds: 1/1 daemons up, 1 standby
>     osd: 264 osds: 264 up (since 2m), 264 in (since 75m); 227 remapped pgs
>          flags noout
>     rgw: 8 daemons active (4 hosts, 1 zones)
>
>   data:
>     volumes: 1/1 healthy
>     pools:   15 pools, 3681 pgs
>     objects: 843.76M objects, 1.2 PiB
>     usage:   2.0 PiB used, 847 TiB / 2.8 PiB avail
>     pgs:     13795525/7095598195 objects degraded (0.194%)
>              54839263/7095598195 objects misplaced (0.773%)
>              1/843763422 objects unfound (0.000%)
>              3374 active+clean
>              195  active+remapped+backfill_wait
>              65   active+clean+scrubbing+deep
>              20   active+remapped+backfilling
>              11   active+clean+snaptrim
>              10   active+undersized+degraded+remapped+backfill_wait
>              2    active+undersized+degraded+remapped+backfilling
>              2    active+clean+scrubbing
>              1    active+recovery_unfound+degraded
>              1    active+clean+inconsistent
>
>   progress:
>     Global Recovery Event (8h)
>       [==========================..] (remaining: 2h)
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx