Re: Replace OSD while cluster is recovering?


Hi Frédéric,

Thank you for the suggestion. I started `ceph pg repair {pgid}` on the inconsistent PGs, but so far I see no effect. Is it possible to monitor the progress of the repairs? `ceph progress` does not show it, and for some reason `ceph -w` is hanging.
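(For reference, the closest I have found to checking on a single PG is polling its query output; `6.1a` below is only a placeholder pgid, and the field names are what the 17.2.x `ceph pg query` JSON uses:)

```shell
# Poll one PG's current state and last deep-scrub stamp.
# pgid is a placeholder -- substitute one reported by 'ceph health detail'.
pgid=6.1a
ceph pg "$pgid" query \
  | grep -E '"state"|"last_deep_scrub_stamp"'
```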

Kind regards,
Gustavo

________________________________
From: Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx>
Sent: Friday, February 28, 2025 11:19 AM
To: Gustavo Garcia Rondina <grondina@xxxxxxxxxxxx>
Cc: ceph-users <ceph-users@xxxxxxx>
Subject: Re:  Replace OSD while cluster is recovering?

Hi Gustavo,

In your situation, I would run a 'ceph pg repair {pgid}' on each one of these inconsistent PGs reported by 'ceph health detail' so they eventually get active+clean ASAP.

And I would leave scrubbing enabled and set osd_scrub_auto_repair to true with a 'ceph config set osd osd_scrub_auto_repair true' so that inconsistent PGs get automatically repaired at scrubbing time.
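For that many PGs, the repairs can be driven in one go by pulling the pgids out of 'ceph health detail' (a sketch only; the awk pattern assumes the usual "pg X.Y is ...inconsistent..." wording, which may vary between releases, so eyeball the extracted list before piping it into repair):

```shell
# Ask 'ceph pg repair' for every PG that 'ceph health detail' flags
# as inconsistent. Assumes health lines like:
#   pg 6.1a is active+clean+inconsistent, acting [12,45,3,87,101,66]
ceph health detail \
  | awk '/^ *pg [0-9]+\.[0-9a-f]+ .*inconsistent/ {print $2}' \
  | while read -r pgid; do
      ceph pg repair "$pgid"
    done
```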

Regards,
Frédéric.

----- On 28 Feb 25, at 16:56, Gustavo Garcia Rondina grondina@xxxxxxxxxxxx wrote:

> Hello list,
>
> We have a Ceph cluster (17.2.6 quincy) with 2 admin nodes and 6 storage nodes,
> each storage node connected to a JBOD enclosure. Each enclosure houses 28 HDD
> disks of 18 TB size, totaling 168 OSDs. The pool that houses the majority of
> the data is erasure-coded (4+2). We have recently had one disk failure, which
> brought one OSD down:
>
> # ceph osd tree | grep down
>  2    hdd    16.49579          osd.2         down         0  1.00000
>
> This OSD is out of the cluster, but we haven't replaced it physically yet. The
> problem that we are facing is that the cluster was not in the best shape when
> this OSD failed. Currently we have the following:
>
> ################################################
>  cluster:
>    id:     <redacted>
>    health: HEALTH_ERR
>            1026 scrub errors
>            Possible data damage: 18 pgs inconsistent
>            2137 pgs not deep-scrubbed in time
>            2137 pgs not scrubbed in time
>
>  services:
>    mon: 5 daemons, quorum xyz-admin1,xyz-admin2,xyz-osd1,xyz-osd2,xyz-osd3 (age
>    17M)
>    mgr: xyz-admin2.sipadf(active, since 17M), standbys: xyz-admin1.nwaovh
>    mds: 2/2 daemons up, 2 standby
>    osd: 168 osds: 167 up (since 44h), 167 in (since 6w); 220 remapped pgs
>
>  data:
>    volumes: 2/2 healthy
>    pools:   9 pools, 2137 pgs
>    objects: 448.54M objects, 1.0 PiB
>    usage:   1.6 PiB used, 1.1 PiB / 2.7 PiB avail
>    pgs:     134404830/2676514497 objects misplaced (5.022%)
>             1902 active+clean
>             191  active+remapped+backfilling
>             26   active+remapped+backfill_wait
>             15   active+clean+inconsistent
>             2    active+remapped+inconsistent+backfilling
>             1    active+remapped+inconsistent+backfill_wait
>
>  io:
>    recovery: 597 MiB/s, 252 objects/s
>
>  progress:
>    Global Recovery Event (6w)
>      [=========================...] (remaining: 5d)
> ################################################
>
> I have noticed the number of active+clean PGs increasing (it was ~1750 two days
> ago) and the number of misplaced objects very slowly decreasing. My question is:
> should I wait until recovery is complete, then repair the 18 damaged PGs, and
> only then replace the disk? My thinking is that replacing the disk will trigger
> more backfilling, which will slow down the recovery even more.
>
> Another question: should I disable scrubbing until the recovery has finished?
>
> Thank you for any insights you may be able to provide!
> -
> Gustavo
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx