On 28/06/2023 21:26, Niklas Hambüchen wrote:
> I have increased the number of scrubs per OSD from 1 to 3 using `ceph config set osd osd_max_scrubs 3`. Now the problematic PG shows as scrubbing in `ceph pg ls`: active+clean+scrubbing+deep+inconsistent
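For reference, the sequence was roughly this (2.87 is the affected PG, taken from the logs below; exact invocations may differ slightly between releases):

# Re-request the deep scrub on the inconsistent PG:
ceph pg deep-scrub 2.87

# Allow more than one concurrent (deep-)scrub per OSD so the request is not starved:
ceph config set osd osd_max_scrubs 3

# Confirm the running OSD picked the new value up:
ceph tell osd.33 config get osd_max_scrubs

# Watch the PG state flip to active+clean+scrubbing+deep+inconsistent:
ceph pg ls | grep '2\.87'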
This succeeded! The deep-scrub fixed the PG and the cluster is healthy again. Thanks a lot!

So indeed the issue was that the deep-scrub I had asked for was simply never scheduled, because Ceph always picked some other scrub to do first on the relevant OSD. Increasing `osd_max_scrubs` beyond 1 made it possible to force the scrub to start.

I conclude that most of the information online, including the Ceph docs, does not give the correct advice when recommending `ceph pg repair`. Instead, the docs should make clear that a deep-scrub will fix such issues without involving `ceph pg repair`. I find the lack of documentation on this disturbing, because a disk failing and being replaced is an extremely common operation for a storage cluster.

Here are some relevant logs from the scrub recovery:

# grep '\b2\.87\b' /var/log/ceph/ceph-osd.33.log | grep deep
2023-05-16T16:33:58.398+0000 7f9a985e5640 0 log_channel(cluster) log [DBG] : 2.87 deep-scrub ok
2023-06-16T20:03:26.923+0000 7f9a985e5640 -1 log_channel(cluster) log [ERR] : 2.87 deep-scrub 0 missing, 1 inconsistent objects
2023-06-16T20:03:26.923+0000 7f9a985e5640 -1 log_channel(cluster) log [ERR] : 2.87 deep-scrub 1 errors
2023-06-26T05:06:17.412+0000 7f9b15bfe640 0 log_channel(cluster) log [INF] : osd.33 pg 2.87 Deep scrub errors, upgrading scrub to deep-scrub
2023-06-29T10:14:07.791+0000 7f9a985e5640 0 log_channel(cluster) log [DBG] : 2.87 deep-scrub ok

ceph.log:

2023-06-29T10:14:07.792432+0000 osd.33 (osd.33) 938 : cluster [DBG] 2.87 deep-scrub ok
2023-06-29T10:14:09.311257+0000 mgr.node-5 (mgr.2454216) 385434 : cluster [DBG] pgmap v385836: 832 pgs: 1 active+clean+scrubbing, 17 active+clean+scrubbing+deep, 814 active+clean; 68 TiB data, 210 TiB used, 229 TiB / 439 TiB avail; 80 MiB/s rd, 40 MiB/s wr, 45 op/s
2023-06-29T10:14:09.427733+0000 mon.node-4 (mon.0) 20923054 : cluster [INF] Health check cleared: OSD_SCRUB_ERRORS (was: 1 scrub errors)
2023-06-29T10:14:09.427758+0000 mon.node-4 (mon.0) 20923055 : cluster [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)
2023-06-29T10:14:09.427786+0000 mon.node-4 (mon.0) 20923056 : cluster [INF] Cluster is now healthy

From this, it seems bad that Ceph did not manage to schedule the cluster-fixing scrub within 7 days of the faulty disk being replaced, nor did it manage to schedule a human-requested deep-scrub within 2 days.

What mechanism in Ceph decides the scheduling of scrubs? I see the config value `osd_requested_scrub_priority`, which is "the priority set for user requested scrub on the work queue", but I cannot tell whether this also affects when a scrub gets started, or only the priority of its IO operations vs. e.g. client operations once the scrub is already running.
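In case it helps whoever knows the scheduling code: these are the scrub-related settings I have been looking at so far, though I cannot tell which of them actually gate when a scrub starts versus how it behaves once running (osd.33 is the OSD from the logs above):

# Scrub-related settings the OSD is actually running with:
ceph config show osd.33 | grep scrub

# In particular, the ones that look like they could delay the start of a scrub:
#   osd_max_scrubs, osd_scrub_begin_hour / osd_scrub_end_hour,
#   osd_scrub_load_threshold, osd_scrub_during_recovery,
#   osd_scrub_min_interval / osd_scrub_max_interval / osd_deep_scrub_interval,
#   osd_requested_scrub_priority

# Last scrub / deep-scrub timestamps for the affected PG:
ceph pg dump pgs | grep '^2\.87'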