Re: pg deep scrubbing issue

Thank you Anthony.  I did have an empty pool that I had provisioned for developers and never used.  I've removed that pool and the zero-object PGs are gone; I don't know why I didn't realize that.  Removing that pool halved the number of PGs not scrubbed in time.
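
For anyone who finds this thread later, dropping an unused pool looks roughly like this (the pool name below is just a placeholder, and deletion has to be explicitly allowed first):

    ceph config set mon mon_allow_pool_delete true          # pool deletion is disabled by default
    ceph osd pool rm dev-pool dev-pool --yes-i-really-really-mean-it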

This is entirely an HDD cluster.  I don't constrain my scrubs, and I had already set osd_deep_scrub_interval to 2 weeks and increased osd_scrub_load_threshold to 5, but that didn't help much.
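
For the archives, those two settings can be applied at runtime with `ceph config set` (central config store, Mimic and later), e.g.:

    ceph config set osd osd_deep_scrub_interval 1209600    # 14 days, in seconds
    ceph config set osd osd_scrub_load_threshold 5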

I’ve moved our operations to our failover cluster so hopefully this one can catch up now.  I don’t understand how this started out of the blue, but at least now, the number is decreasing.

Jeff


> On Jan 3, 2023, at 12:57 AM, Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
> 
> Look closely at your output: the PGs with 0 objects are only "every other" because of how the command happened to order the output.
> 
> Note that the empty PGs all have IDs matching "3.*".  The numeric prefix of a PG ID is the numeric ID of the pool it belongs to.  I strongly suspect that you have a pool with no data.
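> 
> For example, the pool-ID-to-name mapping is visible with:
> 
>     ceph osd lspools
>     ceph osd pool ls detail | grep '^pool 3 '
> 
> Whichever pool shows up as ID 3 there is the one that owns those empty 3.* PGs.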
> 
> 
> 
>>> Strangely, ceph pg dump shows every other PG with 0 objects.  An attempt to perform a deep scrub (or scrub) on one of these PGs does nothing.  The cluster appears to be running fine, but obviously there's an issue.  What should my next steps be to troubleshoot?
>>>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES        OMAP_BYTES* OMAP_KEYS* LOG  DISK_LOG STATE                       STATE_STAMP                VERSION       REPORTED       UP            UP_PRIMARY ACTING        ACTING_PRIMARY LAST_SCRUB    SCRUB_STAMP                LAST_DEEP_SCRUB DEEP_SCRUB_STAMP           SNAPTRIMQ_LEN
>>>> 3.e9b         0                  0        0         0       0            0           0          0    0        0                active+clean 2022-12-31 22:49:07.629579           0'0    23686:19820       [28,79]         28       [28,79]             28           0'0 2022-12-31 22:49:07.629508             0'0 2022-12-31 22:49:07.629508             0
>>>> 1.e99     60594                  0        0         0       0 177433523272           0          0 3046     3046                active+clean 2022-12-21 14:35:08.175858  23686'268137  23686:1732399     [178,115]        178     [178,115]            178  23675'267613 2022-12-21 11:01:10.403525    23675'267613 2022-12-21 11:01:10.403525             0
>>>> 3.e9a         0                  0        0         0       0            0           0          0    0        0                active+clean 2022-12-31 09:16:48.644619           0'0    23686:22855      [51,140]         51      [51,140]             51           0'0 2022-12-31 09:16:48.644568             0'0 2022-12-30 02:35:23.367344             0
>>>> 1.e98     59962                  0        0         0       0 177218669411           0          0 3035     3035                active+clean 2022-12-28 14:14:49.908560  23686'265576  23686:1357499       [92,86]         92       [92,86]             92  23686'265445 2022-12-28 14:14:49.908522    23686'265445 2022-12-28 14:14:49.908522             0
>>>> 3.e95         0                  0        0         0       0            0           0          0    0        0                active+clean 2022-12-31 06:09:39.442932           0'0    23686:22757       [48,83]         48       [48,83]             48           0'0 2022-12-31 06:09:39.442879             0'0 2022-12-18 09:33:47.892142             0
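> 
> (As an aside, a deep scrub of a specific PG can be requested manually, e.g.:
> 
>     ceph pg deep-scrub 3.e9b
> 
> but the request is queued rather than executed immediately; the OSD's scrub scheduler still decides when it actually runs, which is why it can look like nothing happens.)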
> 
> 
> As to your PGs not scrubbed in time, what sort of hardware are your OSDs?  Here are some thoughts, especially if they're HDDs (example commands for the relevant settings follow the list).
> 
> * If you don't need that empty pool, delete it, then evaluate how many PGs on average your OSDs hold (e.g. `ceph osd df`).  If you have an unusually high number of PGs per OSD, maybe just maybe you're running afoul of osd_scrub_extended_sleep / osd_scrub_sleep.  In other words, individual scrubs on empty PGs may naturally be very fast, but they may effectively be DoSing the scrub schedule because of the efforts Ceph makes to spread out the impact of scrubs.
> 
> * Do you limit scrubs to certain times via osd_scrub_begin_hour, osd_scrub_end_hour, osd_scrub_begin_week_day, osd_scrub_end_week_day?  I've seen operators who constrain scrubs to only a few overnight / weekend hours, but doing so can hobble Ceph's ability to get through them all in time.
> 
> * Similarly, a value of osd_scrub_load_threshold that’s too low can also result in starvation.  The load average statistic can be misleading on modern SMP systems with lots of cores.  I’ve witnessed 32c/64t OSD nodes report a load average of like 40, but with tools like htop one could see that they were barely breaking a sweat.
> 
> * If you have osd_scrub_during_recovery disabled and experience a lot of backfill / recovery / rebalance traffic, that can starve scrubs too.  IMHO with recent releases this should almost always be enabled, ymmv.
> 
> * Back when I ran busy (read: underspent) HDD clusters I had to bump osd_deep_scrub_interval by a factor of 4 due to how slow and seek-bound the LFF spinners were.  Of course, the longer one spaces out scrubs, the less effective they are at detecting problems before they're impactful.
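> 
> For concreteness, on releases with the central config store (Mimic and later) the settings above can be inspected and adjusted at runtime, e.g. (values are only illustrative):
> 
>     ceph config get osd osd_scrub_sleep
>     ceph config get osd osd_scrub_begin_hour
>     ceph config get osd osd_scrub_end_hour
>     ceph config set osd osd_scrub_load_threshold 5
>     ceph config set osd osd_scrub_during_recovery true
>     ceph config set osd osd_deep_scrub_interval 2419200    # 4 weeks, in seconds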
> 
> 
> 
> 

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
