Re: Annoying PGs not deep-scrubbed in time messages in Nautilus.

Robert LeBlanc <robert@xxxxxxxxxxxxx> · Mon, 9 Dec 2019 12:51:07 -0800

On Mon, Dec 9, 2019 at 11:58 AM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
Solved it: the warning is, of course, generated by ceph-mgr, not ceph-mon.

So for my problem that means I should have injected the option into ceph-mgr. That's obviously why it worked when I set it on the pool...

The solution for you is simply to put the option under [global] and restart ceph-mgr (or use daemon config set; ceph-mgr doesn't support changing its config via ceph tell for some reason).

Paul

On Mon, Dec 9, 2019 at 8:32 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:

On Mon, Dec 9, 2019 at 5:17 PM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
I've increased the deep_scrub interval on the OSDs on our Nautilus cluster with the following added to the [osd] section:

I should have read the beginning of your email: you'll need to set the option on the mons as well, because they generate the warning. So your problem might be completely different from what I'm seeing here.

Paul

osd_deep_scrub_interval = 2600000

And I started seeing

1518 pgs not deep-scrubbed in time

in ceph -s. So I added

mon_warn_pg_not_deep_scrubbed_ratio = 1

since the default would start warning with a whole week left to scrub. But the messages persist. The cluster has been running for a month with these settings. Here is an example of the output. As you can see, some of these are not even two weeks old, nowhere close to 75% of 4 weeks.

    pg 6.1f49 not deep-scrubbed since 2019-11-09 23:04:55.370373
    pg 6.1f47 not deep-scrubbed since 2019-11-18 16:10:52.561204
    pg 6.1f44 not deep-scrubbed since 2019-11-18 15:48:16.825569
    pg 6.1f36 not deep-scrubbed since 2019-11-20 05:39:00.309340
    pg 6.1f31 not deep-scrubbed since 2019-11-27 02:48:45.347680
    pg 6.1f30 not deep-scrubbed since 2019-11-11 21:34:15.795622
    pg 6.1f2d not deep-scrubbed since 2019-11-24 11:37:39.502829
    pg 6.1f27 not deep-scrubbed since 2019-11-25 07:38:58.689315
    pg 6.1f25 not deep-scrubbed since 2019-11-20 00:13:43.048569
    pg 6.1f1a not deep-scrubbed since 2019-11-09 15:08:43.516666
    pg 6.1f19 not deep-scrubbed since 2019-11-25 10:24:47.884332
    1468 more pgs...

Mon Dec  9 08:12:01 PST 2019

There is very little data on the cluster, so it's not a problem of deep-scrubs taking too long:

$ ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       6.3 PiB     6.1 PiB     153 TiB      154 TiB          2.39
    nvme      5.8 TiB     5.6 TiB     138 GiB      197 GiB          3.33
    TOTAL     6.3 PiB     6.2 PiB     154 TiB      154 TiB          2.39

POOLS:
    POOL                           ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    .rgw.root                       1     3.0 KiB           7     3.0 KiB         0       1.8 PiB
    default.rgw.control             2         0 B           8         0 B         0       1.8 PiB
    default.rgw.meta                3     7.4 KiB          24     7.4 KiB         0       1.8 PiB
    default.rgw.log                 4      11 GiB         341      11 GiB         0       1.8 PiB
    default.rgw.buckets.data        6     100 TiB      41.84M     100 TiB      1.82       4.2 PiB
    default.rgw.buckets.index       7      33 GiB         574      33 GiB         0       1.8 PiB
    default.rgw.buckets.non-ec      8     8.1 MiB          22     8.1 MiB         0       1.8 PiB

Please help me figure out what I'm doing wrong with these settings.

Paul,

Thanks. I did set both options in the global section on the mons and restarted them, but that didn't help. Having the scrub interval set in the global section and restarting the mgr fixed it.
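To illustrate why the mgr kept warning, here is a minimal Python sketch of the age check. It assumes, going by the description of mon_warn_pg_not_deep_scrubbed_ratio ("percentage of the deep scrub interval past the deep scrub interval to warn"), that a PG is reported once its last deep scrub is older than interval * (1 + ratio), and that a positive per-pool deep_scrub_interval takes precedence over the daemon-wide value. The helpers warn_age and is_flagged are hypothetical, not Ceph code. A mgr that never reads the [osd]/[mon] overrides falls back to the defaults (osd_deep_scrub_interval = 604800 s, ratio 0.75).

```python
from datetime import datetime, timedelta
from typing import Optional

# Defaults assumed to be in effect in a ceph-mgr that never saw the overrides.
DEFAULT_DEEP_SCRUB_INTERVAL = 604800.0  # osd_deep_scrub_interval: 1 week, in seconds
DEFAULT_WARN_RATIO = 0.75               # mon_warn_pg_not_deep_scrubbed_ratio


def warn_age(interval: float, ratio: float,
             pool_interval: Optional[float] = None) -> timedelta:
    """Hypothetical helper: age after which a PG counts as not deep-scrubbed in time.

    Assumption: a positive per-pool interval overrides the daemon-wide one,
    and the warning fires at interval * (1 + ratio).
    """
    effective = pool_interval if pool_interval and pool_interval > 0 else interval
    return timedelta(seconds=effective * (1.0 + ratio))


def is_flagged(last_deep_scrub: datetime, now: datetime, age_limit: timedelta) -> bool:
    """True when the last deep scrub is older than the cutoff (now - age_limit)."""
    return last_deep_scrub < now - age_limit


if __name__ == "__main__":
    now = datetime(2019, 12, 9, 8, 12, 1)
    pg_6_1f19 = datetime(2019, 11, 25, 10, 24, 47)  # timestamp from the health output above

    defaults = warn_age(DEFAULT_DEEP_SCRUB_INTERVAL, DEFAULT_WARN_RATIO)
    tuned = warn_age(2600000, 1.0)  # values used in this thread
    pool_override = warn_age(DEFAULT_DEEP_SCRUB_INTERVAL, DEFAULT_WARN_RATIO,
                             pool_interval=2600000)

    print(defaults.total_seconds() / 86400)                 # days, mgr on defaults
    print(round(tuned.total_seconds() / 86400, 2))          # days, mgr sees overrides
    print(round(pool_override.total_seconds() / 86400, 2))  # days, interval set on pool
    print(is_flagged(pg_6_1f19, now, defaults))
    print(is_flagged(pg_6_1f19, now, tuned))
```

Under these assumptions the defaults give a 12.25-day limit, so a PG last deep-scrubbed about two weeks earlier is reported, while the tuned values (about 60.19 days) and the pool-level interval (about 52.66 days) would not report it.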

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com