Re: Annoying PGs not deep-scrubbed in time messages in Nautilus.

Hi,

Nice coincidence that you mention this today; I've just debugged the exact same problem on a setup where deep_scrub_interval was increased.

The solution was to set deep_scrub_interval directly on all pools instead (which was a better fit for this particular setup anyway):

ceph osd pool set <pool> deep_scrub_interval <deep_scrub_in_seconds>
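
For example, to set a ~30-day interval (in seconds) on a hypothetical pool named "mypool" and read it back:

ceph osd pool set mypool deep_scrub_interval 2592000
ceph osd pool get mypool deep_scrub_interval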

Here's the code that generates the warning: https://github.com/ceph/ceph/blob/v14.2.4/src/mon/PGMap.cc#L3058

* There's no obvious bug in the code and no reason why it shouldn't work with the global option, unless "pool->opts.get(pool_opts_t::DEEP_SCRUB_INTERVAL, x)" returns the wrong thing when the interval isn't configured on a pool
* I've used "config diff" to check that all mons use the correct value for deep_scrub_interval
* mon_warn_pg_not_deep_scrubbed_ratio is a little odd because the warning triggers at (mon_warn_pg_not_deep_scrubbed_ratio + 1) * deep_scrub_interval, which is somewhat unexpected; by default that's 125% of the configured interval (see the sketch below)
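
To illustrate the last point, here's a back-of-the-envelope calculation of when the warning would fire for a single PG; the numbers are example values only, not taken from any real cluster:

deep_scrub_interval=2600000          # seconds, ~30 days
warn_ratio=1                         # mon_warn_pg_not_deep_scrubbed_ratio
last_deep_scrub="2019-11-09 23:04:55"
# warning fires once "now" is past last_deep_scrub + (1 + ratio) * interval
deadline=$(( $(date -d "$last_deep_scrub" +%s) + (1 + warn_ratio) * deep_scrub_interval ))
date -d "@$deadline"                 # GNU date; with these numbers: around 2020-01-09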



Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Mon, Dec 9, 2019 at 5:17 PM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
I've increased the deep scrub interval on the OSDs of our Nautilus cluster by adding the following to the [osd] section:

osd_deep_scrub_interval = 2600000
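
(For reference, that interval is roughly 30 days:)

$ echo $((2600000 / 86400))
30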

And I started seeing

1518 pgs not deep-scrubbed in time

in ceph -s. So I added

mon_warn_pg_not_deep_scrubbed_ratio = 1

since the default would start warning with a whole week still left to scrub. But the messages persist. The cluster has been running for a month with these settings. Here is an example of the output. As you can see, some of these PGs were deep-scrubbed less than two weeks ago, nowhere close to 75% of the 4-week interval.

   pg 6.1f49 not deep-scrubbed since 2019-11-09 23:04:55.370373
   pg 6.1f47 not deep-scrubbed since 2019-11-18 16:10:52.561204
   pg 6.1f44 not deep-scrubbed since 2019-11-18 15:48:16.825569
   pg 6.1f36 not deep-scrubbed since 2019-11-20 05:39:00.309340
   pg 6.1f31 not deep-scrubbed since 2019-11-27 02:48:45.347680
   pg 6.1f30 not deep-scrubbed since 2019-11-11 21:34:15.795622
   pg 6.1f2d not deep-scrubbed since 2019-11-24 11:37:39.502829
   pg 6.1f27 not deep-scrubbed since 2019-11-25 07:38:58.689315
   pg 6.1f25 not deep-scrubbed since 2019-11-20 00:13:43.048569
   pg 6.1f1a not deep-scrubbed since 2019-11-09 15:08:43.516666
   pg 6.1f19 not deep-scrubbed since 2019-11-25 10:24:47.884332
   1468 more pgs...
Mon Dec  9 08:12:01 PST 2019

There is very little data on the cluster, so it's not a problem of deep-scrubs taking too long:

$ ceph df
RAW STORAGE:
   CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED  
   hdd       6.3 PiB     6.1 PiB     153 TiB      154 TiB          2.39  
   nvme      5.8 TiB     5.6 TiB     138 GiB      197 GiB          3.33  
   TOTAL     6.3 PiB     6.2 PiB     154 TiB      154 TiB          2.39  
 
POOLS:
   POOL                           ID     STORED      OBJECTS     USED        %USED     MAX AVAIL  
   .rgw.root                       1     3.0 KiB           7     3.0 KiB         0       1.8 PiB  
   default.rgw.control             2         0 B           8         0 B         0       1.8 PiB  
   default.rgw.meta                3     7.4 KiB          24     7.4 KiB         0       1.8 PiB  
   default.rgw.log                 4      11 GiB         341      11 GiB         0       1.8 PiB  
   default.rgw.buckets.data        6     100 TiB      41.84M     100 TiB      1.82       4.2 PiB  
   default.rgw.buckets.index       7      33 GiB         574      33 GiB         0       1.8 PiB  
   default.rgw.buckets.non-ec      8     8.1 MiB          22     8.1 MiB         0       1.8 PiB

Please help me figure out what I'm doing wrong with these settings.

Thanks,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
