Hi,

For some time now (I think since the upgrade to Nautilus) we have been getting "X pgs not deep-scrubbed in time". I deep-scrubbed the affected PGs whenever the warning appeared and expected the cluster to recover over time, but no such luck: the warning keeps coming back.

In our spinning-rust cluster we allow (deep) scrubbing only from 19:00 to 06:00 and changed the deep-scrub interval to 28 days (detailed OSD scrub config below). Looking at the PGs that are supposedly not deep-scrubbed in time reveals that their 28-day period is not over yet, for example:

# date && ceph health detail | awk '$1=="pg" { print $0 }'
Di 11. Aug 09:32:17 CEST 2020
pg 0.787 not deep-scrubbed since 2020-07-30 03:08:24.899264
pg 0.70c not deep-scrubbed since 2020-07-30 02:45:08.989329
pg 0.6c1 not deep-scrubbed since 2020-07-30 03:01:15.199496
pg 13.3 not deep-scrubbed since 2020-07-30 03:29:54.536825
pg 0.d9 not deep-scrubbed since 2020-07-30 03:12:34.503586
pg 0.41a not deep-scrubbed since 2020-07-30 03:01:23.514582
pg 0.490 not deep-scrubbed since 2020-07-30 03:05:45.616100

I wonder if I have missed or messed up some parameter that could cause this.

# ceph daemon osd.40 config show | grep scrub
"mds_max_scrub_ops_in_progress": "5",
"mon_scrub_inject_crc_mismatch": "0.000000",
"mon_scrub_inject_missing_keys": "0.000000",
"mon_scrub_interval": "86400",
"mon_scrub_max_keys": "100",
"mon_scrub_timeout": "300",
"mon_warn_pg_not_deep_scrubbed_ratio": "0.750000",
"mon_warn_pg_not_scrubbed_ratio": "0.500000",
"osd_debug_deep_scrub_sleep": "0.000000",
"osd_deep_scrub_interval": "2419200.000000",
"osd_deep_scrub_keys": "1024",
"osd_deep_scrub_large_omap_object_key_threshold": "200000",
"osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824",
"osd_deep_scrub_randomize_ratio": "0.150000",
"osd_deep_scrub_stride": "1048576",
"osd_deep_scrub_update_digest_min_age": "7200",
"osd_max_scrubs": "1",
"osd_op_queue_mclock_scrub_lim": "0.001000",
"osd_op_queue_mclock_scrub_res": "0.000000",
"osd_op_queue_mclock_scrub_wgt": "1.000000",
"osd_requested_scrub_priority": "120",
"osd_scrub_auto_repair": "false",
"osd_scrub_auto_repair_num_errors": "5",
"osd_scrub_backoff_ratio": "0.660000",
"osd_scrub_begin_hour": "19",
"osd_scrub_begin_week_day": "0",
"osd_scrub_chunk_max": "25",
"osd_scrub_chunk_min": "5",
"osd_scrub_cost": "52428800",
"osd_scrub_during_recovery": "false",
"osd_scrub_end_hour": "6",
"osd_scrub_end_week_day": "7",
"osd_scrub_interval_randomize_ratio": "0.500000",
"osd_scrub_invalid_stats": "true",
"osd_scrub_load_threshold": "0.500000",
"osd_scrub_max_interval": "604800.000000",
"osd_scrub_max_preemptions": "5",
"osd_scrub_min_interval": "172800.000000",
"osd_scrub_priority": "5",
"osd_scrub_sleep": "0.100000",

ceph version 14.2.10

My ad hoc helper script:

#!/bin/bash
# gently deep-scrub all PGs that failed to deep-scrub within the set period,
# one PG per minute so we do not flood the cluster with scrub requests
ceph health detail | awk '$1 == "pg" { print $2 }' | while read -r pg
do
    ceph pg deep-scrub "$pg"
    sleep 60
done

Cheers,
Dirk
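PS: My (possibly wrong) reading of the Nautilus health check is that a PG only gets flagged once its last deep-scrub stamp is older than osd_deep_scrub_interval * (1 + mon_warn_pg_not_deep_scrubbed_ratio); treat that formula as my assumption, not something from the docs. Assuming it is right, this little sketch computes the age at which the warning should fire with the values from the config dump above:

#!/bin/bash
# Sketch only -- the threshold formula is my assumption about the warning logic.
interval=2419200   # osd_deep_scrub_interval (28 days), from the dump above
ratio=0.75         # mon_warn_pg_not_deep_scrubbed_ratio
awk -v i="$interval" -v r="$ratio" \
    'BEGIN { printf "PG_NOT_DEEP_SCRUBBED expected after %.2f days\n", i * (1 + r) / 86400 }'

With our settings that works out to 49 days, which the stamps above are nowhere near -- hence my confusion.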