Re: deep-scrubs not respecting scrub interval (ceph luminous)

"Anthony D'Atri" <aad@xxxxxxxxxxxxxx> · Sun, 24 Oct 2021 10:40:55 -0700

This page might also help:

https://docs.ceph.com/en/pacific/dev/osd_internals/scrub/

> osd_scrub_begin_hour = 10 <= this works, great
> osd_scrub_end_hour = 17 <= this works, great

Does your workload really vary that much over the course of a day? Those settings confine scrubs to 7 of every 24 hours (~29% of the day), so during those hours you’re likely to have roughly 3x as many scrubs [trying to] run as there otherwise would be. When I’ve had to run HDD clusters with colocated journal/WAL+DB, I’ve had to bump the deep scrub interval up to 4 weeks *even without limiting hours*, which, when you consider *why* we run scrubs, is far from optimal.

I’ll assume that you’re using SAS/SATA drives, probably HDDs. Check whether your HBA is running “patrol read” or similar background activities, and whether smartd is running an intensive scan. It’s possible to end up in a situation where Ceph, the HBA, and smartd are all competing to run overlapping (and partly redundant, if not strictly identical) tests, which will impact everything, including client performance.
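To put numbers on the window arithmetic, here is a rough Python sketch. The helper names are made up for illustration, and the window check is modeled on my understanding of how the OSD compares the current hour against osd_scrub_begin_hour / osd_scrub_end_hour: if begin < end the window is [begin, end), otherwise it wraps past midnight, and begin == end means any hour. The "concentration" figure assumes the same total scrub work has to fit into the allowed hours.

```python
# Hypothetical helpers illustrating scrub-window arithmetic.
# Window semantics are an assumption modeled on the OSD's hour check:
# begin < end -> [begin, end); otherwise the window wraps past midnight
# (so begin == end permits scrubs at any hour).

def hour_permitted(hour: int, begin: int, end: int) -> bool:
    """Return True if scrubs may start during `hour` (0-23)."""
    if begin < end:
        return begin <= hour < end
    return hour >= begin or hour < end


def window_fraction(begin: int, end: int) -> float:
    """Fraction of the day during which scrubs are allowed to start."""
    allowed = sum(hour_permitted(h, begin, end) for h in range(24))
    return allowed / 24


def concentration_factor(begin: int, end: int) -> float:
    """How many times more scrubs compete for each permitted hour,
    assuming the same total scrub work must fit into the window."""
    return 1 / window_fraction(begin, end)


if __name__ == "__main__":
    frac = window_fraction(10, 17)
    print(f"window: {frac:.0%} of the day")
    print(f"concentration: ~{concentration_factor(10, 17):.1f}x")
```

With begin=10 and end=17 this prints a 29% window and a concentration of about 3.4x, which is where the "roughly 3x" above comes from.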

> osd_scrub_interval_randomize_ratio - Ratio of scrub interval to randomly vary
>  Default: 0.500000
> 
> This prevents a scrub 'stampede' by randomly varying the scrub intervals so that they are soon uniformly distributed over the week

> osd_deep_scrub_randomize_ratio - Scrubs will randomly become deep scrubs at this rate (0.15 -> 15% of scrubs are deep)
>  Default: 0.150000
>  Can update at runtime: true
> 
> This prevents a deep scrub 'stampede' by spreading deep scrubs so they are uniformly distributed over the week
> 
> So “osd_deep_scrub_interval” only means deep scrubbing each PG **at least** once per 2419200s

Exactly.
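If it helps to see why “at least” is the operative phrase, here is a deliberately simplified single-PG simulation in Python. It is a toy model, not the OSD’s actual scheduler: it assumes each shallow scrub lands min_interval * (1 + randomize_ratio * U) after the previous one (U uniform in [0, 1)), and that a scrub becomes deep either when osd_deep_scrub_interval has elapsed or, otherwise, with probability osd_deep_scrub_randomize_ratio. It ignores osd_max_scrubs, load thresholds, scrub hours and the noscrub flags. The function name is hypothetical.

```python
import random

DAY = 86400.0


def simulate_deep_gaps(
    min_interval: float = 1 * DAY,     # osd_scrub_min_interval (1 day)
    randomize_ratio: float = 0.5,      # osd_scrub_interval_randomize_ratio
    deep_ratio: float = 0.15,          # osd_deep_scrub_randomize_ratio
    deep_interval: float = 2419200.0,  # osd_deep_scrub_interval as quoted (28 days)
    horizon: float = 5 * 365 * DAY,    # how long to simulate
    seed: int = 42,
) -> list[float]:
    """Toy single-PG model; returns the gaps (seconds) between deep scrubs."""
    rng = random.Random(seed)
    t = 0.0
    last_deep = 0.0
    gaps = []
    while t < horizon:
        # Next shallow scrub: min interval stretched by a random fraction.
        t += min_interval * (1 + randomize_ratio * rng.random())
        # Deep if the deep interval has elapsed, else promoted at random.
        overdue = t - last_deep >= deep_interval
        if overdue or rng.random() < deep_ratio:
            gaps.append(t - last_deep)
            last_deep = t
    return gaps


if __name__ == "__main__":
    gaps = simulate_deep_gaps()
    print(f"mean gap between deep scrubs: {sum(gaps) / len(gaps) / DAY:.1f} days")
    print(f"longest gap: {max(gaps) / DAY:.1f} days")
```

Under these assumptions the mean gap comes out well under the 28-day ceiling (back-of-envelope: ~1.25 days between scrubs / 0.15 ≈ 8 days), and the ceiling only kicks in for the occasional PG that keeps losing the coin flip.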

I remember a discussion a few years ago in which it was claimed that osd_scrub_interval_randomize_ratio wasn’t doing quite what one might expect, and that it could in fact sometimes result in up to double the configured interval between certain scrubs. Since recent versions of Ceph warn about overdue scrubs, I suspect this has since improved.

Some context for the randomize ratio: with earlier versions of Ceph, scrubs tended to be *very* non-uniformly distributed, influenced by many OSDs having been created at the same time, host downtime, long periods with the noscrub/nodeep-scrub flags set during maintenance, etc. Collisions due to osd_max_scrubs would only *slowly* smooth out the clumps. It was not unheard of for operators to disable automatic scrubs entirely so that they could issue them at a steady pace themselves. There were also scripts/daemons written that would adaptively issue scrubs based on the number outstanding, op latency, drive utilization, etc. The randomize ratio wasn’t a perfect solution to this problem, but it did reduce it considerably with minimal complexity.
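For anyone curious what the core of such a script looks like, here is a minimal Python sketch of the selection step only. Everything here is hypothetical: the PGInfo record, the function names, and the 4096-PG example. Populating the records from `ceph pg dump` is left out because its JSON layout has changed between releases; the sketch just prints `ceph pg deep-scrub <pgid>` commands as a dry run rather than executing them. It picks the PGs with the oldest deep scrub stamps while capping the number of scrubs in flight.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PGInfo:
    """Hypothetical per-PG record, e.g. filled in from `ceph pg dump`."""
    pgid: str
    last_deep_scrub_stamp: datetime
    state: str  # e.g. "active+clean" or "active+clean+scrubbing+deep"


def pgs_per_hour(num_pgs: int, target: timedelta) -> float:
    """Issue rate needed to deep-scrub every PG once within `target`."""
    return num_pgs / (target.total_seconds() / 3600)


def pick_pgs(pgs: list[PGInfo], max_outstanding: int, batch: int) -> list[str]:
    """Choose up to `batch` idle, clean PGs, oldest deep scrub first,
    without letting in-progress scrubs exceed `max_outstanding`."""
    outstanding = sum("scrubbing" in pg.state.split("+") for pg in pgs)
    room = max(0, min(batch, max_outstanding - outstanding))
    idle = [
        pg for pg in pgs
        if {"active", "clean"} <= set(pg.state.split("+"))
        and "scrubbing" not in pg.state.split("+")
    ]
    idle.sort(key=lambda pg: pg.last_deep_scrub_stamp)
    return [pg.pgid for pg in idle[:room]]


if __name__ == "__main__":
    now = datetime(2021, 10, 24)
    pgs = [
        PGInfo("1.0", now - timedelta(days=20), "active+clean"),
        PGInfo("1.1", now - timedelta(days=3), "active+clean"),
        PGInfo("1.2", now - timedelta(days=27), "active+clean+scrubbing+deep"),
        PGInfo("1.3", now - timedelta(days=25), "active+clean"),
    ]
    print(f"{pgs_per_hour(4096, timedelta(weeks=4)):.1f} PGs/hour needed")
    for pgid in pick_pgs(pgs, max_outstanding=2, batch=2):
        print(f"ceph pg deep-scrub {pgid}")  # dry run: print, don't execute
```

With the made-up PG list above, one scrub is already in flight, so only one slot is free and it prints the command for 1.3, the idle PG with the oldest deep scrub stamp. A real tool would run this in a loop and would also back off on op latency or drive utilization, as described above.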

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx