Hi all,

I replaced a disk in our Octopus cluster and it is rebuilding. I noticed that since the replacement there has been no scrubbing going on. It looks as if an OSD that has a PG in backfill_wait state blocks deep scrubbing of all other PGs on that OSD as well.

Some numbers: the pool in question has 8192 PGs with EC 8+3 and ca. 850 OSDs. A total of 144 PGs needed backfilling (they were remapped after replacing the disk). After about 2 days we are down to 115 backfill_wait + 3 backfilling. At this rate it will take a bit more than a week to complete.
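
For what it's worth, the time estimate above is just a linear extrapolation from the numbers given. A minimal sketch of the arithmetic follows. It assumes the backfill rate stays constant, which may well not hold in practice, and the variable names are purely illustrative:

```python
# Rough linear extrapolation of the remaining backfill time.
# Assumption: PGs keep completing at the average rate seen so far.
total_pgs = 144            # PGs that needed backfilling after the disk swap
remaining_pgs = 115 + 3    # backfill_wait + backfilling after about 2 days
elapsed_days = 2.0

completed_pgs = total_pgs - remaining_pgs    # PGs finished so far
rate_per_day = completed_pgs / elapsed_days  # average PGs completed per day
remaining_days = remaining_pgs / rate_per_day

print(f"{completed_pgs} PGs done, {rate_per_day:.0f} PGs/day, "
      f"about {remaining_days:.1f} days remaining")
# prints: 26 PGs done, 13 PGs/day, about 9.1 days remaining
```

So roughly 9 more days on top of the 2 already elapsed, during which no PG on the affected OSDs appears to get scrubbed.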

There is plenty of time and IOPS available to deep-scrub PGs on the side, but since the backfill started there has been zero scrubbing or deep scrubbing, and "PGs not deep scrubbed in time" warnings are piling up.

Is there a way to allow (deep) scrubbing in this situation?

