Thanks,
This makes sense, but just wanted to sanity check my assumption against reality.
In my specific case, 24 of the OSDs are HDDs and 30 are SSDs, in different CRUSH roots/pools, so deep scrubs on the other 23 spinning disks could in theory eat IOPS on a disk that is currently backfilling to the rebuilding OSD.
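(For reference, the split across separate roots can be confirmed from the CRUSH tree; a minimal check, assuming a Jewel-era cluster without device classes:

# Show the CRUSH hierarchy; the HDD and SSD OSDs should appear under
# distinct roots here, each feeding its own pool(s).
$ ceph osd tree

# Same hierarchy with per-OSD utilization, useful while backfilling.
$ ceph osd df tree
)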
Either way, it makes sense, and thanks for the insight.
And don’t worry Wido, they aren’t SMR drives!
Thanks,
Reed
On May 30, 2017, at 11:03 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
On 30 May 2017 at 17:37, Reed Dier <reed.dier@xxxxxxxxxxx> wrote:
Lost an OSD and having to rebuild it.
It's an 8 TB drive, so it has to backfill a ton of data. It has been taking a while, so I looked at ceph -s and noticed that deep-scrubs were running, even though I'm running the newest Jewel (10.2.7) and the OSDs have osd_scrub_during_recovery set to false.
$ cat /etc/ceph/ceph.conf | grep scrub | grep recovery
osd_scrub_during_recovery = false

$ sudo ceph daemon osd.0 config show | grep scrub | grep recovery
"osd_scrub_during_recovery": "false",

$ ceph --version
ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
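(In case the value in ceph.conf had not been picked up at daemon start, it can also be injected at runtime; a sketch, assuming Jewel's injectargs behavior:

$ ceph tell osd.* injectargs '--osd_scrub_during_recovery=false'
# Confirm on a single daemon afterwards:
$ sudo ceph daemon osd.0 config get osd_scrub_during_recovery
)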
    cluster edeb727e-c6d3-4347-bfbb-b9ce7f60514b
     health HEALTH_WARN
            133 pgs backfill_wait
            10 pgs backfilling
            143 pgs degraded
            143 pgs stuck degraded
            143 pgs stuck unclean
            143 pgs stuck undersized
            143 pgs undersized
            recovery 22081436/1672287847 objects degraded (1.320%)
            recovery 20054800/1672287847 objects misplaced (1.199%)
            noout flag(s) set
     monmap e1: 3 mons at {core=10.0.1.249:6789/0,db=10.0.1.251:6789/0,dev=10.0.1.250:6789/0}
            election epoch 4234, quorum 0,1,2 core,dev,db
      fsmap e5013: 1/1/1 up {0=core=up:active}, 1 up:standby
     osdmap e27892: 54 osds: 54 up, 54 in; 143 remapped pgs
            flags noout,nodeep-scrub,sortbitwise,require_jewel_osds
      pgmap v13840713: 4292 pgs, 6 pools, 59004 GB data, 564 Mobjects
            159 TB used, 69000 GB / 226 TB avail
            22081436/1672287847 objects degraded (1.320%)
            20054800/1672287847 objects misplaced (1.199%)
                4143 active+clean
                 133 active+undersized+degraded+remapped+wait_backfill
                  10 active+undersized+degraded+remapped+backfilling
                   6 active+clean+scrubbing+deep
recovery io 21855 kB/s, 346 objects/s
  client io 30021 kB/s rd, 1275 kB/s wr, 291 op/s rd, 62 op/s wr
Looking at the Ceph documentation for 'master':
osd scrub during recovery

Description: Allow scrub during recovery. Setting this to false will disable scheduling new scrub (and deep-scrub) while there is active recovery. Already running scrubs will be continued. This might be useful to reduce load on busy clusters.
Type: Boolean
Default: true
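(For comparison, the cluster-wide nodeep-scrub flag, visible in the flags line of the status output above, is toggled like this; as with the per-OSD option, it should only stop new deep-scrubs from being scheduled:

$ ceph osd set nodeep-scrub
# ...and once the backfill has finished:
$ ceph osd unset nodeep-scrub
)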
Are backfills not treated as recovery operations? Is it only preventing scrubs on the OSDs that are actively recovering/backfilling?
Just curious as to why the feature did not seem to kick in as expected.
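(One way to check would be to map the scrubbing PGs to their OSD sets and see whether any overlap with the backfilling PGs; a sketch, with <pgid> as a placeholder:

# List deep-scrubbing PGs together with their up/acting OSD sets.
$ ceph pg dump pgs_brief | grep 'scrubbing+deep'

# Show which OSDs a single PG maps to.
$ ceph pg map <pgid>
)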
It is per OSD. So only on that OSD will new (deep-)scrubs not be started as long as a recovery/backfill operation is active there. Other OSDs which have nothing to do with it will still perform scrubs.

Wido

Thanks,
Reed

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com