Re: OSD scrub during recovery

"Is it only preventing scrubs on the OSDs that are actively recovering/backfilling?"

That's exactly what it's doing.  Notice that none of the PGs listed as scrubbing have undersized, degraded, backfill, backfilling, etc. in their PG status; they are all "active+clean+scrubbing+deep".  I don't see any reason why the cluster would not schedule scrubs for PGs that are active+clean while other PGs in the cluster are backfilling, recovering, degraded, etc., as long as the OSDs involved in the scrub are not also involved in recovery I/O.
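
To make that check concrete, here is a rough Python sketch that scans PG state strings (like the ones in the pgmap section of ceph -s) and reports any that combine scrubbing with a recovery-related state. The function name and the RECOVERY_STATES set are my own assumptions for illustration, not anything taken from Ceph itself, and this only inspects status strings; it says nothing about how the OSD actually schedules scrubs internally.

```python
# Illustrative (assumed) set of state flags that mark a PG as being
# involved in recovery or backfill. Not an authoritative Ceph list.
RECOVERY_STATES = {
    "undersized", "degraded", "remapped",
    "backfilling", "wait_backfill", "backfill_wait",
    "recovering", "recovery_wait",
}


def scrubbing_during_recovery(pg_states):
    """Return the state strings that combine scrubbing with a recovery flag."""
    offenders = []
    for state in pg_states:
        flags = set(state.split("+"))
        if "scrubbing" in flags and flags & RECOVERY_STATES:
            offenders.append(state)
    return offenders


# State strings copied from the ceph -s output quoted below.
states = [
    "active+clean",
    "active+undersized+degraded+remapped+wait_backfill",
    "active+undersized+degraded+remapped+backfilling",
    "active+clean+scrubbing+deep",
]

print(scrubbing_during_recovery(states))  # prints []
```

An empty list here matches what the status output shows: the only scrubbing PGs are otherwise active+clean.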

On Tue, May 30, 2017 at 11:45 AM Reed Dier <reed.dier@xxxxxxxxxxx> wrote:
Lost an OSD and having to rebuild it.

8TB drive, so it has to backfill a ton of data.
It's been taking a while, so I looked at ceph -s and noticed that scrubs/deep scrubs were running even though I'm running the newest Jewel (10.2.7) and the OSDs have osd_scrub_during_recovery set to false.

$ cat /etc/ceph/ceph.conf | grep scrub | grep recovery
osd_scrub_during_recovery = false

$ sudo ceph daemon osd.0 config show | grep scrub | grep recovery
    "osd_scrub_during_recovery": "false",

$ ceph --version
ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)

    cluster edeb727e-c6d3-4347-bfbb-b9ce7f60514b
     health HEALTH_WARN
            133 pgs backfill_wait
            10 pgs backfilling
            143 pgs degraded
            143 pgs stuck degraded
            143 pgs stuck unclean
            143 pgs stuck undersized
            143 pgs undersized
            recovery 22081436/1672287847 objects degraded (1.320%)
            recovery 20054800/1672287847 objects misplaced (1.199%)
            noout flag(s) set
            election epoch 4234, quorum 0,1,2 core,dev,db
      fsmap e5013: 1/1/1 up {0=core=up:active}, 1 up:standby
     osdmap e27892: 54 osds: 54 up, 54 in; 143 remapped pgs
            flags noout,nodeep-scrub,sortbitwise,require_jewel_osds
      pgmap v13840713: 4292 pgs, 6 pools, 59004 GB data, 564 Mobjects
            159 TB used, 69000 GB / 226 TB avail
            22081436/1672287847 objects degraded (1.320%)
            20054800/1672287847 objects misplaced (1.199%)
                4143 active+clean
                 133 active+undersized+degraded+remapped+wait_backfill
                  10 active+undersized+degraded+remapped+backfilling
                   6 active+clean+scrubbing+deep
recovery io 21855 kB/s, 346 objects/s
  client io 30021 kB/s rd, 1275 kB/s wr, 291 op/s rd, 62 op/s wr

Looking at the Ceph documentation for 'master':

osd scrub during recovery

Description: Allow scrub during recovery. Setting this to false will disable scheduling new scrub (and deep-scrub) while there is active recovery. Already running scrubs will be continued. This might be useful to reduce load on busy clusters.
Type: Boolean
Default: true

Are backfills not treated as recovery operations? Is it only preventing scrubs on the OSDs that are actively recovering/backfilling?

Just curious as to why the feature did not seem to kick in as expected.

Thanks,

Reed
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com