Re: Recovery/Backfill Speedup


 



How did you set the parameter?
Editing ceph.conf only takes effect when you restart the OSD daemons.
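For reference, the persistent form in ceph.conf looks like this (the value here is illustrative, not a recommendation), and it only takes effect after the OSD daemons are restarted:

```ini
[osd]
osd_max_backfills = 6
```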

But running something like

ceph tell osd.* injectargs '--osd-max-backfills 6'

sets the max backfills on all OSDs dynamically, without restarting them, and you should fairly quickly afterwards see more backfills in ceph -s.


I have also noticed that if I run

ceph -n osd.0 --show-config

on one of my mon nodes, it shows the default settings; it does not actually talk to osd.0 and fetch the current values. If I run it from any OSD node it works. But I am on hammer, not jewel, so this may have changed and actually work for you.
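A more reliable way to read an OSD's live value is the admin socket on the OSD host itself; a sketch, assuming the default socket path under /var/run/ceph:

```shell
# Run on the host where osd.0 lives; this queries the running daemon directly
ceph daemon osd.0 config get osd_max_backfills

# Equivalent, naming the admin socket explicitly
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config get osd_max_backfills
```

Unlike --show-config from a mon node, this cannot return stale defaults, because it talks to the daemon's in-memory configuration.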


Kind regards
Ronny Aasen


On 05. okt. 2016 21:52, Dan Jakubiec wrote:
Thank Ronny, I am working with Reed on this problem.

Yes, something is very strange. The docs say osd_max_backfills defaults to
10, but when we examined the run-time configuration using "ceph
--show-config" it showed osd_max_backfills set to 1 (we are running the
latest Jewel release).

We have explicitly set this parameter to 10 now. Sadly, about 2 hours
in, backfills continue to be anemic. Any other ideas?

$ ceph -s
    cluster edeb727e-c6d3-4347-bfbb-b9ce7f60514b
     health HEALTH_WARN
            246 pgs backfill_wait
            3 pgs backfilling
            329 pgs degraded
            83 pgs recovery_wait
            332 pgs stuck unclean
            257 pgs undersized
            recovery 154681996/676556815 objects degraded (22.863%)
            recovery 278768286/676556815 objects misplaced (41.204%)
            noscrub,nodeep-scrub,sortbitwise flag(s) set
     monmap e1: 3 mons at
{core=10.0.1.249:6789/0,db=10.0.1.251:6789/0,dev=10.0.1.250:6789/0}
            election epoch 210, quorum 0,1,2 core,dev,db
     osdmap e4274: 16 osds: 16 up, 16 in; 279 remapped pgs
            flags noscrub,nodeep-scrub,sortbitwise
      pgmap v1657039: 576 pgs, 2 pools, 6427 GB data, 292 Mobjects
            15308 GB used, 101 TB / 116 TB avail
            154681996/676556815 objects degraded (22.863%)
            278768286/676556815 objects misplaced (41.204%)
                 244 active+clean
                 242 active+undersized+degraded+remapped+wait_backfill
                  53 active+recovery_wait+degraded
                  17 active+recovery_wait+degraded+remapped
                  13 active+recovery_wait+undersized+degraded+remapped
                   3 active+remapped+wait_backfill
                   2 active+undersized+degraded+remapped+backfilling
                   1 active+degraded+remapped+wait_backfill
                   1 active+degraded+remapped+backfilling
recovery io 1568 kB/s, 109 objects/s
  client io 5629 kB/s rd, 411 op/s rd, 0 op/s wr


Here is what our current configuration looks like:

$ ceph -n osd.0 --show-config | grep osd | egrep "recovery|backfill" | sort
osd_allow_recovery_below_min_size = true
osd_backfill_full_ratio = 0.85
osd_backfill_retry_interval = 10
osd_backfill_scan_max = 512
osd_backfill_scan_min = 64
osd_debug_reject_backfill_probability = 0
osd_debug_skip_full_check_in_backfill_reservation = false
osd_kill_backfill_at = 0
osd_max_backfills = 10
osd_min_recovery_priority = 0
osd_recovery_delay_start = 0
osd_recovery_forget_lost_objects = false
osd_recovery_max_active = 15
osd_recovery_max_chunk = 8388608
osd_recovery_max_single_start = 1
osd_recovery_op_priority = 63
osd_recovery_op_warn_multiple = 16
osd_recovery_sleep = 0
osd_recovery_thread_suicide_timeout = 300
osd_recovery_thread_timeout = 30
osd_recovery_threads = 5


-- Dan


Ronny Aasen wrote:
On 04.10.2016 16:31, Reed Dier wrote:
Attempting to expand our small ceph cluster currently.

Have 8 nodes, 3 mons, and went from a single 8TB disk per node to 2x
8TB disks per node, and the rebalancing process is excruciatingly slow.

We were at 576 PGs before expansion, and wanted to let the rebalance
finish before increasing the PG count of the single pool and the
replication size.

I have stopped scrubs for the time being, and set client and
recovery io priorities to equal values so that client io is not burying the
recovery io. I have also increased the number of recovery threads per OSD.

[osd]
osd_recovery_threads = 5
filestore_max_sync_interval = 30
osd_client_op_priority = 32
osd_recovery_op_priority = 32
Also, this is 10G networking, and recovery io typically hovers
between 0-35 MB/s, but is very bursty.
Disks are 8TB 7.2k SAS disks behind an LSI 3108 controller,
configured as individual RAID0 VDs, with pdcache disabled but BBU-backed
write-back caching enabled at the controller level.

I have thought about increasing 'osd_max_backfills' as well as
'osd_recovery_max_active', and possibly 'osd_recovery_max_chunk', to
try to speed it up, but will hopefully get some insight from the
community here.
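Those settings can all be raised at runtime without restarting anything; a hedged sketch (the values are illustrative only, and each bump should be followed by watching cluster load):

```shell
# Raise backfill and recovery concurrency on every OSD at runtime
ceph tell osd.* injectargs '--osd-max-backfills 4'
ceph tell osd.* injectargs '--osd-recovery-max-active 20'

# Then check whether more PGs start backfilling
ceph -s | grep -E 'backfilling|recovery io'
```

Note that injectargs changes are not persistent; anything that should survive an OSD restart also needs to go into ceph.conf.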

ceph -s about 4 days in:

      health HEALTH_WARN
             255 pgs backfill_wait
             4 pgs backfilling
             385 pgs degraded
             129 pgs recovery_wait
             388 pgs stuck unclean
             274 pgs undersized
             recovery 165319973/681597074 objects degraded (24.255%)
             recovery 298607229/681597074 objects misplaced (43.810%)
             noscrub,nodeep-scrub,sortbitwise flag(s) set
      monmap e1: 3 mons at
{core=10.0.1.249:6789/0,db=10.0.1.251:6789/0,dev=10.0.1.250:6789/0}
             election epoch 190, quorum 0,1,2 core,dev,db
      osdmap e4226: 16 osds: 16 up, 16 in; 303 remapped pgs
             flags noscrub,nodeep-scrub,sortbitwise
       pgmap v1583742: 576 pgs, 2 pools, 6426 GB data, 292 Mobjects
             15301 GB used, 101 TB / 116 TB avail
             165319973/681597074 objects degraded (24.255%)
             298607229/681597074 objects misplaced (43.810%)
                  249 active+undersized+degraded+remapped+wait_backfill
                  188 active+clean
                   85 active+recovery_wait+degraded
                   22 active+recovery_wait+degraded+remapped
                   22 active+recovery_wait+undersized+degraded+remapped
                    3 active+remapped+wait_backfill
                    3 active+undersized+degraded+remapped+backfilling
                    3 active+degraded+remapped+wait_backfill
                    1 active+degraded+remapped+backfilling
recovery io 9361 kB/s, 415 objects/s
   client io 597 kB/s rd, 62 op/s rd, 0 op/s wr
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


4 pgs backfilling

this sounds incredibly low for your configuration. You do not mention
osd_max_backfills; the default is 10, so with 8 nodes, each having 1 OSD
writing and 1 OSD reading, you should see many more than 4 PGs
backfilling at any given time, the theoretical max being 8*10 = 80.

Check what your current max-backfill value is, and try setting
osd_max_backfills higher, preferably in small increments, while
monitoring how many PGs are backfilling and the load on the machines and
network.
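Stepping it up gradually can be scripted; a minimal sketch, assuming you watch the output between steps and back off if client io suffers (the step values and 5-minute settle time are arbitrary):

```shell
# Raise osd_max_backfills in small steps, checking the effect each time
for n in 2 4 6; do
    ceph tell 'osd.*' injectargs "--osd-max-backfills $n"
    sleep 300   # let backfills ramp up before judging the new value
    ceph -s | grep -E 'backfilling|recovery io'
done
```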

kind regards
Ronny Aasen





--
Dan Jakubiec
VP Development
Focus VQ LLC




