Re: [Ceph] Recovery is very Slow

Hi Vladimir,
I have reconfigured the setup to 15 OSDs now:
Every 1.0s: sudo ceph -s                                                                          Fri Oct 29 10:21:07 2021

  cluster:
    id:     1a8bfc8a-ad9d-4a06-9963-5e84e7ce80ee
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum storagenode1,storagenode2,storagenode3 (age 2h)
    mgr: storagenode3(active, since 16h), standbys: storagenode2, storagenode1
    mds: cephfs:1 {0=storagenode3=up:active} 2 up:standby
    osd: 15 osds: 15 up (since 5m), 15 in (since 16h); 19 remapped pgs
    rgw: 3 daemons active (storagenode1.rgw0, storagenode2.rgw0, storagenode3.rgw0)

  task status:
    scrub status:
        mds.storagenode3: idle

  data:
    pools:   7 pools, 265 pgs
    objects: 4.13M objects, 1.9 TiB
    usage:   6.1 TiB used, 7.0 TiB / 13 TiB avail
    pgs:     662670/12381873 objects misplaced (5.352%)
             246 active+clean
             19  active+remapped+backfilling

  io:
    recovery: 114 MiB/s, 173 objects/s

I see the recovery rate at around 140 MiB/s. Is this per OSD or in total? From the message you sent, I understood it to be per OSD.
Also, after running the command "ceph tell 'osd.*' injectargs --osd-max-backfills=2 --osd-recovery-max-active=6" I do not see any visible difference. Do we have to restart the OSD service? After running this command I see:
[ansible@storagenode1 ~]$ sudo ceph tell 'osd.*' injectargs --osd-max-backfills=8 --osd-recovery-max-active=12
osd.0: osd_recovery_max_active = '12' (not observed, change may require restart)
osd.0: {}
osd.1: osd_recovery_max_active = '12' (not observed, change may require restart)
osd.1: {}
osd.2: osd_recovery_max_active = '12' (not observed, change may require restart)
osd.2: {}
osd.3: osd_recovery_max_active = '12' (not observed, change may require restart)
osd.3: {}
osd.4: osd_recovery_max_active = '12' (not observed, change may require restart)
osd.4: {}
osd.5: osd_recovery_max_active = '12' (not observed, change may require restart)
osd.5: {}
osd.6: osd_recovery_max_active = '12' (not observed, change may require restart)
osd.6: {}
osd.7: osd_recovery_max_active = '12' (not observed, change may require restart)
osd.7: {}
osd.8: osd_recovery_max_active = '12' (not observed, change may require restart)
osd.8: {}
osd.9: osd_recovery_max_active = '12' (not observed, change may require restart)
osd.9: {}
osd.10: osd_recovery_max_active = '12' (not observed, change may require restart)
osd.10: {}
osd.11: osd_recovery_max_active = '12' (not observed, change may require restart)
osd.11: {}
osd.12: osd_recovery_max_active = '12' (not observed, change may require restart)
osd.12: {}
osd.13: osd_recovery_max_active = '12' (not observed, change may require restart)
osd.13: {}
osd.14: osd_recovery_max_active = '12' (not observed, change may require restart)
osd.14: {}
It says the change may require a restart, but even after restarting the OSDs I see no change in the recovery rate.
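(A possible way to sidestep this, sketched on the assumption that the centralized config database is available, which an Octopus/Pacific deployment via ceph-ansible should have: values set through the monitors' config store apply at runtime and also persist across OSD restarts, and the value a running OSD actually uses can then be checked directly.)

sudo ceph config set osd osd_max_backfills 8
sudo ceph config set osd osd_recovery_max_active 12

# verify what a running OSD actually picked up
sudo ceph config show osd.0 osd_recovery_max_active
# or, on the host that runs osd.0, via the admin socket
sudo ceph daemon osd.0 config get osd_recovery_max_active

If the values show as applied but the recovery rate still does not move, the limit may be the disks or network rather than these settings; and if the OSDs happen to run the mClock scheduler (osd_op_queue = mclock_scheduler), recovery limits are governed by the mClock profile and these options may be ignored. That is an assumption to rule out, not something visible in the output above.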

thanks,
Lokendra


On Thu, Oct 28, 2021 at 1:53 PM Vladimir Bashkirtsev <vladimir@xxxxxxxxxxxxxxx> wrote:
1. You can do:
ceph tell 'osd.*' injectargs --osd-max-backfills=2 --osd-recovery-max-active=6
This will change these settings on the fly, but they will be reset on OSD restart (each OSD gets the value and remembers it until its own restart, so you may end up with OSDs running different settings).
2. Nothing to do with threads: it is the scenario I covered in my previous response. If you have more than 3 OSDs, OSDs can pair up for data transfers, so (theoretically) a 10-OSD cluster can have 5 pairs transferring data in parallel at 150 MB/s each, for a total recovery speed of 750 MB/s.

Regards,
Vladimir

On 28 October 2021 7:11:31 pm AEDT, Lokendra Rathour <lokendrarathour@xxxxxxxxx> wrote:
Hey Johansson,
Thanks for the update. Two things in line with your response:
  1. For now, I am able to change these values by editing ceph.conf and restarting the OSD service; are there runtime commands to do this as well? I am using Ceph Pacific or Octopus, installed using ceph-ansible.
  2. What do you mean by "allow more parallelism"? Are you referring to modifying threads with the "osd recovery threads" config option? Please help elaborate.
Thanks once again for your help.



On Thu, Oct 28, 2021 at 1:05 PM Janne Johansson <icepic.dz@xxxxxxxxx> wrote:


Den tors 28 okt. 2021 kl 09:09 skrev Lokendra Rathour <lokendrarathour@xxxxxxxxx>:
Hi,
we have been trying to test a scenario on Ceph with the following configuration:
 cluster:
    id:     cc0ba1e4-68b9-4237-bc81-40b38455f713
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum storagenode1,storagenode2,storagenode3 (age 4h)
    mgr: storagenode2(active, since 22h), standbys: storagenode1, storagenode3
    mds: cephfs:1 {0=storagenode1=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 4m), 3 in (since 4h)
    rgw: 3 daemons active (storagenode1.rgw0, storagenode2.rgw0, storagenode3.rgw0)
  task status:
    scrub status:
        mds.storagenode1: idle
  data:
    pools:   7 pools, 169 pgs
    objects: 1.06M objects, 1.3 TiB
    usage:   3.9 TiB used, 9.2 TiB / 13 TiB avail
    pgs:     169 active+clean
  io:
    client:   43 KiB/s wr, 0 op/s rd, 3 op/s wr
    recovery: 154 MiB/s, 98 objects/s
 
We have 10 Gig network links for all the networks used in Ceph, and the MTU is configured as 9000. But the transfer rate, as seen above, is at most 154 MiB/s, which I feel is far lower than what should be possible.

Test Case:
We removed one node and added it back to the Ceph cluster after reinstalling the OS. During this activity, Ceph had around 1.3 TB to rebalance onto the newly added node, which took approximately 4 hours.

Considering that this is a production-grade setup with production-grade infrastructure, this is too long.

Query:
  • Is there a way to optimize the recovery/rebalancing and I/O rate of Ceph?
  • We found a few suggestions on the internet that we can modify the parameters below to achieve a better rate, but is this advisable? (An illustrative ceph.conf snippet follows this list.)
    •   osd max backfills, osd recovery max active, osd recovery max single start
  • We have dedicated 10 Gig network infrastructure, so is there an ideal value to reach the maximum recovery rate?
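For illustration only (example values, not tuned recommendations): these options would typically live in the [osd] section of ceph.conf and be picked up after an OSD restart, e.g.:

[osd]
# example values only; raising them speeds up recovery at the cost of client I/O
osd max backfills = 2
osd recovery max active = 6
osd recovery max single start = 1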

Any input would be helpful; we are really blocked here.


If this is one spinning drive receiving data, then those figures look OK. If you instead had a larger cluster with more drives, the sum of the recovery traffic would be higher if you allow more parallelism. Looking at osd_max_backfills to see how many parallel backfills you will allow, and looking at posts and guides like this:
might also help.
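As a rough sanity check on the figures quoted above: 1.3 TiB is about 1.36 million MiB, and moving that in roughly 4 hours (14,400 seconds) works out to about 95 MiB/s sustained, which is right in the range of a single spinning drive's sequential write speed and consistent with the rates seen here.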



--
May the most significant bit of your life be positive.


--
skype: lokendrarathour


--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


--
~ Lokendra


_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
