strange backfill delay after marking out one node

Hi all,

Yesterday I marked out all the osds on one node in our new cluster to
reconfigure them with WAL/DB on their NVMe devices, but it is taking
ages to rebalance. The whole cluster (and thus the osds) is only ~1%
full, so the full ratio is nowhere in sight.
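
For reference, this is roughly how I took the node out (the node name
cephosd7 below is just a placeholder, not our real hostname), and how
the configured full ratios can be double-checked:

# mark out every osd under the node's CRUSH bucket
for id in $(ceph osd ls-tree cephosd7); do ceph osd out "$id"; done

# confirm the full ratios are far away from ~1% usage
ceph osd dump | grep ratio
ceph osd df tree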

We have 14 osd nodes with 12 disks each; one node was marked out
yesterday around noon. The rebalance has still not completed, and all
the while the cluster is in ERROR state, even though this is a normal
maintenance operation.

We are still experimenting with the cluster, and it remains operational
while in ERROR state. However, it is slightly worrying that the
rebalance could take even (50x?) longer once the cluster holds 50x the
amount of data. Also, the osds are mostly flatlined in the dashboard
graphs, so I think the cluster could potentially backfill much faster.
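
For what it's worth, the current throttle values can be read back with
"ceph config get" (assuming a Mimic/Nautilus-style central config
database); the conservative defaults would explain the flatlined graphs:

ceph config get osd osd_max_backfills         # default 1
ceph config get osd osd_recovery_max_active   # default 3
ceph config get osd osd_recovery_sleep_hdd    # default 0.1s between recovery ops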

Below are a few outputs of ceph -s / ceph -w:

Yesterday afternoon (~16:00)
# ceph -w
  cluster:
    id:     b489547c-ba50-4745-a914-23eb78e0e5dc
    health: HEALTH_ERR
            Degraded data redundancy (low space): 139 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 4h)
    mgr: cephmon1(active, since 4h), standbys: cephmon2, cephmon3
    mds: cephfs:1 {0=cephmds1=up:active} 1 up:standby
    osd: 168 osds: 168 up (since 3h), 156 in (since 3h); 1588 remapped pgs
    rgw: 1 daemon active (cephs3.rgw0)

  data:
    pools:   12 pools, 4116 pgs
    objects: 14.04M objects, 11 TiB
    usage:   20 TiB used, 1.7 PiB / 1.8 PiB avail
    pgs:     16188696/109408503 objects misplaced (14.797%)
             2528 active+clean
             1422 active+remapped+backfill_wait
             139  active+remapped+backfill_wait+backfill_toofull
             27   active+remapped+backfilling

  io:
    recovery: 205 MiB/s, 198 objects/s

  progress:
    Rebalancing after osd.47 marked out
      [=====================.........]
    Rebalancing after osd.5 marked out
      [===================...........]
    Rebalancing after osd.132 marked out
      [=====================.........]
    Rebalancing after osd.90 marked out
      [=====================.........]
    Rebalancing after osd.76 marked out
      [=====================.........]
    Rebalancing after osd.157 marked out
      [==================............]
    Rebalancing after osd.19 marked out
      [=====================.........]
    Rebalancing after osd.118 marked out
      [====================..........]
    Rebalancing after osd.146 marked out
      [=================.............]
    Rebalancing after osd.104 marked out
      [====================..........]
    Rebalancing after osd.62 marked out
      [=======================.......]
    Rebalancing after osd.33 marked out
      [======================........]
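
In case it helps anyone reading along, the pgs behind the
backfill_toofull warning can be listed with:

ceph health detail
ceph pg dump pgs_brief | grep backfill_toofull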


This morning:
# ceph -s
  cluster:
    id:     b489547c-ba50-4745-a914-23eb78e0e5dc
    health: HEALTH_ERR
            Degraded data redundancy (low space): 8 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 22h)
    mgr: cephmon1(active, since 22h), standbys: cephmon2, cephmon3
    mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
    osd: 168 osds: 168 up (since 22h), 156 in (since 21h); 189 remapped pgs
    rgw: 1 daemon active (cephs3.rgw0)

  data:
    pools:   12 pools, 4116 pgs
    objects: 14.11M objects, 11 TiB
    usage:   21 TiB used, 1.7 PiB / 1.8 PiB avail
    pgs:     4643284/110159565 objects misplaced (4.215%)
             3927 active+clean
             162  active+remapped+backfill_wait
             19   active+remapped+backfilling
             8    active+remapped+backfill_wait+backfill_toofull

  io:
    client:   32 KiB/s rd, 0 B/s wr, 31 op/s rd, 21 op/s wr
    recovery: 198 MiB/s, 149 objects/s

  progress:
    Rebalancing after osd.47 marked out
      [=============================.]
    Rebalancing after osd.5 marked out
      [=============================.]
    Rebalancing after osd.132 marked out
      [=============================.]
    Rebalancing after osd.90 marked out
      [=============================.]
    Rebalancing after osd.76 marked out
      [=============================.]
    Rebalancing after osd.157 marked out
      [=============================.]
    Rebalancing after osd.19 marked out
      [=============================.]
    Rebalancing after osd.146 marked out
      [=============================.]
    Rebalancing after osd.104 marked out
      [=============================.]
    Rebalancing after osd.62 marked out
      [=============================.]


I found some hints, though I'm not sure they are right for us, at this
URL:
https://forum.proxmox.com/threads/increase-ceph-recovery-speed.36728/
> ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
> ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
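
If I understand the docs correctly, on a Mimic/Nautilus-style cluster
the same settings could also be applied through the central config
database instead of injectargs, and removed again once the backfill is
done (the values below are just the ones from that thread, not a
recommendation):

ceph config set osd osd_max_backfills 16
ceph config set osd osd_recovery_max_active 4
# ...and afterwards, revert to the defaults:
ceph config rm osd osd_max_backfills
ceph config rm osd osd_recovery_max_active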

Since the cluster is currently hardly loaded, backfilling can take up
all the unused bandwidth as far as I'm concerned...

Is it a good idea to run the above commands, or other commands, to
speed up the backfilling (e.g. by increasing "osd max backfills")?

Cheers

/Simon
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


