Re: OSD replacement causes slow requests

On 7/18/19 12:21 PM, Eugen Block wrote:
> Hi list,
> 
> we're seeing unexpected recovery behavior in an upgraded cluster
> (Luminous -> Nautilus).
> 
> We added new servers with Nautilus to the existing Luminous cluster so
> we could first replace the MONs step by step. Then we moved the old
> servers to a new root in the crush map and added the new OSDs to the
> default root, so we would only need to rebalance the data once. This
> almost worked as planned, except for many slow and stuck requests. We
> did this after business hours, so the impact was negligible and we
> didn't really investigate; the goal was to finish the rebalancing.
> 
> But after only two days one of the new OSDs (osd.30) already reported
> errors, so we need to replace that disk.
> The replacement disk (osd.0) has been added with an initial crush weight
> of 0 (also reweight 0) to control the backfill with small steps.
> This turns out to be harder than it should be (and harder than we have
> experienced so far): no matter how small the steps are, the cluster
> immediately reports slow requests. We can't disrupt the production
> environment, so we cancelled the backfill/recovery for now. This
> procedure has been successful in the past with Luminous, which is why
> we're so surprised.
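> For reference, the stepwise approach looks roughly like this (the
> weights below are only illustrative, not the exact values used):
> 
>     # raise the crush weight of the new OSD in small increments and
>     # wait for the cluster to settle before the next step
>     $ ceph osd crush reweight osd.0 0.5
>     $ ceph osd crush reweight osd.0 1.0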
> 
> The recovery and backfill parameters are pretty low:
> 
>     "osd_max_backfills": "1",
>     "osd_recovery_max_active": "3",
> 
> These settings usually allowed a slow backfill while we continued
> productive work; now they don't.
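> 
> The effective values can be double-checked on a running OSD via the
> admin socket, e.g. (osd.30 is just an example id):
> 
>     $ ceph daemon osd.30 config get osd_max_backfills
>     $ ceph daemon osd.30 config get osd_recovery_max_active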
> 
> Our ceph version is (only the active MDS still runs Luminous; the
> designated server is currently being upgraded):
> 
> 14.2.0-300-gacd2f2b9e1 (acd2f2b9e196222b0350b3b59af9981f91706c7f)
> nautilus (stable)
> 
> Is there anything we missed that we should be aware of in Nautilus
> regarding recovery and replacement scenarios?
> We couldn't reduce the weight of that OSD below 0.16; anything lower
> results in slow requests.
> During the weight reduction several PGs get stuck in the
> activating+remapped state, (sometimes) only recoverable by restarting
> the affected OSD several times. Reducing the crush weight leads to the
> same effect.
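> 
> The affected PGs can be listed with something like:
> 
>     $ ceph pg ls activating
>     $ ceph pg dump_stuck inactive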
> 
> Please note: the old servers in root-ec are going to carry EC-only
> OSDs, which is why they're still in the cluster.
> 
> Any pointers to what is going wrong here would be highly appreciated!
> If you need any other information, I'd be happy to provide it.
> 

Have you tried dumping the historic slow ops on the OSDs involved to see
what is going on?

$ ceph daemon osd.X dump_historic_slow_ops
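
A related check while recovery is running (osd.X again being a
placeholder) would be the ops currently in flight:

$ ceph daemon osd.X dump_ops_in_flight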

But to be clear, are all the OSDs on Nautilus or is there a mix of L and
N OSDs?
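
A quick way to see the version mix per daemon type:

$ ceph versions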

Wido

> Best regards,
> Eugen
> 
> 
> This is our osd tree:
> 
> ID  CLASS WEIGHT   TYPE NAME             STATUS REWEIGHT PRI-AFF
> -19       11.09143 root root-ec
>  -2        5.54572     host ceph01
>   1   hdd  0.92429         osd.1           down        0 1.00000
>   4   hdd  0.92429         osd.4             up        0 1.00000
>   6   hdd  0.92429         osd.6             up        0 1.00000
>  13   hdd  0.92429         osd.13            up        0 1.00000
>  16   hdd  0.92429         osd.16            up        0 1.00000
>  18   hdd  0.92429         osd.18            up        0 1.00000
>  -3        5.54572     host ceph02
>   2   hdd  0.92429         osd.2             up        0 1.00000
>   5   hdd  0.92429         osd.5             up        0 1.00000
>   7   hdd  0.92429         osd.7             up        0 1.00000
>  12   hdd  0.92429         osd.12            up        0 1.00000
>  17   hdd  0.92429         osd.17            up        0 1.00000
>  19   hdd  0.92429         osd.19            up        0 1.00000
>  -5              0     host ceph03
>  -1       38.32857 root default
> -31       10.79997     host ceph04
>  25   hdd  3.59999         osd.25            up  1.00000 1.00000
>  26   hdd  3.59999         osd.26            up  1.00000 1.00000
>  27   hdd  3.59999         osd.27            up  1.00000 1.00000
> -34       14.39995     host ceph05
>   0   hdd  3.59998         osd.0             up        0 1.00000
>  28   hdd  3.59999         osd.28            up  1.00000 1.00000
>  29   hdd  3.59999         osd.29            up  1.00000 1.00000
>  30   hdd  3.59999         osd.30            up  0.15999       0
> -37       10.79997     host ceph06
>  31   hdd  3.59999         osd.31            up  1.00000 1.00000
>  32   hdd  3.59999         osd.32            up  1.00000 1.00000
>  33   hdd  3.59999         osd.33            up  1.00000 1.00000
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



