OSD replacement causes slow requests

Hi list,

we're seeing unexpected recovery behavior on a cluster upgraded from Luminous to Nautilus.

We added new servers running Nautilus to the existing Luminous cluster so we could first replace the MONs step by step. Then we moved the old servers to a new root in the crush map and added the new OSDs to the default root, so the data would only need to be rebalanced once. This almost worked as planned, except for many slow and stuck requests. Since we did this after business hours the impact was negligible and we didn't really investigate; the goal was to finish the rebalancing.
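
For reference, the crush changes were done roughly like this (a sketch from memory, host and root names as in the tree below):

    # create the new root and move the old hosts into it
    ceph osd crush add-bucket root-ec root
    ceph osd crush move ceph01 root=root-ec
    ceph osd crush move ceph02 root=root-ec
    ceph osd crush move ceph03 root=root-ec
    # the new hosts (ceph04-ceph06) were then placed under the default root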

But after only two days one of the new OSDs (osd.30) already reported errors, so we need to replace that disk. The replacement disk (osd.0) was added with an initial crush weight of 0 (and reweight 0) so we could control the backfill in small steps. This is turning out to be much harder than it should be (and harder than we have experienced so far): no matter how small the steps are, the cluster immediately reports slow requests. We can't disrupt the production environment, so we have cancelled the backfill/recovery for now. This procedure has worked fine for us in the past with Luminous, which is why we're so surprised.
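
For illustration, a typical backfill step on our side looks roughly like this (the increments and the way of pausing are examples, not the exact values/commands we used):

    # raise the crush weight of the replacement OSD in small increments,
    # waiting for the resulting backfill to finish between steps
    ceph osd crush reweight osd.0 0.2
    ceph osd crush reweight osd.0 0.4
    # (or alternatively via the 0..1 override weight)
    ceph osd reweight osd.0 0.05
    # pausing data movement again can be done with the usual flags
    ceph osd set nobackfill
    ceph osd set norebalance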

The recovery and backfill parameters are pretty low:

    "osd_max_backfills": "1",
    "osd_recovery_max_active": "3",

These settings usually allowed a slow backfill while production work continued; now they don't.
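
For reference, this is roughly how we check (and could further lower) them at runtime; the values in the injectargs line are just an example:

    # show the effective values on a running OSD (via its admin socket)
    ceph daemon osd.25 config show | grep -E 'osd_max_backfills|osd_recovery_max_active'
    # lower them further at runtime if needed
    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'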

Our Ceph version is (only the active MDS still runs Luminous; its designated replacement server is currently being upgraded):

14.2.0-300-gacd2f2b9e1 (acd2f2b9e196222b0350b3b59af9981f91706c7f) nautilus (stable)

Is there anything we missed about recovery and replacement scenarios in Nautilus that we should be aware of? We couldn't reduce the reweight of that OSD (osd.30) lower than 0.16; anything below that results in slow requests. During the weight reduction several PGs get stuck in activating+remapped state and are (sometimes) only recoverable by restarting the affected OSD several times. Reducing the crush weight instead leads to the same effect.
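
For reference, this is roughly how we identify and clear the stuck PGs (assuming systemd units; osd.30 is just an example here):

    # list PGs stuck in activating and the OSDs in their acting set
    ceph pg dump pgs_brief | grep activating
    # restarting the affected OSD (sometimes several times) gets them active again
    systemctl restart ceph-osd@30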

Please note: the old servers in root-ec are going to carry EC-only OSDs, which is why they're still in the cluster.

Any pointers to what goes wrong here would be highly appreciated! If you need any other information I'd be happy to provide it.

Best regards,
Eugen


This is our osd tree:

ID  CLASS WEIGHT   TYPE NAME             STATUS REWEIGHT PRI-AFF
-19       11.09143 root root-ec
 -2        5.54572     host ceph01
  1   hdd  0.92429         osd.1           down        0 1.00000
  4   hdd  0.92429         osd.4             up        0 1.00000
  6   hdd  0.92429         osd.6             up        0 1.00000
 13   hdd  0.92429         osd.13            up        0 1.00000
 16   hdd  0.92429         osd.16            up        0 1.00000
 18   hdd  0.92429         osd.18            up        0 1.00000
 -3        5.54572     host ceph02
  2   hdd  0.92429         osd.2             up        0 1.00000
  5   hdd  0.92429         osd.5             up        0 1.00000
  7   hdd  0.92429         osd.7             up        0 1.00000
 12   hdd  0.92429         osd.12            up        0 1.00000
 17   hdd  0.92429         osd.17            up        0 1.00000
 19   hdd  0.92429         osd.19            up        0 1.00000
 -5              0     host ceph03
 -1       38.32857 root default
-31       10.79997     host ceph04
 25   hdd  3.59999         osd.25            up  1.00000 1.00000
 26   hdd  3.59999         osd.26            up  1.00000 1.00000
 27   hdd  3.59999         osd.27            up  1.00000 1.00000
-34       14.39995     host ceph05
  0   hdd  3.59998         osd.0             up        0 1.00000
 28   hdd  3.59999         osd.28            up  1.00000 1.00000
 29   hdd  3.59999         osd.29            up  1.00000 1.00000
 30   hdd  3.59999         osd.30            up  0.15999       0
-37       10.79997     host ceph06
 31   hdd  3.59999         osd.31            up  1.00000 1.00000
 32   hdd  3.59999         osd.32            up  1.00000 1.00000
 33   hdd  3.59999         osd.33            up  1.00000 1.00000




