Hi Wido,
thanks for your response.
Have you tried to dump the historic slow ops on the OSDs involved to see
what is going on?
$ ceph daemon osd.X dump_historic_slow_ops
Good question; I don't recall doing that. Maybe my colleague did, but
he's on vacation right now. ;-)
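When he's back we'll collect them; a rough sketch of what we'd run on
each OSD host (assuming the default admin socket location, the paths
may differ):

for sock in /var/run/ceph/ceph-osd.*.asok; do
    echo "== ${sock} =="
    ceph daemon "${sock}" dump_historic_slow_ops
done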
But to be clear, are all the OSDs on Nautilus or is there a mix of L and
N OSDs?
I'll try to clarify: it was (and still is) a mixture of L and N OSDs,
but all L-OSDs were empty at the time. The cluster had already
rebalanced all PGs to the new OSDs, so the L-OSDs were not involved in
this recovery process. We're currently upgrading the remaining servers
to Nautilus; one server with L-OSDs is left, but those OSDs don't store
any objects at the moment (they are in a different root in the crush map).
The recovery eventually finished successfully, but my colleague had to
do it after business hours; maybe that's why he needs his vacation. ;-)
Regards,
Eugen
Quoting Wido den Hollander <wido@xxxxxxxx>:
On 7/18/19 12:21 PM, Eugen Block wrote:
Hi list,
we're facing an unexpected recovery behavior of an upgraded cluster
(Luminous -> Nautilus).
We added new servers with Nautilus to the existing Luminous cluster so
we could first replace the MONs step by step. Then we moved the old
servers to a new root in the crush map and added the new OSDs to the
default root, so the data would only need to be rebalanced once. This
almost worked as planned, except for many slow and stuck requests. We
did this after business hours, so the impact was negligible and we
didn't really investigate; the goal was to finish the rebalancing.
But after only two days one of the new OSDs (osd.30) already reported
errors, so we need to replace that disk.
The replacement disk (osd.0) has been added with an initial crush weight
of 0 (also reweight 0) so we can control the backfill in small steps.
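The idea was to raise the weight in small increments and let the
cluster settle in between, roughly like this (just a sketch; the target
weight and step size are made up for illustration):

for w in 0.5 1.0 1.5 2.0 2.5 3.0 3.59999; do
    ceph osd crush reweight osd.0 ${w}
    # wait for the cluster to settle before the next increment
    while ! ceph health | grep -q HEALTH_OK; do sleep 60; done
done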
This turns out to be harder than it should be (and harder than we have
experienced so far): no matter how small the steps are, the cluster
immediately reports slow requests. We can't disrupt the production
environment, so we have cancelled the backfill/recovery for now. This
procedure has been successful in the past with Luminous, which is why
we're so surprised.
The recovery and backfill parameters are pretty low:
"osd_max_backfills": "1",
"osd_recovery_max_active": "3",
These settings usually allowed a slow backfill while production work
continued; now they don't.
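Just to illustrate how we check what a running OSD actually uses and
how the throttles can be adjusted at runtime in Nautilus (osd.30 and
the values are only examples):

# what a running OSD actually uses
$ ceph daemon osd.30 config get osd_max_backfills
# adjust the throttles cluster-wide at runtime
$ ceph config set osd osd_max_backfills 1
$ ceph config set osd osd_recovery_max_active 3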
Our ceph version is (only the active MDS still runs Luminous; its
designated server is currently being upgraded):
14.2.0-300-gacd2f2b9e1 (acd2f2b9e196222b0350b3b59af9981f91706c7f)
nautilus (stable)
Is there anything we missed that we should be aware of in Nautilus
regarding recovery and replacement scenarios?
We couldn't reduce the weight of that OSD below 0.16; anything lower
results in slow requests.
During the weight reduction several PGs get stuck in the
activating+remapped state, sometimes only recoverable by restarting the
affected OSD several times. Reducing the crush weight leads to the same
effect.
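For reference, this is roughly how we look for and work around those
stuck PGs (a sketch; osd.30 is just the example ID):

# list PGs that are stuck inactive (e.g. activating+remapped)
$ ceph pg dump_stuck inactive
# as a last resort, restart the affected OSD on its host
$ systemctl restart ceph-osd@30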
Please note: the old servers in root-ec are going to host EC-only OSDs;
that's why they're still in the cluster.
Any pointers to what goes wrong here would be highly appreciated! If you
need any other information I'd be happy to provide it.
Have you tried to dump the historic slow ops on the OSDs involved to see
what is going on?
$ ceph daemon osd.X dump_historic_slow_ops
But to be clear, are all the OSDs on Nautilus or is there a mix of L and
N OSDs?
Wido
Best regards,
Eugen
This is our osd tree:
 ID CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
-19       11.09143 root root-ec
 -2        5.54572     host ceph01
  1   hdd  0.92429         osd.1     down        0 1.00000
  4   hdd  0.92429         osd.4       up        0 1.00000
  6   hdd  0.92429         osd.6       up        0 1.00000
 13   hdd  0.92429         osd.13      up        0 1.00000
 16   hdd  0.92429         osd.16      up        0 1.00000
 18   hdd  0.92429         osd.18      up        0 1.00000
 -3        5.54572     host ceph02
  2   hdd  0.92429         osd.2       up        0 1.00000
  5   hdd  0.92429         osd.5       up        0 1.00000
  7   hdd  0.92429         osd.7       up        0 1.00000
 12   hdd  0.92429         osd.12      up        0 1.00000
 17   hdd  0.92429         osd.17      up        0 1.00000
 19   hdd  0.92429         osd.19      up        0 1.00000
 -5              0     host ceph03
 -1       38.32857 root default
-31       10.79997     host ceph04
 25   hdd  3.59999         osd.25      up  1.00000 1.00000
 26   hdd  3.59999         osd.26      up  1.00000 1.00000
 27   hdd  3.59999         osd.27      up  1.00000 1.00000
-34       14.39995     host ceph05
  0   hdd  3.59998         osd.0       up        0 1.00000
 28   hdd  3.59999         osd.28      up  1.00000 1.00000
 29   hdd  3.59999         osd.29      up  1.00000 1.00000
 30   hdd  3.59999         osd.30      up  0.15999       0
-37       10.79997     host ceph06
 31   hdd  3.59999         osd.31      up  1.00000 1.00000
 32   hdd  3.59999         osd.32      up  1.00000 1.00000
 33   hdd  3.59999         osd.33      up  1.00000 1.00000
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com