Hi all,

we did a major upgrade from Pacific to Quincy (17.2.5) a month ago without any problems.
Now we have tried a minor upgrade from 17.2.5 to 17.2.6 (ceph orch upgrade). It gets stuck at the MDS upgrade phase, where the orchestrator tries to scale the MDS down to a single rank (ceph fs set max_mds 1). We waited a few hours.
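For reference, the upgrade was started roughly as follows and stalled at the rank scale-down step (the filesystem name fdi-cephfs is inferred from the MDS daemon name further below and may differ):

    ceph orch upgrade start --ceph-version 17.2.6
    # cephadm then reduces the file system to a single rank before
    # upgrading the MDS daemons, i.e. effectively:
    ceph fs set fdi-cephfs max_mds 1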
We are running two active MDS daemons with one standby. No subdirectory pinning is configured. CephFS data pool: 575 TB.
While upgrading, the rank 1 MDS remains in state "stopping". During this state clients are not able to reconnect, so we paused the upgrade, set max_mds back to 2, and failed rank 1. After that, the standby became active again.
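The rollback consisted roughly of the following commands (again assuming the filesystem is named fdi-cephfs):

    ceph orch upgrade pause
    ceph fs set fdi-cephfs max_mds 2
    ceph mds fail fdi-cephfs:1    # kick the stuck rank so the standby takes over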
In the logs of the rank 1 MDS (stuck in "stopping") we can see: "waiting for strays to migrate".
On our second try we evicted all clients first, without success. We make daily snapshots of / and rotate them via the snapshot scheduler after one week.
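For context, the client eviction and the snapshot schedule were done roughly like this (session IDs and the exact retention spec are illustrative, not copied from our cluster):

    # list and evict client sessions on each MDS
    ceph tell mds.* client ls
    ceph tell mds.fdi-cephfs.ceph-service-13.rwdkqs client evict id=<session-id>

    # daily snapshots on /, pruned after one week (snap_schedule mgr module)
    ceph fs snap-schedule add / 1d
    ceph fs snap-schedule retention add / d 7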
Is there a way to get rid of the stray entries without scaling down the MDS, or do we just have to wait longer?
We had about the same number of strays before the major upgrade, so it is a bit curious.
Current output from ceph perf dump:

Rank 0:
    "num_strays": 417304,
    "num_strays_delayed": 3,
    "num_strays_enqueuing": 0,
    "strays_created": 567879,
    "strays_enqueued": 561803,
    "strays_reintegrated": 13751,
    "strays_migrated": 4,

Rank 1 (ceph daemon mds.fdi-cephfs.ceph-service-13.rwdkqs perf dump | grep stray):
    "num_strays": 172528,
    "num_strays_delayed": 0,
    "num_strays_enqueuing": 0,
    "strays_created": 418365,
    "strays_enqueued": 396142,
    "strays_reintegrated": 67406,
    "strays_migrated": 4,

Any help would be appreciated.

Best regards
Henning