Re: MDS Upgrade from 17.2.5 to 17.2.6 not possible

Hi Henning,

On Wed, May 17, 2023 at 9:25 PM Henning Achterrath <achhen@xxxxxxxxxxx> wrote:
>
> Hi all,
>
> we did a major update from Pacific to Quincy (17.2.5) a month ago
> without any problems.
>
> Now we have tried a minor update from 17.2.5 to 17.2.6 (ceph orch
> upgrade). It gets stuck at the MDS upgrade phase. At this point the cluster
> tries to scale down the MDS daemons (ceph fs set max_mds 1). We waited a few hours.
>
> We are running two active mds with 1 standby. No subdir pinning
> configured. CephFS data pool: 575 TB
>
> While upgrading, the rank 1 MDS remains in state stopping. During this state
> clients are not able to reconnect. So we paused the upgrade, set
> max_mds back to 2, and failed rank 1. After that, the standby became active.
>
> In the mds (rank 1 in stopping state) logs we can see: waiting for
> strays to migrate

mds.1 on shutdown will export its strays to mds.0 - this is expected.

>
> In our second try, we evicted all clients first, without success.
>
> We make daily snapshots on / and rotate them via snapshot scheduler
> after one week.
>
> Is there a way to get rid of stray entries without scaling down the MDS
> daemons, or do we have to wait longer?

Do you see the perf counters related to strays (esp. strays_migrated)
increasing for mds.1? If those are not changing, then the stray export
has probably hung - which could be due to a bug. If you see this,
could you send back the logs for both ranks? (assuming those are debug
logs).
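
To make that check concrete, here is a rough sketch (not an official tool) that samples the stray counters twice and reports whether strays_migrated moved. It assumes it runs where "ceph daemon <name> perf dump" can reach the MDS admin socket (with cephadm that usually means inside the daemon's container), and the helper names stray_counters, migration_progress, sample and watch are made up for illustration. It searches every section of the dump rather than assuming which section holds the counters.

```python
import json
import subprocess
import time


def stray_counters(perf_dump):
    """Collect every integer counter whose name contains 'stray'.

    All sections of the dump are searched, so no assumption is made
    about which section the MDS reports these counters under.
    """
    found = {}
    for section in perf_dump.values():
        if isinstance(section, dict):
            for name, value in section.items():
                if "stray" in name and isinstance(value, int):
                    found[name] = value
    return found


def migration_progress(before, after):
    """Return how much strays_migrated grew between two samples."""
    return after.get("strays_migrated", 0) - before.get("strays_migrated", 0)


def sample(daemon):
    """Run 'ceph daemon <daemon> perf dump' and return its stray counters."""
    out = subprocess.run(
        ["ceph", "daemon", daemon, "perf", "dump"],
        check=True, capture_output=True, text=True,
    ).stdout
    return stray_counters(json.loads(out))


def watch(daemon, interval=30.0, rounds=10):
    """Print the change in strays_migrated every `interval` seconds."""
    previous = sample(daemon)
    for _ in range(rounds):
        time.sleep(interval)
        current = sample(daemon)
        delta = migration_progress(previous, current)
        print(f"num_strays={current.get('num_strays')} "
              f"strays_migrated delta={delta}")
        previous = current
```

For example, watch("mds.fdi-cephfs.ceph-service-13.rwdkqs") while rank 1 is in stopping state. A delta that stays at 0 over several rounds while num_strays does not shrink would suggest the export has hung.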

>
> We had about the same amount of strays before we did the major upgrade.
> So, it is a bit curious.
>
> Current output from ceph perf dump
>
> Rank0:
>
>          "num_strays": 417304,
>          "num_strays_delayed": 3,
>          "num_strays_enqueuing": 0,
>          "strays_created": 567879,
>          "strays_enqueued": 561803,
>          "strays_reintegrated": 13751,
>          "strays_migrated": 4,
>
>
> Rank1:
>
> ceph daemon mds.fdi-cephfs.ceph-service-13.rwdkqs perf dump | grep stray
>
>          "num_strays": 172528,
>          "num_strays_delayed": 0,
>          "num_strays_enqueuing": 0,
>          "strays_created": 418365,
>          "strays_enqueued": 396142,
>          "strays_reintegrated": 67406,
>          "strays_migrated": 4,
>
>
>
> Any help would be appreciated.
>
>
> best regards
> Henning
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx



-- 
Cheers,
Venky