Hi Dan,
thanks for the link. I've read it several times but still haven't
come to a conclusion.
IIRC, the maintenance windows are one hour long and currently take
place every week. But it's not entirely clear whether the maintenance
will have an impact at all; apparently nobody complained last time.
On the other hand, there have been interruptions in recent weeks that
caused stale clients, so it's difficult to predict.
They mainly use rbd and CephFS for k8s clusters, but so far I haven't
heard about rbd issues during these maintenance windows.
They have Grafana showing a drop in MDS sessions when the network
is interrupted, I think from around 130 active sessions to around 30.
So not all sessions were dropped. After the maintenance, they failed
the MDS and the number of sessions was restored. Since they don't have
access to the k8s clusters themselves, they can't do much on that
side. We're still wondering whether an MDS failover is really necessary
or whether anything could be done on the client side. But I only have
very limited details on this. The MDS log (I don't have a copy) shows
that the session drops are caused by client evictions.
Do you think it would make sense to disable client
eviction/blocklisting only during the maintenance window? Or could that
be dangerous, because we can't predict which clients will actually be
interrupted and how k8s will handle the returning clients if they
aren't evicted?
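A minimal sketch of what switching blocklisting off only for the window could look like, assuming the ceph CLI with admin credentials on the host that runs it. The function names and the DRY_RUN switch are hypothetical. The restore step sets both options back to true, the values reported by `ceph config get` in the original question below. Note that, per the eviction docs Dan linked, these two options only control whether a dropped or evicted client is also blocklisted; the MDS still drops the sessions, so this does not by itself answer the question of how k8s handles the returning clients.

```shell
#!/bin/sh
# Hypothetical helpers to turn MDS client blocklisting off for a
# maintenance window and back on afterwards. Assumes the ceph CLI with
# admin credentials. Set DRY_RUN=1 to print the commands instead of
# running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Before the window: timed-out or evicted clients are no longer blocklisted.
blocklist_disable() {
  run ceph config set mds mds_session_blocklist_on_timeout false
  run ceph config set mds mds_session_blocklist_on_evict false
}

# After the window: restore the values seen in the original question (true).
blocklist_restore() {
  run ceph config set mds mds_session_blocklist_on_timeout true
  run ceph config set mds mds_session_blocklist_on_evict true
}
```

Running `DRY_RUN=1 blocklist_disable` first prints the two `ceph config set` commands without touching the cluster, which makes it easy to review what a maintenance wrapper would do.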
Thanks
Eugen
Quoting Dan van der Ster <dan.vanderster@xxxxxxxxx>:
Hi Eugen,
Disabling blocklisting on eviction is a pretty standard config. In my
experience it allows clients to resume their sessions cleanly without
needing a remount.
There are docs about this here:
https://docs.ceph.com/en/latest/cephfs/eviction/#advanced-configuring-blocklisting
I'm not sure whether this will be useful for your network
intervention, though. What are you trying to achieve? How long will
clients be unreachable?
Cheers, Dan
--
Dan van der Ster
CTO@CLYSO & CEC Member
On Thu, Nov 21, 2024, 10:15 Eugen Block <eblock@xxxxxx> wrote:
Hi,
can anyone share some experience with these two configs?
ceph config get mds mds_session_blocklist_on_timeout
true
ceph config get mds mds_session_blocklist_on_evict
true
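To record the current values before changing anything, and to check them again after the window, a small sketch that runs the same `ceph config get mds <option>` query as above for both options. The function name and the CEPH override (used only to point at a stub when testing) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the current MDS blocklisting settings.
# CEPH may be overridden (e.g. with a stub command); it defaults to ceph.

show_blocklist_settings() {
  for opt in mds_session_blocklist_on_timeout mds_session_blocklist_on_evict; do
    # Same query as "ceph config get mds <option>".
    printf '%s = %s\n' "$opt" "$(${CEPH:-ceph} config get mds "$opt")"
  done
}
```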
If there's network maintenance going on and the client connection
is interrupted, could it help to disable eviction and blocklisting of
MDS clients? And what risks should we be aware of if we tried that?
We're not entirely sure yet whether this is a reasonable approach, but
we're trying to figure out how to make network maintenance less
painful for clients.
I'm also looking at some other possible configs, but let's start with
these two first.
Any comments would be appreciated!
Thanks!
Eugen
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx