Hi,

we have now set:

mds_session_blacklist_on_timeout to false
mds_session_blacklist_on_evict to false
mds_cap_revoke_eviction_timeout to 900

So far there has been no loss of mount and no kernel crash. However, one
of our big computation jobs has finished, so the load on the fs is lower
as well. We will keep an eye on it.

Thanks again
Dietmar

On 2020-02-13 09:37, Dietmar Rieder wrote:
> Hi,
>
> they were not down as far as I can tell from the affected osd logs at
> the time in question.
> I'll try to play with those values, thanks. Is there anything else that
> might help?
> The kernel crash is something that makes me nervous.
>
> Dietmar
>
> On 2020-02-13 09:16, thoralf schulze wrote:
>> hi Dietmar,
>>
>> were the osds really down, or was this just the perception of the hung
>> client?
>>
>> playing around with mds_session_blacklist_on_timeout,
>> mds_session_blacklist_on_evict to allow the clients to actually
>> reconnect and mds_cap_revoke_eviction_timeout to forcibly evict hung
>> client might be worth looking into.
>>
>>> $ uname -a
>>> Linux apollo-08.local 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4
>>> 23:02:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>>
>> iirc (and for debian), client reconnects only work properly in kernel
>> versions ≥ 3.19.
>>
>> hth,
>> t.
>>
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>
>

--
_________________________________________
D i e t m a r R i e d e r, Mag.Dr.
Innsbruck Medical University
Biocenter - Institute of Bioinformatics
Email: dietmar.rieder@xxxxxxxxxxx
Web: http://www.icbi.at
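
For reference, the three MDS settings listed at the top of this message could be written as a ceph.conf fragment roughly like the sketch below. The option names and values are the ones given in the message. Placing them in the [mds] section of ceph.conf is an assumption, since the thread does not say how or where the values were applied.

```ini
# Sketch only: the [mds] section placement is an assumption,
# the thread does not state how the values were applied.
[mds]
# do not blacklist clients whose sessions are dropped due to a timeout
mds_session_blacklist_on_timeout = false
# do not blacklist clients whose sessions are evicted
mds_session_blacklist_on_evict = false
# evict clients that have not responded to a cap revoke within 900 seconds
mds_cap_revoke_eviction_timeout = 900
```

With blacklisting disabled, an evicted client is not blocked from talking to the cluster again, which is what allows the reconnect attempts discussed in the quoted message.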