On Wed, Oct 2, 2019 at 9:48 AM Stefan Kooman <stefan@xxxxxx> wrote:
> According to [1] there are new parameters in place to make the MDS
> behave more stably. Quoting that blog post: "One of the more recent
> issues we've discovered is that an MDS with a very large cache (64+GB)
> will hang during certain recovery events."
>
> For all of us not (yet) running Nautilus, I wonder what the best
> course of action is to prevent an unstable MDS during recovery
> situations.
>
> Artificially limit "mds_cache_memory_limit" to, say, 32 GB?

Reduce the MDS cache size. The Mimic backport will probably make the
next minor release: https://github.com/ceph/ceph/pull/28452

> I wonder if the number of clients influences whether an MDS gets
> overwhelmed by release messages. Or are a handful of clients (with
> millions of caps) able to overload an MDS?

Just one client with millions of caps could cause issues. (A sketch for
spotting such clients follows at the end of this message.)

> Is there a way, other than unmounting cephfs on the clients, to
> decrease the number of caps the MDS has handed out, before an upgrade
> to a newer Ceph release is undertaken when running Luminous / Mimic?

Incrementally reduce the cache size using a script; see the second
sketch at the end of this message.

> I'm assuming you need to restart the MDS to make
> "mds_cache_memory_limit" effective, is that correct?

No, it is respected at runtime.

--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat
Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
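A rough sketch for finding cap-heavy clients via the MDS admin socket.
Assumptions: it runs on the MDS host, "mds.a" stands in for your
daemon's name, and the "num_caps" and "client_metadata" fields match
your release's "session ls" output (they do on Luminous-era versions,
but verify against yours):

#!/usr/bin/env python3
"""List CephFS client sessions sorted by cap count, largest first,
so the heaviest cap holders are visible before an upgrade."""

import json
import subprocess

MDS = "mds.a"  # placeholder: substitute your MDS daemon name

# Query the admin socket for all client sessions (JSON output).
out = subprocess.check_output(["ceph", "daemon", MDS, "session", "ls"])
sessions = json.loads(out)

# Sort by the number of caps each client currently holds.
for s in sorted(sessions, key=lambda s: s.get("num_caps", 0),
                reverse=True):
    host = s.get("client_metadata", {}).get("hostname", "unknown")
    print(f"{s.get('num_caps', 0):>10}  client.{s.get('id')}  {host}")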
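And a sketch of the incremental cache reduction. The start, target,
step, and pause values are illustrative, and "mds.a" is again a
placeholder; injectargs applies the change at runtime (matching the
answer above that no restart is needed), though it does not persist
across restarts unless the value is also set in ceph.conf:

#!/usr/bin/env python3
"""Lower mds_cache_memory_limit in steps so clients release caps
gradually instead of all at once."""

import subprocess
import time

MDS = "mds.a"          # placeholder: substitute your MDS daemon name
START = 64 * 2**30     # current limit: 64 GiB
TARGET = 32 * 2**30    # desired limit: 32 GiB
STEP = 4 * 2**30       # shrink by 4 GiB per iteration
PAUSE = 300            # seconds between steps

limit = START
while limit > TARGET:
    limit = max(limit - STEP, TARGET)
    # injectargs changes the setting at runtime on Luminous/Mimic;
    # on Nautilus+ "ceph config set mds ..." works as well.
    subprocess.run(
        ["ceph", "tell", MDS, "injectargs",
         f"--mds_cache_memory_limit={limit}"],
        check=True,
    )
    time.sleep(PAUSE)  # give clients time to release caps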