On Tue, Aug 6, 2019 at 12:48 AM Janek Bevendorff
<janek.bevendorff@xxxxxxxxxxxxx> wrote:
> > However, now my client processes are basically in constant I/O wait
> > state and the CephFS is slow for everybody. After I restarted the copy
> > job, I got around 4k reqs/s and then it went down to 100 reqs/s with
> > everybody waiting their turn. So yes, it does seem to help, but it
> > increases latency by a magnitude.

4k reqs/s is too fast for a create workload on one MDS. That must
include other operations, like getattr.

> Addition: I reduced the number to 256K and the cache size started
> inflating instantly (with about 140 reqs/s). So I reset it to 512K and
> the cache size started reducing slowly, though with fewer reqs/s.
>
> So I guess it is solving the problem, but only by trading it off against
> severe latency issues (order of magnitude as we saw).

I wouldn't expect such extreme latency issues. Please share:

    ceph config dump
    ceph daemon mds.X cache status

and, again, the two perf dumps taken one second apart, please.

Also, you said you removed the aggressive recall changes. I assume you
didn't reset them to the defaults, right? Just the first suggested
change (10k/1.0)?

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
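
[Editor's note, not part of the original thread: the point of taking two
`perf dump` snapshots one second apart is that the difference between the
counters approximates a per-second rate. A minimal sketch of that
comparison is below; the `"mds"` section and counter names such as
`request` are assumptions about the dump layout, and the numbers are
made up.]

```python
import json

def counter_deltas(dump1: dict, dump2: dict, section: str = "mds") -> dict:
    """Return per-counter differences between two perf dumps.

    Assumes each dump maps section names to flat counter dicts,
    e.g. {"mds": {"request": 1234, ...}} (layout is an assumption).
    """
    before, after = dump1.get(section, {}), dump2.get(section, {})
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and isinstance(after[name], (int, float))
    }

# Hypothetical dumps captured one second apart, so each delta
# approximates a per-second rate for that counter.
d1 = json.loads('{"mds": {"request": 100000, "reply": 99990}}')
d2 = json.loads('{"mds": {"request": 104000, "reply": 103950}}')
print(counter_deltas(d1, d2))  # {'request': 4000, 'reply': 3960}
```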