On Tue, Aug 6, 2019 at 7:57 AM Janek Bevendorff
<janek.bevendorff@xxxxxxxxxxxxx> wrote:
> > 4k req/s is too fast for a create workload on one MDS. That must
> > include other operations like getattr.
>
> That is rsync going through millions of files, checking which ones
> need updating. Right now there aren't actually any create operations,
> since I restarted the copy job.

Your parallel rsync job is only getting 150 creates per second? What
was the previous throughput?

> > I wouldn't expect such extreme latency issues. Please share:
> >
> > ceph config dump
> > ceph daemon mds.X cache status
>
> Config dump: https://pastebin.com/1jTrjzA9
>
> Cache status:
>
> {
>     "pool": {
>         "items": 127688932,
>         "bytes": 20401092561
>     }
> }
>
> > and the two perf dumps one second apart again please.
>
> Perf dump 1: https://pastebin.com/US3y6JEJ
> Perf dump 2: https://pastebin.com/Mm02puje

The cache size looks correct here.

> > Also, you said you removed the aggressive recall changes. I assume
> > you didn't reset them to the defaults, right? Just the first
> > suggested change (10k/1.0)?
>
> Either seems to work.
>
> I added two more MDSs to split the workload and got a steady 150 req/s
> after that. Then I noticed that I still had a max-segments setting
> left over from one of my earlier attempts at fixing the cache runaway
> issue; after removing it, I got 250-500 req/s, sometimes up to 1.5k
> (per MDS).

Okay, so you're getting more normal throughput for parallel creates on
a single MDS.

> However, to generate the dumps for you, I changed my max_mds setting
> back to 1, and req/s went down to 80. After re-adding the two active
> MDSs, I am back at higher numbers, although not quite as high as
> before. But I seem to remember that it took several minutes, if not
> longer, until all MDSs received approximately equal load last time.

Try pinning, if possible, in each parallel rsync job (a sketch follows
below).

Here are tracker tickets to resolve the issues you encountered:

https://tracker.ceph.com/issues/41140
https://tracker.ceph.com/issues/41141

--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
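
For readers reproducing the diagnostics requested above: the cache
status and the two perf dumps one second apart come from the MDS admin
socket. A minimal sketch, assuming a daemon named mds.a and that the
commands run on the host where that daemon's admin socket lives:

    # Grab two perf dumps one second apart so counters can be diffed.
    # "mds.a" is a placeholder; substitute your MDS daemon name.
    ceph daemon mds.a perf dump > perf_dump_1.json
    sleep 1
    ceph daemon mds.a perf dump > perf_dump_2.json

    # The cache status quoted above comes from:
    ceph daemon mds.a cache status

Comparing counters between the two dumps is what yields the per-second
rates discussed in this thread.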
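
The "10k/1.0" shorthand refers to the recall tuning suggested earlier
in the thread. Assuming it maps to the mds_recall_max_caps and
mds_recall_max_decay_rate options (an inference from the values, not
confirmed above), it could be applied cluster-wide like this:

    # Assumed option names for the "10k/1.0" recall change; verify
    # against your Ceph release before applying.
    ceph config set mds mds_recall_max_caps 10000
    ceph config set mds mds_recall_max_decay_rate 1.0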
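
A minimal sketch of the pinning suggestion, assuming a CephFS mount at
/mnt/cephfs, a filesystem named "cephfs", and one top-level directory
per rsync job (all names here are illustrative):

    # Keep three active MDS ranks so each pinned subtree gets its own.
    ceph fs set cephfs max_mds 3

    # Pin each rsync job's directory to a fixed rank via the
    # ceph.dir.pin virtual extended attribute (-v -1 clears a pin).
    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/job0
    setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/job1
    setfattr -n ceph.dir.pin -v 2 /mnt/cephfs/job2

With explicit pins the balancer no longer has to migrate subtrees on
its own, which should avoid the several-minute ramp-up before load
spreads evenly across the MDSs.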