Re: [Ceph-users] Re: MDS failing under load with large cache sizes

I've been copying happily for days now (not very fast, but the MDSs were stable), but eventually the MDSs started flapping again due to large cache sizes (they are being killed after reaching about 11M inodes). I could work around it by temporarily increasing the cache size so they could rejoin, but that tells me my settings do not fully solve the problem yet (unless perhaps I increase the trim threshold even further).
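
For reference, this is roughly how I change those settings at runtime; the values below are only placeholders, not a recommendation:

    # temporarily raise the MDS cache limit so the MDSs can rejoin (example: 16 GiB)
    ceph config set mds mds_cache_memory_limit 17179869184

    # allow the MDS to trim its cache more aggressively (example value)
    ceph config set mds mds_cache_trim_threshold 524288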


On 06.08.19 19:52, Janek Bevendorff wrote:
Your parallel rsync job is only getting 150 creates per second? What
was the previous throughput?
I am actually not quite sure what the exact throughput was or is, or what
I can expect; it varies a lot. I am copying from a 23GB file list that
is split into 3000 chunks, which are then processed by 16-24 parallel
rsync processes. I have copied 27 of 64TB so far (according to df -h),
and to my taste it's taking a lot longer than it should. The
main problem here is not that I'm trying to copy 64TB (that's a drop in the
bucket); the problem is that it's 64TB of tiny, small, and medium-sized
files.
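
In case the setup matters: the job is essentially this pattern, with 16-24 workers (the paths, options, and chunk names below are simplified placeholders, not my actual script):

    # the 23GB file list was split into 3000 chunks beforehand, e.g.:
    #   split -n l/3000 filelist.txt chunks/chunk-
    # each chunk is then handed to one of N parallel rsync processes:
    ls chunks/ | xargs -P 16 -I{} rsync -a --files-from=chunks/{} /source/ /mnt/cephfs/destination/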

This whole MDS mess and several pauses and restarts in between have
completely distorted my sense of how far in the process I actually am or
how fast I would expect it to go. Right now it's starting again from the
beginning, so I expect it'll be another day or so until it starts moving
some real data again.

The cache size looks correct here.
Yeah. The cache appears to stay at a constant size now. I am still getting
the occasional "client failing to respond to cache pressure" warning, but it
goes away as quickly as it came.
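
For the record, I am watching the cache and the client sessions with the usual admin socket commands on the MDS host (the daemon name is a placeholder):

    # current cache usage of the running MDS
    ceph daemon mds.<name> cache status

    # per-client sessions, including the number of caps each client holds
    ceph daemon mds.<name> session ls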


Try pinning if possible in each parallel rsync job.
I was considering that, but couldn't come up with a feasible pinning
strategy. We have all those files of very different sizes spread very
unevenly across a handful of top-level directories. I get the impression
that I couldn't do much (or any) better than the automatic balancer.
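
That said, if I did want to try it, my understanding is that a pin is just an extended attribute on the directory, along these lines (rank and path are placeholders):

    # pin everything below this top-level directory to MDS rank 1
    setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/some-toplevel-dir

    # -v -1 removes the explicit pin again and hands the subtree back to the balancer
    setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/some-toplevel-dir

With our uneven directory sizes, though, I doubt a static split like that would beat the balancer.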


Here are tracker tickets to resolve the issues you encountered:

https://tracker.ceph.com/issues/41140
https://tracker.ceph.com/issues/41141
Thanks a lot!


