Re: [Ceph-users] Re: MDS failing under load with large cache sizes

Hi,

You can also try increasing the aggressiveness of the MDS recall but
I'm surprised it's still a problem with the settings I gave you:

ceph config set mds mds_recall_max_caps 15000
ceph config set mds mds_recall_max_decay_rate 0.75

I finally had the chance to try the more aggressive recall settings, but they did not change anything. As soon as the client starts copying files again, the numbers go up and I get a health message that the client is failing to respond to cache pressure.

After this week of idle time, the dns/inos numbers (what does dns stand for anyway?) settled at around 8000k. That's basically the "idle" number that it goes back to when the client stops copying files. For some weird reason, though, this number gets quite a bit higher every time (last time it was around 960k). Of course, I wouldn't expect it to go back all the way to zero, because that would mean dropping the entire cache for no reason, but it's still quite high, and it stays the same after restarting the MDS and all clients, which doesn't make a lot of sense to me. After resuming the copy job, the number went up to 20M in just the time it takes to write this email. There must be a bug somewhere.

Can you share two captures of `ceph daemon mds.X perf dump` about 1
second apart.

I attached the requested perf dumps.
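
For anyone who wants to take the same kind of captures, a minimal sketch of how two dumps roughly one second apart could be collected is below. The helper name capture_perf_dumps is made up for illustration, the output file names simply mirror the attachments, and the argument is whatever daemon name replaces the X in mds.X. It assumes it runs on the host where that MDS daemon lives.

```shell
# Hypothetical helper (not part of Ceph): take two MDS perf dumps ~1 second apart.
# $1 is the MDS daemon name, i.e. the "X" in "mds.X".
capture_perf_dumps() {
    mds_id="$1"
    if [ -z "$mds_id" ]; then
        echo "usage: capture_perf_dumps <mds-name>" >&2
        return 1
    fi

    # First capture.
    ceph daemon "mds.${mds_id}" perf dump > perf_dump_1.json || return 1

    # Wait about one second so the counters can be compared over a known interval.
    sleep 1

    # Second capture.
    ceph daemon "mds.${mds_id}" perf dump > perf_dump_2.json || return 1
}

# Example call (the daemon name "a" is a placeholder):
#   capture_perf_dumps a
```

The difference between the counters in the two files then gives an approximate per-second rate for that interval.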


Thanks!

Attachment: perf_dump_1.json
Description: application/json

Attachment: perf_dump_2.json
Description: application/json

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
