Re: Permanent MDS restarting under load

10.11.2015 22:38, Gregory Farnum wrote:

Which requests are they? Are these MDS operations or OSD ones?

Those requests appeared in the ceph -w output; they are as follows:

https://gist.github.com/5045336f6fb7d532138f

Is it correct that these are blocked OSD operations? osd.3 is one of the data pool HDDs, and other OSDs besides osd.3 also appear in the slow request warnings.

I guess that may be related to the replica 4 setup of our cluster with only 5 OSDs per host. We plan to add 6 more OSDs to each host once data migration is finished. Could that help spread the load?

So that "blacklisted" means that the monitors decided the MDS was
nonresponsive, failed over to another daemon, and blocked this one off
from the cluster.

So one could adjust the blacklist timeout, but there is no way to rate-limit requests. Am I correct?

Yeah, the MDS doesn't really do a good job back-pressuring clients
right now when it or the OSDs aren't keeping up with the workload.
That's something we need to work on once fsck stuff is behaving. rsync
is also (sadly) a workload that frequently exposes these problems, but
I'm not used to seeing the MDS daemon get stuck quite that quickly.
How frequently is it actually getting swapped?

Quite often. Under heavy load the MDSes are swapped about once a minute:

===
Nov 10 10:40:47 data.la.net.ua bash[18112]: 2015-11-10 10:40:47.357633 7f76c42e2700 -1 MDSIOContextBase: blacklisted! Restarting...
Nov 10 10:41:49 data.la.net.ua bash[18112]: 2015-11-10 10:41:49.237962 7f1a939af700 -1 MDSIOContextBase: blacklisted! Restarting...
Nov 10 10:43:14 data.la.net.ua bash[18112]: 2015-11-10 10:43:14.899375 7f17f6eaa700 -1 MDSIOContextBase: blacklisted! Restarting...
Nov 10 10:44:11 data.la.net.ua bash[18112]: 2015-11-10 10:44:11.810116 7f693b64c700 -1 MDSIOContextBase: blacklisted! Restarting...
Nov 10 10:45:14 data.la.net.ua bash[18112]: 2015-11-10 10:45:14.761684 7f7616097700 -1 MDSIOContextBase: blacklisted! Restarting...
Nov 10 10:46:35 data.la.net.ua bash[18112]: 2015-11-10 10:46:35.927190 7fdfb7f62700 -1 MDSIOContextBase: blacklisted! Restarting...
Nov 10 10:47:41 data.la.net.ua bash[18112]: 2015-11-10 10:47:41.888064 7fb88139b700 -1 MDSIOContextBase: blacklisted! Restarting...
Nov 10 10:49:57 data.la.net.ua bash[18112]: 2015-11-10 10:49:57.542545 7fbb360eb700 -1 MDSIOContextBase: blacklisted! Restarting...
Nov 10 10:51:02 data.la.net.ua bash[18112]: 2015-11-10 10:51:02.486907 7fb488fa1700 -1 MDSIOContextBase: blacklisted! Restarting...
Nov 10 10:52:03 data.la.net.ua bash[18112]: 2015-11-10 10:52:03.871463 7f4cc0236700 -1 MDSIOContextBase: blacklisted! Restarting...
Nov 10 10:53:20 data.la.net.ua bash[18112]: 2015-11-10 10:53:20.290494 7f9dc48d3700 -1 MDSIOContextBase: blacklisted! Restarting...
Nov 10 10:54:17 data.la.net.ua bash[18112]: 2015-11-10 10:54:17.086940 7f45a9105700 -1 MDSIOContextBase: blacklisted! Restarting...
Nov 10 10:55:17 data.la.net.ua bash[18112]: 2015-11-10 10:55:17.547123 7f6c48f50700 -1 MDSIOContextBase: blacklisted! Restarting...
Nov 10 10:56:32 data.la.net.ua bash[18112]: 2015-11-10 10:56:32.558378 7f2bf0a70700 -1 MDSIOContextBase: blacklisted! Restarting...
Nov 10 10:57:34 data.la.net.ua bash[18112]: 2015-11-10 10:57:34.534306 7fc69b42c700 -1 MDSIOContextBase: blacklisted! Restarting...
Nov 10 10:58:37 data.la.net.ua bash[18112]: 2015-11-10 10:58:37.061903 7fea3de23700 -1 MDSIOContextBase: blacklisted! Restarting...
Nov 10 10:59:52 data.la.net.ua bash[18112]: 2015-11-10 10:59:52.579594 7fe23b468700 -1 MDSIOContextBase: blacklisted! Restarting...
===
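
To put a number on how often the MDS is swapped, here is a minimal Python sketch that pulls the daemon timestamps out of the "blacklisted! Restarting..." lines above and measures the gaps between consecutive restarts. The names RESTART_LOG, parse_restart_times and restart_gaps are made up for this illustration, and the regular expression only assumes the line layout seen in this excerpt; it is not an official Ceph log parser.

```python
import re
from datetime import datetime
from statistics import mean, median

# Daemon-side part of the journal lines quoted above (journal prefix dropped).
RESTART_LOG = """\
2015-11-10 10:40:47.357633 7f76c42e2700 -1 MDSIOContextBase: blacklisted! Restarting...
2015-11-10 10:41:49.237962 7f1a939af700 -1 MDSIOContextBase: blacklisted! Restarting...
2015-11-10 10:43:14.899375 7f17f6eaa700 -1 MDSIOContextBase: blacklisted! Restarting...
2015-11-10 10:44:11.810116 7f693b64c700 -1 MDSIOContextBase: blacklisted! Restarting...
2015-11-10 10:45:14.761684 7f7616097700 -1 MDSIOContextBase: blacklisted! Restarting...
2015-11-10 10:46:35.927190 7fdfb7f62700 -1 MDSIOContextBase: blacklisted! Restarting...
2015-11-10 10:47:41.888064 7fb88139b700 -1 MDSIOContextBase: blacklisted! Restarting...
2015-11-10 10:49:57.542545 7fbb360eb700 -1 MDSIOContextBase: blacklisted! Restarting...
2015-11-10 10:51:02.486907 7fb488fa1700 -1 MDSIOContextBase: blacklisted! Restarting...
2015-11-10 10:52:03.871463 7f4cc0236700 -1 MDSIOContextBase: blacklisted! Restarting...
2015-11-10 10:53:20.290494 7f9dc48d3700 -1 MDSIOContextBase: blacklisted! Restarting...
2015-11-10 10:54:17.086940 7f45a9105700 -1 MDSIOContextBase: blacklisted! Restarting...
2015-11-10 10:55:17.547123 7f6c48f50700 -1 MDSIOContextBase: blacklisted! Restarting...
2015-11-10 10:56:32.558378 7f2bf0a70700 -1 MDSIOContextBase: blacklisted! Restarting...
2015-11-10 10:57:34.534306 7fc69b42c700 -1 MDSIOContextBase: blacklisted! Restarting...
2015-11-10 10:58:37.061903 7fea3de23700 -1 MDSIOContextBase: blacklisted! Restarting...
2015-11-10 10:59:52.579594 7fe23b468700 -1 MDSIOContextBase: blacklisted! Restarting...
"""

# Assumed layout: "<date> <time.usec> <thread id> -1 MDSIOContextBase: blacklisted! ..."
LINE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) \S+ +-1 MDSIOContextBase: blacklisted!"
)


def parse_restart_times(text):
    """Return the timestamp of every 'blacklisted!' line, in input order."""
    times = []
    for line in text.splitlines():
        match = LINE_RE.search(line)  # search() also tolerates a journal prefix
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return times


def restart_gaps(times):
    """Seconds between each restart and the one before it."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    gaps = restart_gaps(parse_restart_times(RESTART_LOG))
    print(f"restarts: {len(gaps) + 1}")
    print(f"gap min/median/mean/max (s): "
          f"{min(gaps):.1f} / {median(gaps):.1f} / {mean(gaps):.1f} / {max(gaps):.1f}")
```

For the 17 restarts in this excerpt the gaps come out a little over a minute on average, with one longer pause of more than two minutes around 10:49.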

Any idea?
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



