On Tue, Nov 10, 2015 at 6:32 AM, Oleksandr Natalenko <oleksandr@xxxxxxxxxxxxxx> wrote:
> Hello.
>
> We have CephFS deployed over a Ceph cluster (0.94.5).
>
> We experience constant MDS restarts under high-IOPS workloads (e.g.
> rsyncing lots of small mailboxes from another storage system to CephFS
> using the ceph-fuse client). First, cluster health goes to HEALTH_WARN
> with the following warning:
>
> ===
> mds0: Behind on trimming (321/30)
> ===
>
> Also, slow requests start to appear:
>
> ===
> 2 requests are blocked > 32 sec
> ===

Which requests are they? Are these MDS operations or OSD ones?

> Then, after a while, one of the MDSes fails with the following log:
>
> ===
> лис 10 16:07:41 baikal bash[10122]: 2015-11-10 16:07:41.915540 7f2484f13700
> -1 MDSIOContextBase: blacklisted! Restarting...
> лис 10 16:07:41 baikal bash[10122]: starting mds.baikal at :/0
> лис 10 16:07:42 baikal bash[10122]: 2015-11-10 16:07:42.003189 7f82b477e7c0
> -1 mds.-1.0 log_to_monitors {default=true}
> ===

That "blacklisted" means the monitors decided the MDS was nonresponsive, failed over to another daemon, and blocked this one off from the cluster.

> I guess writing lots of small files bloats the MDS log, and the MDS can't
> keep up with trimming it. That's why it is marked as failed and replaced by
> a standby MDS. We tried to limit mds_log_max_events to 30 events, but that
> caused the MDS to fail very quickly with the following stacktrace:
>
> ===
> Stacktrace: https://gist.github.com/4c8a89682e81b0049f3e
> ===
>
> Is that a normal situation, or can client requests be rate-limited? Maybe
> there should be additional knobs to tune CephFS for handling such a
> workload?

Yeah, the MDS doesn't do a good job of back-pressuring clients right now when it or the OSDs aren't keeping up with the workload. That's something we need to work on once the fsck work is behaving. rsync is also (sadly) a workload that frequently exposes these problems, but I'm not used to seeing the MDS daemon get stuck quite that quickly. How frequently is it actually getting failed over?
-Greg
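
The questions above boil down to a few admin-socket queries and trimming knobs. A rough sketch, assuming a 0.94.x cluster, the daemon name from the log excerpt (mds.baikal), and osd.0 as a stand-in for whichever OSD actually holds the slow requests; the segment/expiring values at the end are illustrative examples, not recommendations:

===
# Identify which daemons hold the slow requests (MDS ops vs. OSD ops).
ceph health detail

# Dump in-flight operations on a suspect OSD (osd.0 is only an example id).
ceph daemon osd.0 dump_ops_in_flight

# Dump in-flight operations on the active MDS (daemon name from the log above)
# to see whether client metadata requests are the ones backing up.
ceph daemon mds.baikal dump_ops_in_flight

# Confirm whether this MDS instance really was blacklisted by the monitors.
ceph osd blacklist ls

# "Behind on trimming (321/30)" compares the journal's segment count against
# mds_log_max_segments (default 30). Raising the trimming limits, rather than
# shrinking mds_log_max_events, gives the MDS headroom to catch up; the
# values below are placeholders to adjust for the actual workload.
ceph daemon mds.baikal config set mds_log_max_segments 60
ceph daemon mds.baikal config set mds_log_max_expiring 40
===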