On 04/07/16 11:22, Kenneth Waegeman wrote:
On 01/07/16 16:01, Yan, Zheng wrote:
On Fri, Jul 1, 2016 at 6:59 PM, John Spray <jspray@xxxxxxxxxx> wrote:
On Fri, Jul 1, 2016 at 11:35 AM, Kenneth Waegeman
<kenneth.waegeman@xxxxxxxx> wrote:
Hi all,
While syncing a lot of files to cephfs, our mds cluster went haywire:
the mdss are a long way behind on trimming (58621/30). Because of this
the mds cluster became degraded. RAM usage is about 50GB. The mdses were
respawning and replaying continuously, and I had to stop all syncs,
unmount all clients and increase the beacon_grace to keep the cluster
up.
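(For reference, roughly how the grace was raised; the value below is
illustrative rather than the exact one I used, and the same option can
instead be put in ceph.conf and the mons restarted:

  ceph tell mon.* injectargs '--mds_beacon_grace 300'
)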
[root@mds03 ~]# ceph status
cluster 92bfcf0a-1d39-43b3-b60f-44f01b630e47
health HEALTH_WARN
mds0: Behind on trimming (58621/30)
monmap e1: 3 mons at
{mds01=10.141.16.1:6789/0,mds02=10.141.16.2:6789/0,mds03=10.141.16.3:6789/0}
election epoch 170, quorum 0,1,2 mds01,mds02,mds03
fsmap e78658: 1/1/1 up {0=mds03=up:active}, 2 up:standby
osdmap e19966: 156 osds: 156 up, 156 in
flags sortbitwise
pgmap v10213164: 4160 pgs, 4 pools, 253 TB data, 203 Mobjects
357 TB used, 516 TB / 874 TB avail
4151 active+clean
5 active+clean+scrubbing
4 active+clean+scrubbing+deep
client io 0 B/s rd, 0 B/s wr, 63 op/s rd, 844 op/s wr
cache io 68 op/s promote
Now it is finally up again, but it is trimming very slowly (+- 120
segments/min).
Hmm, so it sounds like something was wrong that got cleared by either
the MDS restart or the client unmount, and now it's trimming at a
healthier rate.
What client (kernel or fuse, and version)?
Can you confirm that the RADOS cluster itself was handling operations
reasonably quickly? Is your metadata pool using the same drives as
your data? Were the OSDs saturated with IO?
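(Something like the following should give a rough picture; the exact
column names vary a bit between releases:

  ceph osd perf        # per-OSD commit/apply latency
  ceph health detail   # any slow/blocked request warnings
)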
While the cluster was accumulating untrimmed segments, did you also
have a "client xyz failing to advance oldest_tid" warning?
That does not prevent the MDS from trimming log segments.
It would be good to clarify whether the MDS was trimming slowly, or
not at all. If you can reproduce this situation, get it to a "behind
on trimming" state, and the stop the client IO (but leave it mounted).
See if the (x/30) number stays the same. Then, does it start to
decrease when you unmount the client? That would indicate a
misbehaving client.
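(A quick way to watch it, assuming the admin socket on the active MDS
is reachable; the counter names here are from my build and may differ
slightly on yours:

  ceph health detail              # shows "Behind on trimming (x/30)"
  ceph daemon mds.mds03 perf dump # look at the mds_log section, "seg" is the live segment count
)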
Behind on trimming on a single-MDS cluster should be caused either by
slow RADOS operations or by the MDS trimming too few log segments on
each tick. Kenneth, could you try setting mds_log_max_expiring to a
large value (such as 200)?
I've set mds_log_max_expiring to 200 just now. Should I see an effect
instantly?
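(For the record, roughly what I ran, from memory, so take the exact
syntax as a sketch:

  ceph tell mds.mds03 injectargs '--mds_log_max_expiring 200'
)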
The trimming finished rather quickly, although I don't have any accurate
time measurements. The cluster looks fine right now, but it is only
running an incremental sync. We will try with the same data again to see
if it is ok now.
Is this mds_log_max_expiring option production-ready? (I can't seem to
find it in the documentation.)
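(If it is, I assume persisting it is just a matter of adding something
like this to ceph.conf on the MDS hosts and restarting them:

  [mds]
  mds log max expiring = 200
)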
Thank you!!
K
This weekend, the trimming did not continue and something happened to
the cluster:
mds.0.cache.dir(1000da74e85) commit error -2 v 2466977
log_channel(cluster) log [ERR] : failed to commit dir 1000da74e85
object, errno -2
mds.0.78429 unhandled write error (2) No such file or directory, force
readonly...
mds.0.cache force file system read-only
log_channel(cluster) log [WRN] : force file system read-only
and ceph health reported:
mds0: MDS in read-only mode
I restarted it and it is trimming again.
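(The restart was just the usual daemon restart on the active MDS; the
unit name depends on how the daemons were deployed, in our case
roughly:

  systemctl restart ceph-mds@mds03
  ceph -s    # wait for the rank to come back up:active
)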
Thanks again!
Kenneth
Regards
Yan, Zheng
John
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com