Re: MDS behind on trimming

Hi,

We've run with double the defaults for around 6 months now and haven't had any "behind on trimming" warnings in that time.

   mds log max segments = 60
   mds log max expiring = 40

Should be simple to try.
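
For anyone wanting to try other multiples, here is a minimal sketch that renders the same two options scaled from the defaults. The default values are assumptions inferred from this thread (max_segments shows as 30 in the health output, and doubling gives 60/40), the function name is made up for illustration, and the [mds] section header is only where I would expect these lines to go in ceph.conf.

```python
# Defaults implied by the thread, not read from a live cluster:
# max_segments appears as 30 in the health output, and "double the
# defaults" gives 60/40, so max_expiring is assumed to be 20.
ASSUMED_DEFAULTS = {
    "mds log max segments": 30,
    "mds log max expiring": 20,
}


def scaled_mds_settings(factor, defaults=ASSUMED_DEFAULTS):
    """Render ceph.conf-style lines with each assumed default scaled by factor."""
    lines = ["[mds]"]  # assumed section; adjust to your own ceph.conf layout
    for option, value in defaults.items():
        lines.append("   {} = {}".format(option, int(value * factor)))
    return "\n".join(lines)


print(scaled_mds_settings(2))
```

Check the values your own release reports before relying on the assumed defaults.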

-- dan



On Thu, Dec 21, 2017 at 2:32 PM, Stefan Kooman <stefan@xxxxxx> wrote:
Hi,

We have two MDS servers. One active, one active-standby. While doing a
parallel rsync of 10 threads with loads of files, dirs, subdirs we get
the following HEALTH_WARN:

ceph health detail
HEALTH_WARN 2 MDSs behind on trimming
MDS_TRIM 2 MDSs behind on trimming
    mdsmds2(mds.0): Behind on trimming (124/30)max_segments: 30, num_segments: 124
    mdsmds1(mds.0): Behind on trimming (118/30)max_segments: 30, num_segments: 118

To be clear: the number of segments behind on trimming fluctuates. It
sometimes does get smaller, and is relatively stable at around 130.
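
To follow that fluctuation over time, the health output can be parsed and logged. This is a minimal sketch, assuming the exact line format shown in this message (Ceph 12.2.2); the function name and the regular expression are illustrative and not part of Ceph, and other releases may print the line differently.

```python
import re

# Matches the per-daemon lines under MDS_TRIM as printed in this message.
# Only the "(num/max)" part is used; the trailing text is ignored.
TRIM_RE = re.compile(
    r"^\s*(?P<name>\S+)\((?P<rank>mds\.\d+)\):\s+Behind on trimming\s+"
    r"\((?P<num>\d+)/(?P<max>\d+)\)"
)


def parse_trim_lines(health_detail):
    """Return one dict per MDS reported as behind on trimming."""
    results = []
    for line in health_detail.splitlines():
        m = TRIM_RE.match(line)
        if m is None:
            continue  # summary lines and wrapped continuations are skipped
        num, max_ = int(m.group("num")), int(m.group("max"))
        results.append({
            "mds": m.group("name"),
            "rank": m.group("rank"),
            "num_segments": num,
            "max_segments": max_,
            "ratio": num / max_,
        })
    return results


# Sample text copied from the health output in this message.
SAMPLE = """\
HEALTH_WARN 2 MDSs behind on trimming
MDS_TRIM 2 MDSs behind on trimming
    mdsmds2(mds.0): Behind on trimming (124/30)max_segments: 30, num_segments: 124
    mdsmds1(mds.0): Behind on trimming (118/30)max_segments: 30, num_segments: 118
"""

for entry in parse_trim_lines(SAMPLE):
    print(entry["mds"], entry["num_segments"], entry["max_segments"],
          round(entry["ratio"], 2))
```

Feeding it the output of "ceph health detail" at intervals would show whether num_segments is drifting upward or staying flat.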

The load on the MDS is low, and the load on the OSDs is low (CPU, RAM
and IO). All flash, with cephfs_metadata co-located on the same OSDs.
Using the cephfs kernel client (4.13.0-19-generic) with Ceph 12.2.2
(client as well as cluster run Ceph 12.2.2). In older threads I found
several possible explanations for getting this warning:

1) When the number of segments exceeds the max_segments setting, the MDS
  starts writing back metadata so that it can remove (trim) the oldest
  segments. If this process is too slow, or a software bug is preventing
  trimming, then this health message appears.

2) The OSDs cannot keep up with the load

3) cephfs kernel client misbehaving / bug

I definitely don't think 2) is the reason. I doubt it's a Ceph MDS bug 1)
or a client bug 3). Might this be down to conservative default settings,
i.e. not trying to trim fast / soon enough? John wonders in thread [1] if
the default journal length should be longer. Yan [2] recommends bumping
"mds_log_max_expiring" to a large value (200).

What would you suggest at this point? I'm thinking about the following
changes:

mds log max segments = 200
mds log max expiring = 200

Thanks,

Stefan

[1]: https://www.spinics.net/lists/ceph-users/msg39387.html
[2]: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-July/011138.html

--
| BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

