Re: mds: failed to decode message of type 43 v7: buffer::end_of_buffer

You've probably run into http://tracker.ceph.com/issues/16010. Do you have very large directories? (Or perhaps just a large number of unlinked files which the MDS hasn't managed to trim yet?)
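One rough way to answer the "very large directories" question is to walk the tree from a client that still responds and count entries per directory. The sketch below is only an illustration: the function name `find_large_dirs` and the default threshold are made up for this example and are not Ceph settings, and walking a filesystem whose MDS is blocked may itself hang.

```python
import os


def find_large_dirs(root, threshold=100_000):
    """Yield (path, entry_count) for each directory under `root`
    that directly contains more than `threshold` entries.

    `threshold` is an arbitrary illustrative value, not a Ceph limit.
    """
    stack = [root]
    while stack:
        path = stack.pop()
        count = 0
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    count += 1
                    # Descend into real subdirectories only, not symlinks.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            # Unreadable or vanished directory: skip it.
            continue
        if count > threshold:
            yield path, count
```

Calling `list(find_large_dirs("/mnt/scratch"))` (the mount point here is hypothetical) would list the oversized directories. This does not see unlinked files that the MDS has not yet trimmed.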

On Tue, Sep 19, 2017 at 11:51 AM Christian Salzmann-Jäckel <Christian.Salzmann@xxxxxxxxxxxx> wrote:
Hi,

we run CephFS (10.2.9 on Debian jessie; 108 OSDs on 9 nodes) as a scratch filesystem for an HPC cluster, using an IPoIB interconnect and the kernel client (Debian backports kernel version 4.9.30).

Our clients started blocking on file system access.
Logs show 'mds0: Behind on trimming' and slow requests to one OSD (osd.049).
Replacing the disk of osd.049 had no effect. Cluster health is OK.

'ceph daemon mds.cephmon1 dump_ops_in_flight' shows ops from client sessions which are no longer present according to 'ceph daemon mds.cephmon1 session ls'.
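The comparison between the two admin-socket dumps can be sketched in a few lines of Python. This is a minimal sketch under assumptions: it takes the already parsed JSON of both commands, assumes the ops dump holds an "ops" list whose entries carry a "description" string containing "client.<id>", and assumes each session entry has an integer "id". The function name `orphaned_ops` and the sample values in the usage note are invented for illustration.

```python
import re

# Matches the client id inside an op description such as
# "client_request(client.4202:7 lookup ...)" (assumed format).
CLIENT_RE = re.compile(r"client\.(\d+)")


def orphaned_ops(ops_dump, sessions):
    """Group in-flight op descriptions by client id for clients
    that do not appear in the session list.

    ops_dump: parsed JSON from 'dump_ops_in_flight' (assumed layout).
    sessions: parsed JSON from 'session ls' (assumed layout).
    """
    live_ids = {s["id"] for s in sessions}
    orphans = {}
    for op in ops_dump.get("ops", []):
        description = op.get("description", "")
        match = CLIENT_RE.search(description)
        if match is None:
            continue  # not a client-originated op
        client_id = int(match.group(1))
        if client_id not in live_ids:
            orphans.setdefault(client_id, []).append(description)
    return orphans
```

With made-up input such as one op from client.4101 and one from client.4202, and a session list containing only id 4101, the function returns a dict keyed by 4202.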

We observe traffic of ~200 Mbps on the mds node and this OSD (osd.049).
Stopping the mds process ends the traffic (of course).
Stopping osd.049 shifts traffic to the next OSD (osd.095).
ceph logs show 'slow requests' even after stopping almost all clients.

The debug log on osd.049 shows zillions of lines for a single PG (4.22e) of the cephfs_metadata pool, which resides on OSDs [49, 95, 9].

2017-09-19 12:20:08.535383 7fd6b98c3700 20 osd.49 pg_epoch: 240725 pg[4.22e( v 240141'1432046 (239363'1429042,240141'1432046] local-les=240073 n=4848 ec=451 les/c/f 240073/240073/0 239916/240072/240072) [49,95,9] r=0 lpr=240072 crt=240129'1432044 lcod 240130'1432045 mlcod 240130'1432045 active+clean] Found key .chunk_4761369_head
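To put a number on "zillions", the "Found key" lines can be tallied per PG. The sketch below is only an illustration that assumes every relevant line has the same shape as the one quoted above; the function name `count_found_keys` is invented for this example.

```python
import re
from collections import Counter

# Captures the PG id (e.g. "4.22e") and the key name from lines shaped
# like the debug line quoted above (assumed to be representative).
FOUND_KEY_RE = re.compile(r"pg\[(\d+\.[0-9a-f]+)\(.*\bFound key (\S+)")


def count_found_keys(lines):
    """Return a Counter mapping PG id to the number of 'Found key' lines."""
    per_pg = Counter()
    for line in lines:
        match = FOUND_KEY_RE.search(line)
        if match:
            per_pg[match.group(1)] += 1
    return per_pg
```

Passing an open log file to `count_found_keys` and calling `most_common()` on the result would show which PG dominates the output.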

Is there anything we can do to get the mds back into operation?

ciao
Christian
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
