CephFS MDS slow requests

Hi All

I have a Samba server exporting directories from a CephFS kernel mount. Performance has been pretty good for the last year, but users have recently been complaining of short "freezes", which seem to coincide with MDS-related slow requests in the monitor ceph.log, such as:

2018-03-13 13:34:58.461030 osd.15 osd.15 10.10.10.211:6812/13367 5752 : cluster [WRN] slow request 31.834418 seconds old, received at 2018-03-13 13:34:26.626474: osd_repop(mds.0.5495:810644 3.3e e14085/14019 3:7cea5bac:::10001a88b8f.00000000:head v 14085'846936) currently commit_sent
2018-03-13 13:34:59.461270 osd.15 osd.15 10.10.10.211:6812/13367 5754 : cluster [WRN] slow request 32.832059 seconds old, received at 2018-03-13 13:34:26.629151: osd_repop(mds.0.5495:810671 2.dc2 e14085/14020 2:43bdcc3f:::10001e91a91.00000000:head v 14085'21394) currently commit_sent
2018-03-13 14:23:57.409427 osd.30 osd.30 10.10.10.212:6824/14997 5708 : cluster [WRN] slow request 30.536832 seconds old, received at 2018-03-13 14:23:26.872513: osd_repop(mds.0.5495:865403 2.fb6 e14085/14077 2:6df955ef:::10001e93542.000000c4:head v 14085'21296) currently commit_sent
2018-03-13 14:23:57.409449 osd.30 osd.30 10.10.10.212:6824/14997 5709 : cluster [WRN] slow request 30.529640 seconds old, received at 2018-03-13 14:23:26.879704: osd_repop(mds.0.5495:865407 2.595 e14085/14019 2:a9a56101:::10001e93542.000000c8:head v 14085'20437) currently commit_sent
2018-03-13 14:23:57.409453 osd.30 osd.30 10.10.10.212:6824/14997 5710 : cluster [WRN] slow request 30.503138 seconds old, received at 2018-03-13 14:23:26.906207: osd_repop(mds.0.5495:865423 2.ea e14085/14055 2:57096bbf:::10001e93542.000000d8:head v 14085'21147) currently commit_sent
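
When the next freeze happens I was planning to dump the op history on the OSD named in the warning, to see which stage the time is actually being spent in, along these lines (run on the node hosting that OSD, osd.15 just being the one from the excerpt above):

# recently completed ops with their per-event timeline
ceph daemon osd.15 dump_historic_ops
# ops still in flight, if caught while the stall is happening
ceph daemon osd.15 dump_ops_in_flight

Does that sound like the right place to start?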

--

Looking in the MDS log, with debug set to 4, it's full of "setfilelock rule 1" and "setfilelock rule 2" requests:

2018-03-13 14:23:00.446905 7fde43e73700  4 mds.0.server handle_client_request client_request(client.9174621:141162337 setfilelock rule 1, type 4, owner 14971048052668053939, pid 7, start 120, length 1, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=1155, caller_gid=1131{}) v2
2018-03-13 14:23:00.447050 7fde43e73700  4 mds.0.server handle_client_request client_request(client.9174621:141162338 setfilelock rule 2, type 4, owner 14971048137043556787, pid 4632, start 0, length 0, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=0, caller_gid=0{}) v2
2018-03-13 14:23:00.447258 7fde43e73700  4 mds.0.server handle_client_request client_request(client.9174621:141162339 setfilelock rule 2, type 4, owner 14971048137043550643, pid 4632, start 0, length 0, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=0, caller_gid=0{}) v2
2018-03-13 14:23:00.447393 7fde43e73700  4 mds.0.server handle_client_request client_request(client.9174621:141162340 setfilelock rule 1, type 4, owner 14971048052668053939, pid 7, start 124, length 1, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=1155, caller_gid=1131{}) v2
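
I was also going to check whether any of these lock requests are actually blocking on the MDS side via the admin socket, something like the following (mds.<id> is just a placeholder for whatever the active MDS is called on my cluster):

# requests the MDS is currently processing, including any stuck setfilelock ops
ceph daemon mds.<id> dump_ops_in_flight
# session list, to map client.9174621 back to a host/mount
ceph daemon mds.<id> session ls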

--

I don't have particularly good monitoring set up on this cluster yet, but a cursory look at a few things such as iostat doesn't suggest the OSDs are being hammered.
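
For reference, the sort of thing I've glanced at so far is just:

# per-OSD commit/apply latency as reported by the cluster
ceph osd perf
# disk utilisation on the OSD nodes during a "freeze"
iostat -x 5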

Some questions:

1) Can anyone recommend a way of diagnosing this issue?
2) Are the multiple "setfilelock" requests per inode to be expected? I assume this is something to do with Samba's oplocks/locking.
3) What's the recommended highest MDS debug setting before performance starts to be adversely affected (I'm aware the log files will get huge)?
4) What's the best way of matching the inodes in the MDS log to file names in CephFS? (My rough plan for 3) and 4) is sketched after this list, in case it's completely wrong.)
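
For (3) and (4), what I had in mind is roughly the following; the MDS name and the mount point /mnt/cephfs are just placeholders for my setup:

# bump MDS debug at runtime, reproduce a freeze, then drop it back to the default
ceph tell mds.<id> injectargs '--debug_mds 10'
ceph tell mds.<id> injectargs '--debug_mds 1/5'

# the #0x10001e8dc37 in the MDS log looks like the hex inode number,
# so map it back to a path on the kernel mount with find
find /mnt/cephfs -inum $(printf '%d' 0x10001e8dc37)

I'd be grateful to hear if there is a better or more standard way of doing either.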

Hardware/Versions:

Luminous 12.1.1
CephFS kernel client 3.10.0-514.2.2.el7.x86_64
Samba 4.4.4
4-node cluster; each node has 1x Intel 3700 NVMe, 12x SATA drives, 40Gbps networking

Thanks in advance!

Cheers,
David

