Hi cephers,
Recently we've started using CephFS snapshots more, and we seem to be
running into a rather annoying performance issue with the MDS. The
cluster in question is on Nautilus 14.2.20.
Typically, the MDS processes a few thousand requests per second, with
all operations showing latencies in the few-millisecond range (in mds
perf dump), and the system seems quite responsive (directory listings
and general metadata operations feel quick). Every so often, though,
the MDS transitions into a period of very high latency: 0.1 to 2
seconds per operation (as measured by increases in the latency counters
in mds perf dump). During these high-latency periods the request rate
is about the same (low thousands of requests/s), but one MDS thread,
called 'fn_anonymous', is 100% busy. Pointing a debugger at it and
taking a stack trace at random times always shows a similar picture:
#0 0x0000555555787d4c in CInode::get_oldest_parent_dn
(this=0x555843e14700) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/CInode.h:780
#1 CInode::is_ancestor_of (this=0x55561ebbee00,
other=other@entry=0x555843e14700) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/CInode.cc:917
#2 0x00005555557e4147 in SnapRealm::split_at
(this=this@entry=0x555557c3a500, child=0x5555c1ad7900) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/SnapRealm.cc:593
#3 0x00005555557a1a5f in CInode::open_snaprealm (this=0x55561ebbee00,
nosplit=<optimized out>) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/CInode.cc:2764
#4 0x00005555557a1e96 in CInode::pop_projected_snaprealm
(this=0x55561ebbee00, next_snaprealm=0x555571b24000, early=<optimized
out>) at /usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/CInode.cc:586
#5 0x00005555557a2462 in CInode::early_pop_projected_snaprealm
(this=this@entry=0x55561ebbee00) at /usr/include/c++/8/bits/stl_list.h:1122
#6 0x00005555555d1dc0 in Server::_unlink_local_finish
(this=0x555557055680, mdr=..., dn=0x5558e50845a0,
straydn=0x55582c1981e0, dnpv=513) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/Server.cc:7025
#7 0x00005555558067df in Context::complete (r=0, this=0x555715b043c0)
at /usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/include/Context.h:77
#8 MDSContext::complete (this=0x555715b043c0, r=0) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/MDSContext.cc:29
#9 0x0000555555806a8f in MDSIOContextBase::complete
(this=0x555715b043c0, r=0) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/MDSContext.cc:114
#10 0x0000555555806d54 in MDSLogContextBase::complete (this=<optimized
out>, r=0) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/MDSContext.cc:123
#11 0x00007fffef02062d in Finisher::finisher_thread_entry() () from
target:/usr/lib64/ceph/libceph-common.so.0
#12 0x00007fffecd7d14a in start_thread () from target:/lib64/libpthread.so.0
#13 0x00007fffeb896dc3 in clone () from target:/lib64/libc.so.6
A second sample from the same thread:
#0 0x0000555555787d37 in CInode::is_ancestor_of (this=0x55580f021800,
other=0x5557c1b1c300, other@entry=0x5557c9146700) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/CDentry.h:150
#1 0x00005555557e4147 in SnapRealm::split_at
(this=this@entry=0x555557ea0280, child=0x55584e1c1b80) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/SnapRealm.cc:593
#2 0x00005555557a1a5f in CInode::open_snaprealm (this=0x55580f021800,
nosplit=<optimized out>) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/CInode.cc:2764
#3 0x00005555557a1e96 in CInode::pop_projected_snaprealm
(this=0x55580f021800, next_snaprealm=0x55556425f140, early=<optimized
out>) at /usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/CInode.cc:586
#4 0x00005555557a2462 in CInode::early_pop_projected_snaprealm
(this=this@entry=0x55580f021800) at /usr/include/c++/8/bits/stl_list.h:1122
#5 0x00005555555d1dc0 in Server::_unlink_local_finish
(this=0x555556438500, mdr=..., dn=0x5558541e8f00,
straydn=0x555946731860, dnpv=77222) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/Server.cc:7025
#6 0x00005555558067df in Context::complete (r=0, this=0x555943b55f90)
at /usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/include/Context.h:77
#7 MDSContext::complete (this=0x555943b55f90, r=0) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/MDSContext.cc:29
#8 0x0000555555806a8f in MDSIOContextBase::complete
(this=0x555943b55f90, r=0) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/MDSContext.cc:114
#9 0x0000555555806d54 in MDSLogContextBase::complete (this=<optimized
out>, r=0) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/MDSContext.cc:123
#10 0x00007fffef02062d in Finisher::finisher_thread_entry() () from
target:/usr/lib64/ceph/libceph-common.so.0
#11 0x00007fffecd7d14a in start_thread () from target:/lib64/libpthread.so.0
#12 0x00007fffeb896dc3 in clone () from target:/lib64/libc.so.6
The cluster is otherwise healthy - I don't see any performance issues
with other Ceph processes (like the OSDs). I scaled snapshots back, so
there is only one snapshot in the whole file system while this problem
is happening. From the performance counters, the MDS is doing about 20
unlink and 3 rmdir operations/s on average when this happens. From the
stack traces it seems the unlink operations and the snapshots don't
play well together: every unlink appears to end up in
SnapRealm::split_at, which in turn calls CInode::is_ancestor_of.
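To check my understanding of the cost, here is a toy model (Python, not Ceph code - all names are just for illustration) of what the traces suggest: for each unlink, split_at walks the parent chain of each inode attached to the realm via is_ancestor_of, so one split costs roughly O(inodes in realm x tree depth):

```python
# Toy model (NOT Ceph code): approximates the work the traces show
# SnapRealm::split_at doing. For every inode attached to the realm it
# calls is_ancestor_of, which walks parent pointers up to the root.

def is_ancestor_of(ancestor, node, parent):
    """Walk parent pointers from node toward the root; count the hops."""
    hops = 0
    while node is not None:
        hops += 1
        if node == ancestor:
            return True, hops
        node = parent.get(node)
    return False, hops

def split_cost(parent, realm_inodes, child_root):
    """Total parent-chain hops for one split over all inodes in the realm."""
    total = 0
    for ino in realm_inodes:
        _, hops = is_ancestor_of(child_root, ino, parent)
        total += hops
    return total

# A realm whose inodes sit in a 2000-deep directory chain (inode i's
# parent is i-1, 0 is the root): the cost of ONE split is quadratic.
N = 2000
parent = {i: i - 1 for i in range(1, N)}
print(split_cost(parent, list(range(N)), child_root=0))  # → 2001000 hops
```

If that model is right, a busy realm with many cached inodes would make every unlink expensive, which matches the 100%-busy fn_anonymous thread.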
Here are latencies from these "bad" periods (in seconds):
MDS:
req/s: 3916.6, reply/s: 3917.0, lat: 0.3086
Latencies:
req_create : req/s: 426.4, lat: 0.5452
req_getattr : req/s: 58.3, lat: 0.0269
req_lookup : req/s: 3340.6, lat: 0.2881
req_mkdir : req/s: 0.6, lat: 0.0417
req_open : req/s: 51.1, lat: 0.1990
req_readdir : req/s: 3.1, lat: 0.0338
req_rename : req/s: 2.2, lat: 0.3481
req_rmdir : req/s: 3.3, lat: 0.0545
req_rmxattr : req/s: 0.2, lat: 0.3395
req_setattr : req/s: 0.4, lat: 0.1154
req_setfilelock : req/s: 8.8, lat: 0.0672
req_unlink : req/s: 21.5, lat: 0.0113
and for a typical good period (a random sample):
MDS:
req/s: 3101.7, reply/s: 3101.7, lat: 0.0001
Latencies:
req_create : req/s: 12.2, lat: 0.0008
req_getattr : req/s: 83.4, lat: 0.0000
req_lookup : req/s: 1102.1, lat: 0.0000
req_open : req/s: 8.3, lat: 0.0105
req_readdir : req/s: 1.0, lat: 0.0010
req_rename : req/s: 1.8, lat: 0.0001
req_setfilelock : req/s: 1890.4, lat: 0.0000
req_unlink : req/s: 2.6, lat: 0.0001
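For reference, the numbers above come from diffing two mds perf dump samples; roughly like this (a sketch - `op_stats` is my own helper, `dt` is the sampling interval in seconds, and the counter dicts are the "avgcount"/"sum" pairs perf dump reports per op):

```python
# Derive req/s and mean latency from two `ceph daemon mds.<name> perf dump`
# samples taken dt seconds apart. Each latency counter in the dump is a
# dict with "avgcount" (number of ops) and "sum" (total seconds).

def op_stats(before, after, dt):
    """Return (requests/s, mean latency in seconds) for one op counter."""
    dcount = after["avgcount"] - before["avgcount"]
    dsum = after["sum"] - before["sum"]
    rate = dcount / dt
    lat = dsum / dcount if dcount else 0.0
    return rate, lat

# Example: req_unlink sampled 10 s apart (made-up counter values).
before = {"avgcount": 100000, "sum": 42.0}
after = {"avgcount": 100215, "sum": 44.43}
rate, lat = op_stats(before, after, 10)
print(f"{rate:.1f} req/s, {lat:.4f} s")  # → 21.5 req/s, 0.0113 s
```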
Has anyone come across this issue or have any hints/suggestions?
Andras
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx