Hi cephers,
Recently we've started using CephFS snapshots more, and we seem to be
running into a rather annoying performance issue with the MDS. The
cluster in question is on Nautilus 14.2.20.
Typically, the MDS processes a few thousand requests per second, with
all operations showing latencies in the few-millisecond range (in mds
perf dump), and the system seems quite responsive (directory listings
and general metadata operations feel quick). Every so often, though,
the MDS transitions into a period of very high latency: 0.1 to 2
seconds per operation (as measured by increases in the latency counters
in mds perf dump). During these high-latency periods the request rate
is about the same (low thousands of requests/s), but one MDS thread,
called 'fn_anonymous', is 100% busy. Pointing a debugger at it and
taking a stack trace at random times always shows a similar picture:
#0 0x0000555555787d4c in CInode::get_oldest_parent_dn
(this=0x555843e14700) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/CInode.h:780
#1 CInode::is_ancestor_of (this=0x55561ebbee00,
other=other@entry=0x555843e14700) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/CInode.cc:917
#2 0x00005555557e4147 in SnapRealm::split_at
(this=this@entry=0x555557c3a500, child=0x5555c1ad7900) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/SnapRealm.cc:593
#3 0x00005555557a1a5f in CInode::open_snaprealm (this=0x55561ebbee00,
nosplit=<optimized out>) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/CInode.cc:2764
#4 0x00005555557a1e96 in CInode::pop_projected_snaprealm
(this=0x55561ebbee00, next_snaprealm=0x555571b24000, early=<optimized
out>) at /usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/CInode.cc:586
#5 0x00005555557a2462 in CInode::early_pop_projected_snaprealm
(this=this@entry=0x55561ebbee00) at /usr/include/c++/8/bits/stl_list.h:1122
#6 0x00005555555d1dc0 in Server::_unlink_local_finish
(this=0x555557055680, mdr=..., dn=0x5558e50845a0,
straydn=0x55582c1981e0, dnpv=513) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/Server.cc:7025
#7 0x00005555558067df in Context::complete (r=0, this=0x555715b043c0)
at /usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/include/Context.h:77
#8 MDSContext::complete (this=0x555715b043c0, r=0) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/MDSContext.cc:29
#9 0x0000555555806a8f in MDSIOContextBase::complete
(this=0x555715b043c0, r=0) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/MDSContext.cc:114
#10 0x0000555555806d54 in MDSLogContextBase::complete (this=<optimized
out>, r=0) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/MDSContext.cc:123
#11 0x00007fffef02062d in Finisher::finisher_thread_entry() () from
target:/usr/lib64/ceph/libceph-common.so.0
#12 0x00007fffecd7d14a in start_thread () from target:/lib64/libpthread.so.0
#13 0x00007fffeb896dc3 in clone () from target:/lib64/libc.so.6
A second sample from the same thread:
#0 0x0000555555787d37 in CInode::is_ancestor_of (this=0x55580f021800,
other=0x5557c1b1c300, other@entry=0x5557c9146700) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/CDentry.h:150
#1 0x00005555557e4147 in SnapRealm::split_at
(this=this@entry=0x555557ea0280, child=0x55584e1c1b80) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/SnapRealm.cc:593
#2 0x00005555557a1a5f in CInode::open_snaprealm (this=0x55580f021800,
nosplit=<optimized out>) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/CInode.cc:2764
#3 0x00005555557a1e96 in CInode::pop_projected_snaprealm
(this=0x55580f021800, next_snaprealm=0x55556425f140, early=<optimized
out>) at /usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/CInode.cc:586
#4 0x00005555557a2462 in CInode::early_pop_projected_snaprealm
(this=this@entry=0x55580f021800) at /usr/include/c++/8/bits/stl_list.h:1122
#5 0x00005555555d1dc0 in Server::_unlink_local_finish
(this=0x555556438500, mdr=..., dn=0x5558541e8f00,
straydn=0x555946731860, dnpv=77222) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/Server.cc:7025
#6 0x00005555558067df in Context::complete (r=0, this=0x555943b55f90)
at /usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/include/Context.h:77
#7 MDSContext::complete (this=0x555943b55f90, r=0) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/MDSContext.cc:29
#8 0x0000555555806a8f in MDSIOContextBase::complete
(this=0x555943b55f90, r=0) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/MDSContext.cc:114
#9 0x0000555555806d54 in MDSLogContextBase::complete (this=<optimized
out>, r=0) at
/usr/src/debug/ceph-14.2.20-0.el8.x86_64/src/mds/MDSContext.cc:123
#10 0x00007fffef02062d in Finisher::finisher_thread_entry() () from
target:/usr/lib64/ceph/libceph-common.so.0
#11 0x00007fffecd7d14a in start_thread () from target:/lib64/libpthread.so.0
#12 0x00007fffeb896dc3 in clone () from target:/lib64/libc.so.6
The cluster is otherwise healthy - I don't see any performance issues
with other Ceph processes (like the OSDs). I scaled snapshots back, so
there is only one snapshot in the whole file system while this problem
is happening. From the performance counters, the MDS is doing about 20
unlink and 3 rmdir operations/s on average when this happens. From the
stack traces it seems the unlink operations and the snapshots don't
play well together: every unlink appears to end up in
SnapRealm::split_at, which in turn calls CInode::is_ancestor_of.
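To check my understanding of the cost, here is a toy model (Python, not Ceph code - all names are just for illustration) of what the traces suggest: for each unlink, split_at walks the parent chain of each inode attached to the realm via is_ancestor_of, so one split costs roughly O(inodes in realm x tree depth):

```python
# Toy model (NOT Ceph code): approximates the work the traces show
# SnapRealm::split_at doing. For every inode attached to the realm it
# calls is_ancestor_of, which walks parent pointers up to the root.

def is_ancestor_of(ancestor, node, parent):
    """Walk parent pointers from node toward the root; count the hops."""
    hops = 0
    while node is not None:
        hops += 1
        if node == ancestor:
            return True, hops
        node = parent.get(node)
    return False, hops

def split_cost(parent, realm_inodes, child_root):
    """Total parent-chain hops for one split over all inodes in the realm."""
    total = 0
    for ino in realm_inodes:
        _, hops = is_ancestor_of(child_root, ino, parent)
        total += hops
    return total

# A realm whose inodes sit in a 2000-deep directory chain (inode i's
# parent is i-1, 0 is the root): the cost of ONE split is quadratic.
N = 2000
parent = {i: i - 1 for i in range(1, N)}
print(split_cost(parent, list(range(N)), child_root=0))  # → 2001000 hops
```

If that model is right, a busy realm with many cached inodes would make every unlink expensive, which matches the 100%-busy fn_anonymous thread.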
Here are latencies from these "bad" periods (in seconds):
MDS:
req/s: 3916.6, reply/s: 3917.0, lat: 0.3086
Latencies:
req_create : req/s: 426.4, lat: 0.5452
req_getattr : req/s: 58.3, lat: 0.0269
req_lookup : req/s: 3340.6, lat: 0.2881
req_mkdir : req/s: 0.6, lat: 0.0417
req_open : req/s: 51.1, lat: 0.1990
req_readdir : req/s: 3.1, lat: 0.0338
req_rename : req/s: 2.2, lat: 0.3481
req_rmdir : req/s: 3.3, lat: 0.0545
req_rmxattr : req/s: 0.2, lat: 0.3395
req_setattr : req/s: 0.4, lat: 0.1154
req_setfilelock : req/s: 8.8, lat: 0.0672
req_unlink : req/s: 21.5, lat: 0.0113
and for a typical good period (a random sample):
MDS:
req/s: 3101.7, reply/s: 3101.7, lat: 0.0001
Latencies:
req_create : req/s: 12.2, lat: 0.0008
req_getattr : req/s: 83.4, lat: 0.0000
req_lookup : req/s: 1102.1, lat: 0.0000
req_open : req/s: 8.3, lat: 0.0105
req_readdir : req/s: 1.0, lat: 0.0010
req_rename : req/s: 1.8, lat: 0.0001
req_setfilelock : req/s: 1890.4, lat: 0.0000
req_unlink : req/s: 2.6, lat: 0.0001
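For reference, the numbers above come from diffing two mds perf dump samples; roughly like this (a sketch - `op_stats` is my own helper, `dt` is the sampling interval in seconds, and the counter dicts are the "avgcount"/"sum" pairs perf dump reports per op):

```python
# Derive req/s and mean latency from two `ceph daemon mds.<name> perf dump`
# samples taken dt seconds apart. Each latency counter in the dump is a
# dict with "avgcount" (number of ops) and "sum" (total seconds).

def op_stats(before, after, dt):
    """Return (requests/s, mean latency in seconds) for one op counter."""
    dcount = after["avgcount"] - before["avgcount"]
    dsum = after["sum"] - before["sum"]
    rate = dcount / dt
    lat = dsum / dcount if dcount else 0.0
    return rate, lat

# Example: req_unlink sampled 10 s apart (made-up counter values).
before = {"avgcount": 100000, "sum": 42.0}
after = {"avgcount": 100215, "sum": 44.43}
rate, lat = op_stats(before, after, 10)
print(f"{rate:.1f} req/s, {lat:.4f} s")  # → 21.5 req/s, 0.0113 s
```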
Has anyone come across this issue or have any hints/suggestions?
Andras
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx