Currently this issue is causing us a painful problem: we've gone from a
fairly smoothly working cephfs setup to one that comes to a screeching
halt for hours, several times a day, when people start removing trees of
files. I've filed a tracker issue with as much detail as I currently have:
https://tracker.ceph.com/issues/53192. It looks like there is a problem
with removes inside the MDS when unlinking files that existed before
snapshots were taken - even after the snapshots have been completely
removed. Unfortunately I don't understand enough about snapshots and how
the MDS handles them to draw any conclusions here.
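To put that pattern in concrete terms, the sequence looks roughly like the
sketch below (paths, the file count and the snapshot name are made up, and
it assumes a CephFS mount at /mnt/cephfs with snapshots enabled):

import os, time

parent = "/mnt/cephfs/scratch"          # assumption: a CephFS mount with snapshots enabled
tree = os.path.join(parent, "tree")

os.makedirs(tree, exist_ok=True)
for i in range(10000):                  # files created before any snapshot exists
    open(os.path.join(tree, "f%d" % i), "w").close()

os.mkdir(os.path.join(parent, ".snap", "s1"))   # take a snapshot covering the tree
os.rmdir(os.path.join(parent, ".snap", "s1"))   # ... and remove it again

slow = 0
for name in os.listdir(tree):           # now remove the files, timing each unlink
    t0 = time.monotonic()
    os.unlink(os.path.join(tree, name))
    if time.monotonic() - t0 > 0.1:     # count unlinks slower than 100 ms
        slow += 1
print("slow unlinks:", slow)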
Andras
On 11/5/21 09:26, Andras Pataki wrote:
Also, just to add a little more detail (the problem is back again
today on our cluster): the cause seems to be a single 'rm -rf' of a
directory on one node. If I stop that process, the latencies on the MDS
immediately drop; if I restart it, they immediately go back up by a factor
of 50 to 100. Stracing the process shows that some unlinks
are really slow (0.3s and up), while most are reasonably fast (around 1ms):
09:23:54.854668 unlinkat(7, "ebdyc.p", 0) = 0 <0.001106>
*09:23:54.855864 unlinkat(7, "Dyy.p", 0) = 0 <0.352603>*
09:23:55.208547 unlinkat(7, "u.p", 0) = 0 <0.002474>
09:23:55.211098 unlinkat(7, "Dxy.p", 0) = 0 <0.001317>
09:23:55.212491 unlinkat(7, "v.p", 0) = 0 <0.001437>
09:23:55.214005 unlinkat(7, "Dxx.p", 0) = 0 <0.001506>
09:23:55.215622 unlinkat(6, "timestep_35050", AT_REMOVEDIR) = 0 <0.001874>
*09:23:55.221117 unlinkat(7, "ebdyc.p", 0) = 0 <0.335875>*
09:23:55.557076 unlinkat(7, "Dyy.p", 0) = 0 <0.001657>
09:23:55.558809 unlinkat(7, "u.p", 0) = 0 <0.001450>
09:23:55.560349 unlinkat(7, "Dxy.p", 0) = 0 <0.001507>
09:23:55.561940 unlinkat(7, "v.p", 0) = 0 <0.001349>
09:23:55.563365 unlinkat(7, "Dxx.p", 0) = 0 <0.001182>
09:23:55.564668 unlinkat(6, "timestep_45250", AT_REMOVEDIR) = 0 <0.002633>
09:23:55.885885 unlinkat(7, "ebdyc.p", 0) = 0 <0.005142>
09:23:55.891111 unlinkat(7, "Dyy.p", 0) = 0 <0.001656>
09:23:55.892844 unlinkat(7, "u.p", 0) = 0 <0.001685>
09:23:55.894589 unlinkat(7, "Dxy.p", 0) = 0 <0.001407>
09:23:55.896056 unlinkat(7, "v.p", 0) = 0 <0.001520>
09:23:55.897669 unlinkat(7, "Dxx.p", 0) = 0 <0.001601>
The client here is ceph-fuse 15.2.11, if that matters.
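For what it's worth, output in the format above (strace -tt -T) can be
filtered for the slow unlinks with a small script along these lines (the
capture file name is just a placeholder):

import re

# matches lines like:  09:23:54.855864 unlinkat(7, "Dyy.p", 0) = 0 <0.352603>
pat = re.compile(r'\*?(\S+) (unlinkat\(.*\)) = \S+ <([\d.]+)>')

with open("rm.strace") as f:            # placeholder: captured with strace -tt -T -o rm.strace
    for line in f:
        m = pat.match(line.strip())
        if m and float(m.group(3)) > 0.1:   # print calls slower than 100 ms
            print(m.group(1), m.group(2), "took", m.group(3), "s")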
Andras
On 11/5/21 05:23, Andras Pataki wrote:
Hi Patrick,
Thanks for looking at the issue - over the past few days it has
unfortunately given us quite a headache. A couple of questions, if you
don't mind - both for my own education and to help us avoid hitting this
issue until we can upgrade.
Do you think the issue is related to snapshots at all? I started
creating snapshots on this cluster this week, and this problem
started showing up this week as well - and the stack trace has
"snapshotty" things in it like snap realm splitting/opening. Or is
that perhaps just a coincidence?
When is this code path triggered (Server::_unlink_local_finish ->
CInode::early_pop_projected_snaprealm ->
CInode::pop_projected_snaprealm -> CInode::open_snaprealm ->
SnapRealm::split_at)?
If this is indeed related to snapshots - somehow the problem reared
its head more than a day after I removed all snapshots from the
system. Do snapshot-related structures stay around in the MDS well after
the snapshots themselves no longer exist?
There also seems to be another issue with kernel clients performing really
poorly after a snapshot has been created (even after it is removed
again afterwards). I see a kworker thread like the following constantly
chewing CPU when running metadata-heavy operations (scanning the file
system, rsync and the like):
84851 root 20 0 0 0 0 R 100.0 0.0 24:33.07 [kworker/2:0+ceph-msgr]
Nov 5 02:19:14 popeye-mgr-0-10 kernel: Call Trace:
Nov 5 02:19:14 popeye-mgr-0-10 kernel: rebuild_snap_realms+0x1e/0x70 [ceph]
Nov 5 02:19:14 popeye-mgr-0-10 kernel: rebuild_snap_realms+0x38/0x70 [ceph]
Nov 5 02:19:14 popeye-mgr-0-10 kernel: ceph_update_snap_trace+0x213/0x400 [ceph]
Nov 5 02:19:14 popeye-mgr-0-10 kernel: handle_reply+0x209/0x770 [ceph]
Nov 5 02:19:14 popeye-mgr-0-10 kernel: dispatch+0xb2/0x1e0 [ceph]
Nov 5 02:19:14 popeye-mgr-0-10 kernel: process_message+0x7b/0x130 [libceph]
Nov 5 02:19:14 popeye-mgr-0-10 kernel: try_read+0x340/0x5e0 [libceph]
Nov 5 02:19:14 popeye-mgr-0-10 kernel: ceph_con_workfn+0x79/0x210 [libceph]
Nov 5 02:19:14 popeye-mgr-0-10 kernel: process_one_work+0x1a6/0x300
Nov 5 02:19:14 popeye-mgr-0-10 kernel: worker_thread+0x41/0x310
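The trace above is from the kernel log; another quick way to check what
such a kworker is doing at a given moment is to read its kernel stack
directly, for example along these lines (the pid is the kworker from the
top output above; needs root and /proc/<pid>/stack support):

# pid of the busy kworker, taken from the top output above
pid = 84851
with open("/proc/%d/stack" % pid) as f:     # shows the thread's current kernel stack (one sample)
    print(f.read())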
I see another message about this on the mailing list from last night
("One cephFS snapshot kills performance"). I've tested various kernels:
two LTS ones (5.4.114 and 5.10.73) and the latest stable (5.14.16).
They all exhibit this behavior.
If you could point me to some places to learn about how
snapshots/snaprealms work (especially in conjunction with the MDS),
that would be really helpful as well.
Thanks!
Andras
On 11/4/21 11:40, Patrick Donnelly wrote:
Hi Andras,
On Wed, Nov 3, 2021 at 10:18 AM Andras Pataki
<apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
Hi cephers,
Recently we've started using cephfs snapshots more - and seem to be
running into a rather annoying performance issue with the MDS. The
cluster in question is on Nautilus 14.2.20.
Typically, the MDS processes a few thousand requests per second, with all
operations showing latencies in the few-millisecond range (in 'mds perf
dump'), and the system seems quite responsive (directory listings and general
metadata operations feel quick). Every so often, the MDS transitions
into a period of super-high latency: 0.1 to 2 seconds per operation (as
measured by increases in the latency counters in 'mds perf dump'). During
these high-latency periods, the request rate is about the same (low
1000s of requests/s), but one MDS thread called 'fn_anonymous' is
100% busy. Pointing a debugger at it and getting a stack trace at
random times always shows a similar picture:
Thanks for the report and useful stack trace. This is probably
corrected by the new use of a "fair" mutex in the MDS:
https://tracker.ceph.com/issues/52441
The fix will be in 16.2.7.
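For anyone who wants to watch the per-operation latency counters mentioned
above while this happens, they can be pulled from the MDS admin socket
roughly like this (a sketch only; "mds.a" stands in for the local MDS
daemon name, and it has to run on the host where that daemon lives):

import json, subprocess

# "mds.a" is a placeholder for the local MDS daemon name
out = subprocess.run(["ceph", "daemon", "mds.a", "perf", "dump"],
                     capture_output=True, text=True, check=True).stdout

perf = json.loads(out)
for section, counters in perf.items():
    for name, value in counters.items():
        if "latency" in name:               # e.g. the per-request latency counters
            print(section, name, value)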
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx