Re: High cephfs MDS latency and CPU load

Currently this issue creates a painful problem for us - we've gone from a fairly smoothly working cephfs setup to one that comes to a screeching halt for hours, several times a day, when people start removing trees of files.  I've filed a tracker for the problem with as much detail as I know currently: https://tracker.ceph.com/issues/53192.  It seems there is an issue inside the MDS with removing files that existed before snapshots were taken - even after the snapshots are completely removed.  Unfortunately I don't understand enough about snapshots and how they are handled within the MDS to come up with any conclusions here.
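
For anyone who wants to watch for the same symptom: the spikes show up clearly if you sample the MDS latency counters periodically instead of eyeballing 'perf dump' by hand.  A rough sketch along these lines (the mds daemon name is a placeholder, and the exact counter names may differ between releases) prints the per-interval average unlink latency:

    #!/usr/bin/env python3
    # Rough sketch: poll the MDS admin socket and print the average unlink
    # latency over each interval.  Run on the MDS host; "mds.a" and the
    # counter names below are placeholders - adjust for your cluster/release.
    import json, subprocess, time

    MDS_NAME = "mds.a"
    SECTION = "mds_server"
    COUNTER = "req_unlink_latency"

    def sample():
        out = subprocess.check_output(["ceph", "daemon", MDS_NAME, "perf", "dump"])
        c = json.loads(out)[SECTION][COUNTER]
        return c["avgcount"], c["sum"]   # op count, cumulative latency (seconds)

    prev_n, prev_s = sample()
    while True:
        time.sleep(10)
        n, s = sample()
        if n > prev_n:
            print("avg unlink latency over last 10s: %.4f s"
                  % ((s - prev_s) / (n - prev_n)))
        prev_n, prev_s = n, s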

Andras


On 11/5/21 09:26, Andras Pataki wrote:
Also, just to add a little more detail (the problem is back again today on our cluster).  The cause seems to be a single 'rm -rf' of a directory on a node.  If I stop that process, the latencies immediately drop; if I restart it, they immediately go back up by a factor of 50 to 100 on the MDS.  Stracing the process shows that some unlinks are really slow (0.3s and up), while most are reasonably fast (around 1ms):

    09:23:54.854668 unlinkat(7, "ebdyc.p", 0) = 0 <0.001106>
    *09:23:54.855864 unlinkat(7, "Dyy.p", 0) = 0 <0.352603>*
    09:23:55.208547 unlinkat(7, "u.p", 0)   = 0 <0.002474>
    09:23:55.211098 unlinkat(7, "Dxy.p", 0) = 0 <0.001317>
    09:23:55.212491 unlinkat(7, "v.p", 0)   = 0 <0.001437>
    09:23:55.214005 unlinkat(7, "Dxx.p", 0) = 0 <0.001506>
    09:23:55.215622 unlinkat(6, "timestep_35050", AT_REMOVEDIR) = 0 <0.001874>
    *09:23:55.221117 unlinkat(7, "ebdyc.p", 0) = 0 <0.335875>*
    09:23:55.557076 unlinkat(7, "Dyy.p", 0) = 0 <0.001657>
    09:23:55.558809 unlinkat(7, "u.p", 0)   = 0 <0.001450>
    09:23:55.560349 unlinkat(7, "Dxy.p", 0) = 0 <0.001507>
    09:23:55.561940 unlinkat(7, "v.p", 0)   = 0 <0.001349>
    09:23:55.563365 unlinkat(7, "Dxx.p", 0) = 0 <0.001182>
    09:23:55.564668 unlinkat(6, "timestep_45250", AT_REMOVEDIR) = 0 <0.002633>
    09:23:55.885885 unlinkat(7, "ebdyc.p", 0) = 0 <0.005142>
    09:23:55.891111 unlinkat(7, "Dyy.p", 0) = 0 <0.001656>
    09:23:55.892844 unlinkat(7, "u.p", 0)   = 0 <0.001685>
    09:23:55.894589 unlinkat(7, "Dxy.p", 0) = 0 <0.001407>
    09:23:55.896056 unlinkat(7, "v.p", 0)   = 0 <0.001520>
    09:23:55.897669 unlinkat(7, "Dxx.p", 0) = 0 <0.001601>

The client here is ceph-fuse 15.2.11 if that matters.
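
In case it helps anyone reproduce the measurement without strace, a rough sketch along these lines (the threshold is an arbitrary placeholder, not what we actually ran, and it assumes no symlinked directories in the tree) removes a tree bottom-up and flags any slow unlink/rmdir:

    #!/usr/bin/env python3
    # Rough sketch: remove a directory tree and report any unlink/rmdir that
    # takes longer than a threshold - roughly what the strace above shows.
    import os, sys, time

    SLOW = 0.1   # seconds; arbitrary threshold for "slow"

    def remove_tree(path):
        # walk bottom-up so files are removed before their parent directories
        for root, dirs, files in os.walk(path, topdown=False):
            for name in files:
                p = os.path.join(root, name)
                t0 = time.monotonic()
                os.unlink(p)
                dt = time.monotonic() - t0
                if dt > SLOW:
                    print("slow unlink %.3fs  %s" % (dt, p))
            for name in dirs:
                p = os.path.join(root, name)
                t0 = time.monotonic()
                os.rmdir(p)
                dt = time.monotonic() - t0
                if dt > SLOW:
                    print("slow rmdir  %.3fs  %s" % (dt, p))
        os.rmdir(path)

    if __name__ == "__main__":
        remove_tree(sys.argv[1])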

Andras


On 11/5/21 05:23, Andras Pataki wrote:
Hi Patrick,

Thanks for looking at the issue - over the past few days this has unfortunately given us quite a headache.  A couple of questions, if you don't mind - both for my own education and to help us avoid hitting this issue until we can upgrade.

Do you think the issue is related to snapshots at all?  I started creating snapshots on this cluster this week, and this problem started showing up this week as well - and the stack trace has "snapshotty" things in it like snap realm splitting/opening.  Or is this perhaps just a coincidence?

When is this code path triggered (Server::_unlink_local_finish -> CInode::early_pop_projected_snaprealm -> CInode::pop_projected_snaprealm -> CInode::open_snaprealm -> SnapRealm::split_at)?

If this is indeed related to snapshots - somehow the problem reared its head more than a day after I removed all snapshots from the system.  Does snapshot-related state stay around in the MDS well after the snapshots no longer exist?

There seems to be another issue with kernel clients performing really poorly after a snapshot has been created (even after it is removed afterwards).  I see a kworker thread like the following constantly chewing CPU when running metadata operations (scanning the file system, rsync, or similar):

    84851 root      20   0       0      0      0 R 100.0   0.0 24:33.07 [kworker/2:0+ceph-msgr]

    Nov  5 02:19:14 popeye-mgr-0-10 kernel: Call Trace:
    Nov  5 02:19:14 popeye-mgr-0-10 kernel: rebuild_snap_realms+0x1e/0x70 [ceph]
    Nov  5 02:19:14 popeye-mgr-0-10 kernel: rebuild_snap_realms+0x38/0x70 [ceph]
    Nov  5 02:19:14 popeye-mgr-0-10 kernel: ceph_update_snap_trace+0x213/0x400 [ceph]
    Nov  5 02:19:14 popeye-mgr-0-10 kernel: handle_reply+0x209/0x770 [ceph]
    Nov  5 02:19:14 popeye-mgr-0-10 kernel: dispatch+0xb2/0x1e0 [ceph]
    Nov  5 02:19:14 popeye-mgr-0-10 kernel: process_message+0x7b/0x130 [libceph]
    Nov  5 02:19:14 popeye-mgr-0-10 kernel: try_read+0x340/0x5e0 [libceph]
    Nov  5 02:19:14 popeye-mgr-0-10 kernel: ceph_con_workfn+0x79/0x210 [libceph]
    Nov  5 02:19:14 popeye-mgr-0-10 kernel: process_one_work+0x1a6/0x300
    Nov  5 02:19:14 popeye-mgr-0-10 kernel: worker_thread+0x41/0x310

I see another message about this on the mailing list from last night ("One cephFS snapshot kills performance").  I've tested various kernels: two LTS ones (5.4.114 and 5.10.73) and the latest stable (5.14.16) - they all exhibit this behavior.
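
In case it helps with reproducing: the stack of the busy kworker can be sampled directly via /proc/<pid>/stack (as root, on a kernel with stack tracing enabled) - a minimal sketch, with the PID being just the example from the top output above:

    #!/usr/bin/env python3
    # Minimal sketch: print the kernel stack of the busy kworker a few times
    # to see where it is spending its time.  Needs root.
    import sys, time

    PID = sys.argv[1] if len(sys.argv) > 1 else "84851"   # kworker PID from top

    for i in range(10):
        with open("/proc/%s/stack" % PID) as f:
            print("--- sample %d ---" % i)
            print(f.read())
        time.sleep(1)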

If you could point me to some places to learn about how snapshots/snaprealms work (especially in conjunction with the MDS), that'd be really helpful as well.

Thanks!

Andras


On 11/4/21 11:40, Patrick Donnelly wrote:
Hi Andras,

On Wed, Nov 3, 2021 at 10:18 AM Andras Pataki
<apataki@xxxxxxxxxxxxxxxxxxxxx>  wrote:
Hi cephers,

Recently we've started using cephfs snapshots more - and seem to be
running into a rather annoying performance issue with the MDS.  The
cluster in question is on Nautilus 14.2.20.

Typically, the MDS processes a few thousand requests per second with all
operations showing latencies in the few millisecond range (in mds perf
dump) and the system seems quite responsive (directory listings, general
metadata operations feel quick).  Every so often, the MDS transitions
into a period of super high latency: 0.1 to 2 seconds per operation (as
measured by increases in the latency counters in mds perf dump).  During
these high latency periods, the request rate is about the same (low
1000s requests/s) - but one thread of the MDS called 'fn_anonymous' is
100% busy.  Pointing the debugger to it and getting a stack trace at
random times always shows a similar picture:

Thanks for the report and useful stack trace. This is probably
corrected by the new use of a "fair" mutex in the MDS:

https://tracker.ceph.com/issues/52441

The fix will be in 16.2.7.



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



