Thanks Andras, I've left a comment on the ticket.  Let's continue the
discussion there.

On Mon, Nov 8, 2021 at 12:00 PM Andras Pataki
<apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> This issue is currently causing us real pain - we've gone from a fairly
> smoothly working CephFS setup to one that comes to a screeching halt for
> hours, several times a day, whenever people start removing trees of
> files.  I've filed a tracker for the problem with as much detail as I
> currently have: https://tracker.ceph.com/issues/53192.  It looks like
> there is an issue with removes inside the MDS when the files being
> removed existed before the snapshots were taken - even after the
> snapshots are completely removed.  Unfortunately I don't understand
> enough about snapshots and how the MDS handles them to draw any
> conclusions here.
>
> Andras
>
>
> On 11/5/21 09:26, Andras Pataki wrote:
> > Also, just to add a little more detail (the problem is back again
> > today on our cluster).  The cause seems to be a single 'rm -rf' of a
> > directory on a node.  If I stop that process, the latencies
> > immediately drop; if I restart it, they immediately go back up by a
> > factor of 50 to 100 on the MDS.  stracing the process shows that some
> > unlinks are really slow (0.3s and up), while most are reasonably fast
> > (around 1ms):
> >
> > 09:23:54.854668 unlinkat(7, "ebdyc.p", 0) = 0 <0.001106>
> > *09:23:54.855864 unlinkat(7, "Dyy.p", 0) = 0 <0.352603>*
> > 09:23:55.208547 unlinkat(7, "u.p", 0) = 0 <0.002474>
> > 09:23:55.211098 unlinkat(7, "Dxy.p", 0) = 0 <0.001317>
> > 09:23:55.212491 unlinkat(7, "v.p", 0) = 0 <0.001437>
> > 09:23:55.214005 unlinkat(7, "Dxx.p", 0) = 0 <0.001506>
> > 09:23:55.215622 unlinkat(6, "timestep_35050", AT_REMOVEDIR) = 0 <0.001874>
> > *09:23:55.221117 unlinkat(7, "ebdyc.p", 0) = 0 <0.335875>*
> > 09:23:55.557076 unlinkat(7, "Dyy.p", 0) = 0 <0.001657>
> > 09:23:55.558809 unlinkat(7, "u.p", 0) = 0 <0.001450>
> > 09:23:55.560349 unlinkat(7, "Dxy.p", 0) = 0 <0.001507>
> > 09:23:55.561940 unlinkat(7, "v.p", 0) = 0 <0.001349>
> > 09:23:55.563365 unlinkat(7, "Dxx.p", 0) = 0 <0.001182>
> > 09:23:55.564668 unlinkat(6, "timestep_45250", AT_REMOVEDIR) = 0 <0.002633>
> > 09:23:55.885885 unlinkat(7, "ebdyc.p", 0) = 0 <0.005142>
> > 09:23:55.891111 unlinkat(7, "Dyy.p", 0) = 0 <0.001656>
> > 09:23:55.892844 unlinkat(7, "u.p", 0) = 0 <0.001685>
> > 09:23:55.894589 unlinkat(7, "Dxy.p", 0) = 0 <0.001407>
> > 09:23:55.896056 unlinkat(7, "v.p", 0) = 0 <0.001520>
> > 09:23:55.897669 unlinkat(7, "Dxx.p", 0) = 0 <0.001601>
> >
> > The client here is ceph-fuse 15.2.11, if that matters.
> >
> > Andras
> >
> >
> > On 11/5/21 05:23, Andras Pataki wrote:
> >> Hi Patrick,
> >>
> >> Thanks for looking at the issue - over the past few days it has
> >> unfortunately given us quite a headache.  A couple of questions, if
> >> you don't mind - both for my own education and to help us avoid
> >> hitting this issue until we can upgrade.
> >>
> >> Do you think the issue is related to snapshots at all?  I started
> >> creating snapshots on this cluster this week, and this problem
> >> started showing up this week as well - and the stack trace has
> >> "snapshotty" things in it like snap realm splitting/opening.  Or is
> >> this perhaps just a coincidence?
> >>
> >> When is this code path triggered (Server::_unlink_local_finish ->
> >> CInode::early_pop_projected_snaprealm ->
> >> CInode::pop_projected_snaprealm -> CInode::open_snaprealm ->
> >> SnapRealm::split_at)?
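> >>
> >> For context, the snapshots mentioned above were created and removed
> >> in the usual way, through the .snap pseudo-directory on the mount -
> >> roughly like this, with the paths here just placeholders:
> >>
> >>   # create a snapshot of a directory tree
> >>   mkdir /mnt/cephfs/somedir/.snap/snap-20211101
> >>   # and later remove that snapshot again
> >>   rmdir /mnt/cephfs/somedir/.snap/snap-20211101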
> >>
> >> If this is indeed related to snapshots - somehow the problem reared
> >> its head more than a day after I removed all snapshots from the
> >> system.  Do things from snapshots stay around in the MDS well after
> >> the snapshots no longer exist?
> >>
> >> There also seems to be another issue with kernel clients performing
> >> really poorly after a snapshot has been created (even after it is
> >> removed afterwards).  I see a kworker thread like the following
> >> constantly chewing CPU when running metadata operations (scanning
> >> the file system, rsync, or similar):
> >>
> >> 84851 root      20   0       0      0      0 R 100.0  0.0  24:33.07 [kworker/2:0+ceph-msgr]
> >>
> >> Nov  5 02:19:14 popeye-mgr-0-10 kernel: Call Trace:
> >> Nov  5 02:19:14 popeye-mgr-0-10 kernel:  rebuild_snap_realms+0x1e/0x70 [ceph]
> >> Nov  5 02:19:14 popeye-mgr-0-10 kernel:  rebuild_snap_realms+0x38/0x70 [ceph]
> >> Nov  5 02:19:14 popeye-mgr-0-10 kernel:  ceph_update_snap_trace+0x213/0x400 [ceph]
> >> Nov  5 02:19:14 popeye-mgr-0-10 kernel:  handle_reply+0x209/0x770 [ceph]
> >> Nov  5 02:19:14 popeye-mgr-0-10 kernel:  dispatch+0xb2/0x1e0 [ceph]
> >> Nov  5 02:19:14 popeye-mgr-0-10 kernel:  process_message+0x7b/0x130 [libceph]
> >> Nov  5 02:19:14 popeye-mgr-0-10 kernel:  try_read+0x340/0x5e0 [libceph]
> >> Nov  5 02:19:14 popeye-mgr-0-10 kernel:  ceph_con_workfn+0x79/0x210 [libceph]
> >> Nov  5 02:19:14 popeye-mgr-0-10 kernel:  process_one_work+0x1a6/0x300
> >> Nov  5 02:19:14 popeye-mgr-0-10 kernel:  worker_thread+0x41/0x310
> >>
> >> I see another message about this on the mailing list from last night
> >> ("One cephFS snapshot kills performance").  I've tested various
> >> kernels: two LTS ones (5.4.114 and 5.10.73) and the latest stable
> >> (5.14.16).  They all exhibit this behavior.
> >>
> >> If you could point me to some places to learn about how
> >> snapshots/snaprealms work (especially in conjunction with the MDS),
> >> that would be really helpful as well.
> >>
> >> Thanks!
> >>
> >> Andras
> >>
> >>
> >> On 11/4/21 11:40, Patrick Donnelly wrote:
> >>> Hi Andras,
> >>>
> >>> On Wed, Nov 3, 2021 at 10:18 AM Andras Pataki
> >>> <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >>>> Hi cephers,
> >>>>
> >>>> Recently we've started using cephfs snapshots more - and we seem
> >>>> to be running into a rather annoying performance issue with the
> >>>> MDS.  The cluster in question is on Nautilus 14.2.20.
> >>>>
> >>>> Typically, the MDS processes a few thousand requests per second,
> >>>> with all operations showing latencies in the few-millisecond range
> >>>> (in mds perf dump), and the system seems quite responsive
> >>>> (directory listings and general metadata operations feel quick).
> >>>> Every so often, the MDS transitions into a period of very high
> >>>> latency: 0.1 to 2 seconds per operation (as measured by increases
> >>>> in the latency counters in mds perf dump).  During these
> >>>> high-latency periods the request rate is about the same (low
> >>>> thousands of requests/s), but one thread of the MDS called
> >>>> 'fn_anonymous' is 100% busy.  Pointing a debugger at it and
> >>>> grabbing a stack trace at random times always shows a similar
> >>>> picture:
> >>> Thanks for the report and useful stack trace.  This is probably
> >>> corrected by the new use of a "fair" mutex in the MDS:
> >>>
> >>> https://tracker.ceph.com/issues/52441
> >>>
> >>> The fix will be in 16.2.7.
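> >>>
> >>> For reference, the latency counters mentioned above can be pulled
> >>> from the MDS admin socket, and the running versions checked after an
> >>> upgrade, along these lines (the MDS name is a placeholder, and the
> >>> exact counter layout of perf dump can vary between releases):
> >>>
> >>>   # per-operation latency counters on the MDS (run on the MDS host)
> >>>   ceph daemon mds.<name> perf dump | jq '.mds_server'
> >>>
> >>>   # after the upgrade, check which version each daemon reports
> >>>   ceph versions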

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx