Re: High cephfs MDS latency and CPU load

Thanks Andras,

I've left a comment on the ticket. Let's continue the discussion there.

On Mon, Nov 8, 2021 at 12:00 PM Andras Pataki
<apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Currently this issue creates a painful problem for us - we've gone from
> a fairly smoothly working cephfs setup to one that comes to a screeching
> halt for hours, several times a day, whenever people start removing
> trees of files.  I've filed a tracker for the problem:
> https://tracker.ceph.com/issues/53192 with as much detail as I
> currently have.  It seems like there is an issue inside the MDS when
> removing files that existed before snapshots were taken - even
> after those snapshots have been completely removed.  Unfortunately I don't
> understand enough about snapshots and how they are handled within
> the MDS to come up with any conclusions here.
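>
> For reference, a minimal sketch of how these episodes can be spotted by
> polling the latency counters in 'mds perf dump' (the daemon name and the
> exact counter path are assumptions - adjust them for the local setup):
>
>     import json, subprocess, time
>
>     MDS = "mds.mds-01"  # hypothetical daemon name
>     SECTION, COUNTER = "mds_server", "req_unlink_latency"  # assumed counter path
>
>     def sample():
>         # 'ceph daemon <mds> perf dump' returns all perf counters as JSON
>         out = subprocess.check_output(["ceph", "daemon", MDS, "perf", "dump"])
>         c = json.loads(out)[SECTION][COUNTER]
>         return c["avgcount"], c["sum"]
>
>     prev_n, prev_sum = sample()
>     while True:
>         time.sleep(10)
>         n, s = sample()
>         dn, ds = n - prev_n, s - prev_sum
>         if dn:
>             # average unlink latency over the last interval, in ms
>             print("%d unlinks, avg %.1f ms" % (dn, ds / dn * 1000.0))
>         prev_n, prev_sum = n, s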
>
> Andras
>
>
> On 11/5/21 09:26, Andras Pataki wrote:
> > Also, just to add a little more detail (the problem is back again
> > today on our cluster).  The cause seems to be a single 'rm -rf' of a
> > directory on a node.  If I stop that process, the latencies on the MDS
> > immediately drop; if I restart it, they immediately go back up by a
> > factor of 50 to 100.  stracing the process shows that some unlinks
> > are really, really slow (0.3s and up), but most are reasonably quick
> > (1ms or so):
> >
> >     09:23:54.854668 unlinkat(7, "ebdyc.p", 0) = 0 <0.001106>
> >     *09:23:54.855864 unlinkat(7, "Dyy.p", 0) = 0 <0.352603>*
> >     09:23:55.208547 unlinkat(7, "u.p", 0)   = 0 <0.002474>
> >     09:23:55.211098 unlinkat(7, "Dxy.p", 0) = 0 <0.001317>
> >     09:23:55.212491 unlinkat(7, "v.p", 0)   = 0 <0.001437>
> >     09:23:55.214005 unlinkat(7, "Dxx.p", 0) = 0 <0.001506>
> >     09:23:55.215622 unlinkat(6, "timestep_35050", AT_REMOVEDIR) = 0 <0.001874>
> >     *09:23:55.221117 unlinkat(7, "ebdyc.p", 0) = 0 <0.335875>*
> >     09:23:55.557076 unlinkat(7, "Dyy.p", 0) = 0 <0.001657>
> >     09:23:55.558809 unlinkat(7, "u.p", 0)   = 0 <0.001450>
> >     09:23:55.560349 unlinkat(7, "Dxy.p", 0) = 0 <0.001507>
> >     09:23:55.561940 unlinkat(7, "v.p", 0)   = 0 <0.001349>
> >     09:23:55.563365 unlinkat(7, "Dxx.p", 0) = 0 <0.001182>
> >     09:23:55.564668 unlinkat(6, "timestep_45250", AT_REMOVEDIR) = 0 <0.002633>
> >     09:23:55.885885 unlinkat(7, "ebdyc.p", 0) = 0 <0.005142>
> >     09:23:55.891111 unlinkat(7, "Dyy.p", 0) = 0 <0.001656>
> >     09:23:55.892844 unlinkat(7, "u.p", 0)   = 0 <0.001685>
> >     09:23:55.894589 unlinkat(7, "Dxy.p", 0) = 0 <0.001407>
> >     09:23:55.896056 unlinkat(7, "v.p", 0)   = 0 <0.001520>
> >     09:23:55.897669 unlinkat(7, "Dxx.p", 0) = 0 <0.001601>
> >
> > The client here is ceph-fuse 15.2.11 if that matters.
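> >
> > A minimal sketch of how output like the listing above (from
> > 'strace -tt -T') can be filtered down to just the slow unlinks; the
> > 100 ms threshold is an arbitrary choice:
> >
> >     import re
> >     import sys
> >
> >     # matches lines like: 09:23:54.855864 unlinkat(7, "Dyy.p", 0) = 0 <0.352603>
> >     PAT = re.compile(r'^([\d:.]+)\s+(unlinkat\(.*\))\s+=\s+\S+\s+<([\d.]+)>')
> >     THRESHOLD = 0.1  # seconds
> >
> >     for line in sys.stdin:
> >         m = PAT.match(line.strip())
> >         if not m:
> >             continue
> >         ts, call, dur = m.group(1), m.group(2), float(m.group(3))
> >         if dur >= THRESHOLD:
> >             print("SLOW %7.3fs at %s: %s" % (dur, ts, call))
> >
> > It can be fed with something like
> > 'strace -f -tt -T -p <pid> 2>&1 | python3 slow-unlinks.py'
> > (script name made up).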
> >
> > Andras
> >
> >
> > On 11/5/21 05:23, Andras Pataki wrote:
> >> Hi Patrick,
> >>
> >> Thanks for looking at the issue - over the past few days this has
> >> unfortunately given us quite a headache.  A couple of
> >> questions, if you don't mind - both for my education and to help us
> >> avoid hitting this issue until we can upgrade.
> >>
> >> Do you think the issue is related to snapshots at all?  I started
> >> creating snapshots on this cluster this week, and this problem
> >> started showing up this week as well - and the stack trace has
> >> "snapshotty" things in it like snap realm splitting/opening.  Or is
> >> this perhaps just a coincidence?
> >>
> >> When is this code path triggered (Server::_unlink_local_finish ->
> >> CInode::early_pop_projected_snaprealm ->
> >> CInode::pop_projected_snaprealm -> CInode::open_snaprealm ->
> >> SnapRealm::split_at)?
> >>
> >> If this is indeed related to snapshots - somehow the problem reared
> >> its head more than a day after I removed all snapshots from the
> >> system.  Does snapshot-related state stay around in the MDS well after
> >> the snapshots themselves no longer exist?
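> >>
> >> For context, snapshots here were created and removed through the
> >> hidden .snap directory, which is the standard cephfs mechanism; a
> >> minimal sketch (the path and snapshot name are made up):
> >>
> >>     import os
> >>
> >>     SNAP = "/mnt/cephfs/project/.snap/daily-2021-11-01"  # hypothetical path
> >>
> >>     os.mkdir(SNAP)   # mkdir under .snap creates the snapshot
> >>     # ... some time later ...
> >>     os.rmdir(SNAP)   # rmdir removes it again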
> >>
> >> There seems to be another issue with kernel clients performing really
> >> poorly after a snapshot has been created (even after it is later
> >> removed).  I see a kworker thread like the following constantly
> >> chewing CPU when running metadata operations (such as scanning the
> >> file system, rsync, and the like):
> >>   84851 root      20   0       0      0      0 R 100.0   0.0 24:33.07 [kworker/2:0+ceph-msgr]
> >>
> >>     Nov  5 02:19:14 popeye-mgr-0-10 kernel: Call Trace:
> >>     Nov  5 02:19:14 popeye-mgr-0-10 kernel: rebuild_snap_realms+0x1e/0x70 [ceph]
> >>     Nov  5 02:19:14 popeye-mgr-0-10 kernel: rebuild_snap_realms+0x38/0x70 [ceph]
> >>     Nov  5 02:19:14 popeye-mgr-0-10 kernel: ceph_update_snap_trace+0x213/0x400 [ceph]
> >>     Nov  5 02:19:14 popeye-mgr-0-10 kernel: handle_reply+0x209/0x770 [ceph]
> >>     Nov  5 02:19:14 popeye-mgr-0-10 kernel: dispatch+0xb2/0x1e0 [ceph]
> >>     Nov  5 02:19:14 popeye-mgr-0-10 kernel: process_message+0x7b/0x130 [libceph]
> >>     Nov  5 02:19:14 popeye-mgr-0-10 kernel: try_read+0x340/0x5e0 [libceph]
> >>     Nov  5 02:19:14 popeye-mgr-0-10 kernel: ceph_con_workfn+0x79/0x210 [libceph]
> >>     Nov  5 02:19:14 popeye-mgr-0-10 kernel: process_one_work+0x1a6/0x300
> >>     Nov  5 02:19:14 popeye-mgr-0-10 kernel: worker_thread+0x41/0x310
> >>
> >> I see another message about this on the mailing list from last night
> >> ("One cephFS snapshot kills performance").  I've tested various
> >> kernels - two LTS ones (5.4.114 and 5.10.73) and the latest
> >> stable (5.14.16) - and they all exhibit this behavior.
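> >>
> >> A minimal sketch of sampling where that busy kworker spends its time,
> >> by repeatedly reading its kernel stack from /proc (needs root; the PID
> >> is the one from the top output above):
> >>
> >>     import time
> >>
> >>     KWORKER_PID = 84851  # the [kworker/..+ceph-msgr] thread shown above
> >>
> >>     for _ in range(10):
> >>         # /proc/<pid>/stack shows the thread's current kernel stack
> >>         with open("/proc/%d/stack" % KWORKER_PID) as f:
> >>             print(f.read())
> >>         print("-" * 40)
> >>         time.sleep(1)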
> >>
> >> If you could point me to some places to learn about how
> >> snapshots/snaprealms work (especially in conjunction with the MDS),
> >> that'd be really helpful as well.
> >>
> >> Thanks!
> >>
> >> Andras
> >>
> >>
> >> On 11/4/21 11:40, Patrick Donnelly wrote:
> >>> Hi Andras,
> >>>
> >>> On Wed, Nov 3, 2021 at 10:18 AM Andras Pataki
> >>> <apataki@xxxxxxxxxxxxxxxxxxxxx>  wrote:
> >>>> Hi cephers,
> >>>>
> >>>> Recently we've started using cephfs snapshots more - and seem to be
> >>>> running into a rather annoying performance issue with the MDS.  The
> >>>> cluster in question is on Nautilus 14.2.20.
> >>>>
> >>>> Typically, the MDS processes a few thousand requests per second, with all
> >>>> operations showing latencies in the few-millisecond range (in mds perf
> >>>> dump), and the system seems quite responsive (directory listings and general
> >>>> metadata operations feel quick).  Every so often, the MDS transitions
> >>>> into a period of super high latency: 0.1 to 2 seconds per operation (as
> >>>> measured by increases in the latency counters in mds perf dump).  During
> >>>> these high-latency periods, the request rate is about the same (low
> >>>> thousands of requests/s) - but one thread of the MDS called 'fn_anonymous' is
> >>>> 100% busy.  Pointing a debugger at it and getting a stack trace at
> >>>> random times always shows a similar picture:
> >>> Thanks for the report and useful stack trace. This is probably
> >>> corrected by the new use of a "fair" mutex in the MDS:
> >>>
> >>> https://tracker.ceph.com/issues/52441
> >>>
> >>> The fix will be in 16.2.7.
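> >>>
> >>> For reference, a minimal sketch of grabbing the kind of stack trace
> >>> described above with gdb (attaching briefly pauses the MDS; the pid
> >>> is a placeholder):
> >>>
> >>>     import subprocess
> >>>
> >>>     MDS_PID = 12345  # hypothetical pid of the busy ceph-mds process
> >>>
> >>>     # one-shot backtrace of every MDS thread; look for 'fn_anonymous'
> >>>     out = subprocess.check_output(
> >>>         ["gdb", "-batch", "-p", str(MDS_PID), "-ex", "thread apply all bt"],
> >>>         stderr=subprocess.DEVNULL).decode(errors="replace")
> >>>     print(out)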
> >>>
> >>
> >
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx



-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


