Re: kernel client osdc ops stuck and mds slow reqs

Dan van der Ster <dan@xxxxxxxxxxxxxx> · Mon, 3 Feb 2020 12:06:05 +0100

On Mon, Feb 3, 2020 at 11:50 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>
> On Mon, Feb 3, 2020 at 10:38 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >
> > On Fri, Jan 31, 2020 at 6:32 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> > >
> > > On Fri, Jan 31, 2020 at 4:57 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > > >
> > > > Hi Ilya,
> > > >
> > > > On Fri, Jan 31, 2020 at 11:33 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, Jan 31, 2020 at 11:06 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > We are quite regularly (a couple times per week) seeing:
> > > > > >
> > > > > > HEALTH_WARN 1 clients failing to respond to capability release; 1 MDSs
> > > > > > report slow requests
> > > > > > MDS_CLIENT_LATE_RELEASE 1 clients failing to respond to capability release
> > > > > >     mdshpc-be143(mds.0): Client hpc-be028.cern.ch: failing to respond
> > > > > > to capability release client_id: 52919162
> > > > > > MDS_SLOW_REQUEST 1 MDSs report slow requests
> > > > > >     mdshpc-be143(mds.0): 1 slow requests are blocked > 30 secs
> > > > > >
> > > > > > Which is being caused by osdc ops stuck in a kernel client, e.g.:
> > > > > >
> > > > > > 10:57:18 root hpc-be028 /root
> > > > > > → cat /sys/kernel/debug/ceph/4da6fd06-b069-49af-901f-c9513baabdbd.client52919162/osdc
> > > > > > REQUESTS 9 homeless 0
> > > > > > 46559317    osd243    3.ee6ffcdb    3.cdb    [243,501,92]/243
> > > > > > [243,501,92]/243    e678697
> > > > > > fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a01.00000057
> > > > > >  0x400014    1    read
> > > > > > 46559322    osd243    3.ee6ffcdb    3.cdb    [243,501,92]/243
> > > > > > [243,501,92]/243    e678697
> > > > > > fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a01.00000057
> > > > > >  0x400014    1    read
> > > > > > 46559323    osd243    3.969cc573    3.573    [243,330,226]/243
> > > > > > [243,330,226]/243    e678697
> > > > > > fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a56.00000056
> > > > > >  0x400014    1    read
> > > > > > 46559341    osd243    3.969cc573    3.573    [243,330,226]/243
> > > > > > [243,330,226]/243    e678697
> > > > > > fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a56.00000056
> > > > > >  0x400014    1    read
> > > > > > 46559342    osd243    3.969cc573    3.573    [243,330,226]/243
> > > > > > [243,330,226]/243    e678697
> > > > > > fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a56.00000056
> > > > > >  0x400014    1    read
> > > > > > 46559345    osd243    3.969cc573    3.573    [243,330,226]/243
> > > > > > [243,330,226]/243    e678697
> > > > > > fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a56.00000056
> > > > > >  0x400014    1    read
> > > > > > 46559621    osd243    3.6313e8ef    3.8ef    [243,330,521]/243
> > > > > > [243,330,521]/243    e678697
> > > > > > fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a45.0000007a
> > > > > >  0x400014    1    read
> > > > > > 46559629    osd243    3.b280c852    3.852    [243,113,539]/243
> > > > > > [243,113,539]/243    e678697
> > > > > > fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a3a.0000007f
> > > > > >  0x400014    1    read
> > > > > > 46559928    osd243    3.1ee7bab4    3.ab4    [243,332,94]/243
> > > > > > [243,332,94]/243    e678697
> > > > > > fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f099ff.0000073f
> > > > > >  0x400024    1    write
> > > > > > LINGER REQUESTS
> > > > > > BACKOFFS
> > > > > >
> > > > > >
> > > > > > We can unblock those requests by doing `ceph osd down osd.243` (or
> > > > > > restarting osd.243).
> > > > > >
> > > > > > This is ceph v14.2.6 and the client kernel is el7 3.10.0-957.27.2.el7.x86_64.
> > > > > >
> > > > > > Are there a better way to debug this?
> > > > >
> > > > > Hi Dan,
> > > > >
> > > > > I assume that these ops don't show up as slow requests on the OSD side?
> > > > > How long did you see it stuck for before intervening?
> > > >
> > > > That's correct -- the osd had no active ops (ceph daemon.... ops).
> > > >
> > > > The late release slow req was stuck for 4129s before we intervened.
> > > >
> > > > > Do you happen to have "debug ms = 1" logs from osd243?
> > > >
> > > > Nope, but I can try to get it afterwards next time. (Though you need
> > > > it at the moment the ops get stuck, not only from the moment we notice
> > > > the stuck ops, right?)
> > >
> > > Yes, starting before the moment the ops get stuck and ending after you
> > > kick the OSD.
> > >
> > > >
> > > > > Do you have PG autoscaler enabled?  Any PG splits and/or merges at the time?
> > > >
> > > > Not on the cephfs_(meta)data pools (though on the 30th I increased
> > > > those pool sizes from 2 to 3). And also on the 30th I did some PG
> > > > merging on an unrelated test pool.
> > > > And anyway we have seen this type of lockup in the past, without those
> > > > pool changes (also with mimic MDS until we upgraded to nautilus).
> > >
> > > The MDS is out of question here.  This issue is between the kernel
> > > client and the OSD.
> > >
> > > >
> > > > Looking back further in the client's kernel log we see a page alloc
> > > > failure on the 30th:
> > > >
> > > > Jan 30 16:16:35 hpc-be028.cern.ch kernel: kworker/1:36: page
> > > > allocation failure: order:5, mode:0x104050
> > > > Jan 30 16:16:35 hpc-be028.cern.ch kernel: CPU: 1 PID: 78445 Comm:
> > > > kworker/1:36 Kdump: loaded Tainted: P
> > > > Jan 30 16:16:35 hpc-be028.cern.ch kernel: Workqueue: ceph-msgr
> > > > ceph_con_workfn [libceph]
> > >
> > > Can you share the stack trace?  That's a 128k allocation, so worth
> > > taking a look.
> > >
> >
> > Pasted here: https://pastebin.com/neyah54k
>
> This should be fixed with
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=10c12851a022662bf6085bd4384b4ebed4c447ce
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cf73d882cc51c1f245a890cccb79952a260302d3
>
> in 5.4.

Ahh thanks. We'll open a ticket to RH for a backport to el7.x.

>
> > The same node had a lockup again last night.
>
> Same as before?  Mitigated by a restart of a single OSD?

This time, the user noticed IO was stuck. (Those cp in the paste.)
There were no slow reqs on the MDS or elsewhere in the cluster.
Anyway, the stuck client had old osdc ops on osd.244, so we did `ceph
osd down osd.244` and the client was un-stuck.

Cheers, Dan

> Thanks,
>
>                 Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx