> -----Original Message-----
> From: Nick Fisk [mailto:nick@xxxxxxxxxx]
> Sent: 12 July 2017 13:47
> To: 'Ilya Dryomov' <idryomov@xxxxxxxxx>
> Cc: 'Ceph Users' <ceph-users@xxxxxxxxxxxxxx>
> Subject: RE: Kernel mounted RBD's hanging
>
> > -----Original Message-----
> > From: Nick Fisk [mailto:nick@xxxxxxxxxx]
> > Sent: 08 July 2017 21:50
> > To: 'Ilya Dryomov' <idryomov@xxxxxxxxx>
> > Cc: 'Ceph Users' <ceph-users@xxxxxxxxxxxxxx>
> > Subject: RE: Kernel mounted RBD's hanging
> >
> > > -----Original Message-----
> > > From: Ilya Dryomov [mailto:idryomov@xxxxxxxxx]
> > > Sent: 07 July 2017 11:32
> > > To: Nick Fisk <nick@xxxxxxxxxx>
> > > Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> > > Subject: Re: Kernel mounted RBD's hanging
> > >
> > > On Fri, Jul 7, 2017 at 12:10 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> > > > Managed to catch another one, osd.75 again; not sure if that is an indication of anything or just a coincidence. osd.75 is one of 8 OSDs in a cache tier, so all IO will be funnelled through them.
> > > >
> > > > Also found this in the log of osd.75 at the same time, but the client IP is not the same as the node which experienced the hang.
> > >
> > > Can you bump debug_ms and debug_osd to 30 on osd.75? I doubt it's an issue with that particular OSD, but if it goes down the same way again, I'd have something to look at. Make sure logrotate is configured and working before doing that though... ;)
> > >
> > > Thanks,
> > >
> > > Ilya
> >
> > So, osd.75 was a coincidence; several other hangs have had outstanding requests to other OSDs. I haven't been able to get the debug logs of the OSD during a hang yet because of this, although I think the crc problem may now be fixed by upgrading all clients to 4.11.1+.
> >
> > Here is a series of osdc dumps taken every minute during one of the hangs, this time with a different target OSD. The osdc dumps on another node show IO being processed normally while this node hangs, so the cluster is definitely handling IO fine during the hang. And as I am using cache tiering with proxying, all IO will be going through just 8 OSDs. The host has 3 RBDs mounted and all 3 hang.
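> >
> > For reference, each of these snapshots is just the contents of the kernel client's osdc file in debugfs, captured once a minute with something along the lines of the loop below. This is a rough sketch: it assumes debugfs is mounted at /sys/kernel/debug and a single client instance on the host, and the output path is only an example.
> >
> >   while true; do
> >       date
> >       cat /sys/kernel/debug/ceph/*/osdc
> >       sleep 60
> >   done >> /root/osdc-dumps.log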
> >
> > Latest hang:
> >
> > Sat 8 Jul 18:49:01 BST 2017
> > REQUESTS 4 homeless 0
> > 174662831 osd25 17.77737285 [25,74,14]/25 [25,74,14]/25 rbd_data.15d8670238e1f29.00000000000cf9f8 0x400024 1 0'0 set-alloc-hint,write
> > 174662863 osd25 17.7b91a345 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.000000000002571c 0x400024 1 0'0 set-alloc-hint,write
> > 174662887 osd25 17.6c2eaa93 [25,75,14]/25 [25,75,14]/25 rbd_data.158f204238e1f29.0000000000000008 0x400024 1 0'0 set-alloc-hint,write
> > 174662925 osd25 17.32271445 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.0000000000000001 0x400024 1 0'0 set-alloc-hint,write
> > LINGER REQUESTS
> > 18446462598732840990 osd74 17.145baa0f [74,72,14]/74 [74,72,14]/74 rbd_header.158f204238e1f29 0x20 8 WC/0
> > 18446462598732840991 osd74 17.7b4e2a06 [74,72,25]/74 [74,72,25]/74 rbd_header.1555406238e1f29 0x20 9 WC/0
> > 18446462598732840992 osd74 17.eea94d58 [74,73,25]/74 [74,73,25]/74 rbd_header.15d8670238e1f29 0x20 8 WC/0
> >
> > Sat 8 Jul 18:50:01 BST 2017
> > REQUESTS 5 homeless 0
> > 174662831 osd25 17.77737285 [25,74,14]/25 [25,74,14]/25 rbd_data.15d8670238e1f29.00000000000cf9f8 0x400024 1 0'0 set-alloc-hint,write
> > 174662863 osd25 17.7b91a345 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.000000000002571c 0x400024 1 0'0 set-alloc-hint,write
> > 174662887 osd25 17.6c2eaa93 [25,75,14]/25 [25,75,14]/25 rbd_data.158f204238e1f29.0000000000000008 0x400024 1 0'0 set-alloc-hint,write
> > 174662925 osd25 17.32271445 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.0000000000000001 0x400024 1 0'0 set-alloc-hint,write
> > 174663129 osd25 17.32271445 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.0000000000000001 0x400024 1 0'0 set-alloc-hint,write
> > LINGER REQUESTS
> > 18446462598732840990 osd74 17.145baa0f [74,72,14]/74 [74,72,14]/74 rbd_header.158f204238e1f29 0x20 8 WC/0
> > 18446462598732840991 osd74 17.7b4e2a06 [74,72,25]/74 [74,72,25]/74 rbd_header.1555406238e1f29 0x20 9 WC/0
> > 18446462598732840992 osd74 17.eea94d58 [74,73,25]/74 [74,73,25]/74 rbd_header.15d8670238e1f29 0x20 8 WC/0
> >
> > Sat 8 Jul 18:51:01 BST 2017
> > REQUESTS 5 homeless 0
> > 174662831 osd25 17.77737285 [25,74,14]/25 [25,74,14]/25 rbd_data.15d8670238e1f29.00000000000cf9f8 0x400024 1 0'0 set-alloc-hint,write
> > 174662863 osd25 17.7b91a345 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.000000000002571c 0x400024 1 0'0 set-alloc-hint,write
> > 174662887 osd25 17.6c2eaa93 [25,75,14]/25 [25,75,14]/25 rbd_data.158f204238e1f29.0000000000000008 0x400024 1 0'0 set-alloc-hint,write
> > 174662925 osd25 17.32271445 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.0000000000000001 0x400024 1 0'0 set-alloc-hint,write
> > 174663129 osd25 17.32271445 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.0000000000000001 0x400024 1 0'0 set-alloc-hint,write
> > LINGER REQUESTS
> > 18446462598732840990 osd74 17.145baa0f [74,72,14]/74 [74,72,14]/74 rbd_header.158f204238e1f29 0x20 8 WC/0
> > 18446462598732840991 osd74 17.7b4e2a06 [74,72,25]/74 [74,72,25]/74 rbd_header.1555406238e1f29 0x20 9 WC/0
> > 18446462598732840992 osd74 17.eea94d58 [74,73,25]/74 [74,73,25]/74 rbd_header.15d8670238e1f29 0x20 8 WC/0
> >
> > Sat 8 Jul 18:52:01 BST 2017
> > REQUESTS 6 homeless 0
> > 174662831 osd25 17.77737285 [25,74,14]/25 [25,74,14]/25 rbd_data.15d8670238e1f29.00000000000cf9f8 0x400024 1 0'0 set-alloc-hint,write
> > 174662863 osd25 17.7b91a345 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.000000000002571c 0x400024 1 0'0 set-alloc-hint,write
> > 174662887 osd25 17.6c2eaa93 [25,75,14]/25 [25,75,14]/25 rbd_data.158f204238e1f29.0000000000000008 0x400024 1 0'0 set-alloc-hint,write
> > 174662925 osd25 17.32271445 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.0000000000000001 0x400024 1 0'0 set-alloc-hint,write
> > 174663129 osd25 17.32271445 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.0000000000000001 0x400024 1 0'0 set-alloc-hint,write
> > 174664149 osd25 17.b148df13 [25,75,14]/25 [25,75,14]/25 rbd_data.158f204238e1f29.0000000000091205 0x400024 1 0'0 set-alloc-hint,write
> > LINGER REQUESTS
> > 18446462598732840990 osd74 17.145baa0f [74,72,14]/74 [74,72,14]/74 rbd_header.158f204238e1f29 0x20 8 WC/0
> > 18446462598732840991 osd74 17.7b4e2a06 [74,72,25]/74 [74,72,25]/74 rbd_header.1555406238e1f29 0x20 9 WC/0
> > 18446462598732840992 osd74 17.eea94d58 [74,73,25]/74 [74,73,25]/74 rbd_header.15d8670238e1f29 0x20 8 WC/0
> >
> > And it continues on identically until 19:03.
> >
> > I realize at this stage these reports are probably not revealing much more information, so I will report back if I can gather any further information from the OSDs. The problem does seem to be related to load, or at least to the number of RBDs mounted; the host that only has 2 RBDs mounted hardly experiences this problem at all.
> >
> > Nick
>
> Hi Ilya,
>
> I have managed today to capture the kernel logs with debugging turned on, and the ms+osd debug logs from the mentioned OSD. However, this is from a few minutes after the stall starts, not before; the very random nature of the stalls has made it difficult to keep debug logging on for extended periods of time. I hope there is something in here which might give a clue to what is happening, otherwise I will continue trying to capture debug output from before the stall occurs.
>
> The kernel logs and OSD logs are attached as a zip.
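>
> (For context, "debugging turned on" means roughly the following -- a sketch of the mechanism rather than the exact commands used: the libceph messages come from the kernel's dynamic debug facility on the client, and the OSD side is the debug_ms/debug_osd bump to 30 suggested earlier. osd.37 is simply the OSD involved in this capture.)
>
>   # On the client (assumes debugfs is mounted at /sys/kernel/debug):
>   echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control
>
>   # On an admin/mon node, for the OSD in question:
>   ceph tell osd.37 injectargs '--debug_ms 30 --debug_osd 30'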
>
> Sample of the hung requests:
>
> Wed 12 Jul 11:28:01 BST 2017
> REQUESTS 8 homeless 0
> 11457738 osd37 17.be732844 [37,74,14]/37 [37,74,14]/37 rbd_data.15d8670238e1f29.00000000001a2068 0x400024 1 0'0 set-alloc-hint,write
> 11457759 osd37 17.9e3d4404 [37,74,14]/37 [37,74,14]/37 rbd_data.15d8670238e1f29.0000000000118149 0x400024 1 0'0 set-alloc-hint,write
> 11457770 osd37 17.86ec0d14 [37,72,74]/37 [37,72,74]/37 rbd_data.15d8670238e1f29.000000000006665c 0x400024 1 0'0 set-alloc-hint,write
> 11457818 osd37 17.e80ed1c0 [37,75,14]/37 [37,75,14]/37 rbd_data.158f204238e1f29.00000000000d7f1e 0x400014 1 0'0 read
> 11457822 osd37 17.9db0a684 [37,74,14]/37 [37,74,14]/37 rbd_data.15d8670238e1f29.000000000000211c 0x400024 1 0'0 set-alloc-hint,write
> 11457916 osd37 17.1848293b [37,3,73]/37 [37,3,73]/37 rbd_data.158f204238e1f29.00000000000d7f8e 0x400024 1 0'0 set-alloc-hint,write
> 11457967 osd37 17.56b0f4c0 [37,75,14]/37 [37,75,14]/37 rbd_data.158f204238e1f29.0000000000047ffc 0x400014 1 0'0 read
> 11457970 osd37 17.65ad6d40 [37,75,14]/37 [37,75,14]/37 rbd_data.15d8670238e1f29.0000000000060318 0x400024 1 0'0 set-alloc-hint,write
>
> Also a nice hung_task_timeout:
>
> [57154.424300] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [57154.424331] nfsd D 0 1650 2 0x00000000
> [57154.424334] Call Trace:
> [57154.424341] __schedule+0x3c6/0x8c0
> [57154.424344] schedule+0x36/0x80
> [57154.424346] rwsem_down_write_failed+0x230/0x3a0
> [57154.424389] ? xfs_file_buffered_aio_write+0x68/0x270 [xfs]
> [57154.424392] call_rwsem_down_write_failed+0x17/0x30
> [57154.424394] ? call_rwsem_down_write_failed+0x17/0x30
> [57154.424396] down_write+0x2d/0x40
> [57154.424422] xfs_ilock+0xa7/0x110 [xfs]
> [57154.424446] xfs_file_buffered_aio_write+0x68/0x270 [xfs]
> [57154.424449] ? iput+0x8a/0x230
> [57154.424450] ? __check_object_size+0x100/0x19d
> [57154.424474] xfs_file_write_iter+0x103/0x150 [xfs]
> [57154.424482] __do_readv_writev+0x2fb/0x390
> [57154.424484] do_readv_writev+0x8d/0xe0
> [57154.424487] ? security_file_open+0x8a/0x90
> [57154.424488] ? do_dentry_open+0x27a/0x310
> [57154.424511] ? xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
> [57154.424513] vfs_writev+0x3c/0x50
> [57154.424514] ? vfs_writev+0x3c/0x50
> [57154.424525] nfsd_vfs_write+0xc6/0x3a0 [nfsd]
> [57154.424531] nfsd_write+0x144/0x200 [nfsd]
> [57154.424538] nfsd3_proc_write+0xaa/0x140 [nfsd]
> [57154.424549] nfsd_dispatch+0xc8/0x260 [nfsd]
> [57154.424566] svc_process_common+0x374/0x6b0 [sunrpc]
> [57154.424575] svc_process+0xfe/0x1b0 [sunrpc]
> [57154.424581] nfsd+0xe9/0x150 [nfsd]
> [57154.424583] kthread+0x109/0x140
> [57154.424588] ? nfsd_destroy+0x60/0x60 [nfsd]
> [57154.424590] ? kthread_create_on_node+0x70/0x70
> [57154.424592] ret_from_fork+0x2c/0x40
>
> Regards,
> Nick

Just looking into this some more, I have found laggy requests to osd37 in the kernel logs:

Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685559] libceph: req ffff9adad97a1450 tid 11457738 on osd37 is laggy
Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685560] libceph: req ffff9adad0df9040 tid 11457759 on osd37 is laggy
Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685561] libceph: req ffff9adad97a5d70 tid 11457770 on osd37 is laggy
Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685562] libceph: req ffff9adad9036590 tid 11457818 on osd37 is laggy
Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685563] libceph: req ffff9adad9035960 tid 11457822 on osd37 is laggy
Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685564] libceph: req ffff9adad9036db0 tid 11457916 on osd37 is laggy
Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685565] libceph: req ffff9adad90330c0 tid 11457967 on osd37 is laggy
Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685565] libceph: req ffff9adad97a5550 tid 11457970 on osd37 is laggy

And I have found where the kernel client prints that message:

https://github.com/ceph/ceph-client/blob/master/net/ceph/osd_client.c#L3073-L3098

Does that suggest that for some reason the TCP connection is getting broken and causing the hang? That for whatever reason it can't re-establish, or takes longer than it should, and thus causes the outage?

Nick
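PS. If it is the TCP session dropping, that should be visible from the client while a stall is in progress; a loop along these lines would log the socket state (including retransmit and timer details) for connections to the OSD host. This is only a sketch: 10.0.0.37 is a placeholder for the address carrying osd.37, which "ceph osd find 37" will show, and the log path is just an example.

  while true; do
      date
      ss -tnoi dst 10.0.0.37
      sleep 5
  done >> /root/osd37-tcp-state.log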

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com