> -----Original Message-----
> From: Nick Fisk [mailto:nick@xxxxxxxxxx]
> Sent: 12 July 2017 13:47
> To: 'Ilya Dryomov' <idryomov@xxxxxxxxx>
> Cc: 'Ceph Users' <ceph-users@xxxxxxxxxxxxxx>
> Subject: RE: Kernel mounted RBD's hanging
>
> > -----Original Message-----
> > From: Nick Fisk [mailto:nick@xxxxxxxxxx]
> > Sent: 08 July 2017 21:50
> > To: 'Ilya Dryomov' <idryomov@xxxxxxxxx>
> > Cc: 'Ceph Users' <ceph-users@xxxxxxxxxxxxxx>
> > Subject: RE: Kernel mounted RBD's hanging
> >
> > > -----Original Message-----
> > > From: Ilya Dryomov [mailto:idryomov@xxxxxxxxx]
> > > Sent: 07 July 2017 11:32
> > > To: Nick Fisk <nick@xxxxxxxxxx>
> > > Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> > > Subject: Re: Kernel mounted RBD's hanging
> > >
> > > On Fri, Jul 7, 2017 at 12:10 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> > > > Managed to catch another one, osd.75 again; not sure if that is an indication of anything or just a coincidence. osd.75 is one of 8 OSDs in a cache tier, so all IO will be funnelled through them.
> > > >
> > > > Also found this in the log of osd.75 at the same time, but the client IP is not the same as the node which experienced the hang.
> > >
> > > Can you bump debug_ms and debug_osd to 30 on osd.75? I doubt it's an issue with that particular OSD, but if it goes down the same way again, I'd have something to look at. Make sure logrotate is configured and working before doing that though... ;)
> > >
> > > Thanks,
> > >
> > > Ilya
> >
> > So, osd.75 was a coincidence; several other hangs have had outstanding requests to other OSDs. I haven't been able to get the debug logs of the OSD during a hang yet because of this, although I think the crc problem may now be fixed by upgrading all clients to 4.11.1+.
> >
> > Here is a series of osdc dumps taken every minute during one of the hangs, this time with a different target OSD. The osdc dumps on another node show IO being processed normally while this node hangs, so the cluster is definitely handling IO fine during the hang. And as I am using cache tiering with proxying, all IO will be going through just 8 OSDs. The host has 3 RBDs mounted and all 3 hang.
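> >
> > For reference, each of these snapshots is just the contents of the kernel client's osdc file in debugfs, captured once a minute with something along the lines of the loop below. This is a rough sketch: it assumes debugfs is mounted at /sys/kernel/debug and a single client instance on the host, and the output path is only an example.
> >
> >   while true; do
> >       date
> >       cat /sys/kernel/debug/ceph/*/osdc
> >       sleep 60
> >   done >> /root/osdc-dumps.log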
> >
> > Latest hang:
> >
> > Sat 8 Jul 18:49:01 BST 2017
> > REQUESTS 4 homeless 0
> > 174662831 osd25 17.77737285 [25,74,14]/25 [25,74,14]/25 rbd_data.15d8670238e1f29.00000000000cf9f8 0x400024 1 0'0 set-alloc-hint,write
> > 174662863 osd25 17.7b91a345 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.000000000002571c 0x400024 1 0'0 set-alloc-hint,write
> > 174662887 osd25 17.6c2eaa93 [25,75,14]/25 [25,75,14]/25 rbd_data.158f204238e1f29.0000000000000008 0x400024 1 0'0 set-alloc-hint,write
> > 174662925 osd25 17.32271445 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.0000000000000001 0x400024 1 0'0 set-alloc-hint,write
> > LINGER REQUESTS
> > 18446462598732840990 osd74 17.145baa0f [74,72,14]/74 [74,72,14]/74 rbd_header.158f204238e1f29 0x20 8 WC/0
> > 18446462598732840991 osd74 17.7b4e2a06 [74,72,25]/74 [74,72,25]/74 rbd_header.1555406238e1f29 0x20 9 WC/0
> > 18446462598732840992 osd74 17.eea94d58 [74,73,25]/74 [74,73,25]/74 rbd_header.15d8670238e1f29 0x20 8 WC/0
> >
> > Sat 8 Jul 18:50:01 BST 2017
> > REQUESTS 5 homeless 0
> > 174662831 osd25 17.77737285 [25,74,14]/25 [25,74,14]/25 rbd_data.15d8670238e1f29.00000000000cf9f8 0x400024 1 0'0 set-alloc-hint,write
> > 174662863 osd25 17.7b91a345 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.000000000002571c 0x400024 1 0'0 set-alloc-hint,write
> > 174662887 osd25 17.6c2eaa93 [25,75,14]/25 [25,75,14]/25 rbd_data.158f204238e1f29.0000000000000008 0x400024 1 0'0 set-alloc-hint,write
> > 174662925 osd25 17.32271445 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.0000000000000001 0x400024 1 0'0 set-alloc-hint,write
> > 174663129 osd25 17.32271445 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.0000000000000001 0x400024 1 0'0 set-alloc-hint,write
> > LINGER REQUESTS
> > 18446462598732840990 osd74 17.145baa0f [74,72,14]/74 [74,72,14]/74 rbd_header.158f204238e1f29 0x20 8 WC/0
> > 18446462598732840991 osd74 17.7b4e2a06 [74,72,25]/74 [74,72,25]/74 rbd_header.1555406238e1f29 0x20 9 WC/0
> > 18446462598732840992 osd74 17.eea94d58 [74,73,25]/74 [74,73,25]/74 rbd_header.15d8670238e1f29 0x20 8 WC/0
> >
> > Sat 8 Jul 18:51:01 BST 2017
> > REQUESTS 5 homeless 0
> > 174662831 osd25 17.77737285 [25,74,14]/25 [25,74,14]/25 rbd_data.15d8670238e1f29.00000000000cf9f8 0x400024 1 0'0 set-alloc-hint,write
> > 174662863 osd25 17.7b91a345 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.000000000002571c 0x400024 1 0'0 set-alloc-hint,write
> > 174662887 osd25 17.6c2eaa93 [25,75,14]/25 [25,75,14]/25 rbd_data.158f204238e1f29.0000000000000008 0x400024 1 0'0 set-alloc-hint,write
> > 174662925 osd25 17.32271445 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.0000000000000001 0x400024 1 0'0 set-alloc-hint,write
> > 174663129 osd25 17.32271445 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.0000000000000001 0x400024 1 0'0 set-alloc-hint,write
> > LINGER REQUESTS
> > 18446462598732840990 osd74 17.145baa0f [74,72,14]/74 [74,72,14]/74 rbd_header.158f204238e1f29 0x20 8 WC/0
> > 18446462598732840991 osd74 17.7b4e2a06 [74,72,25]/74 [74,72,25]/74 rbd_header.1555406238e1f29 0x20 9 WC/0
> > 18446462598732840992 osd74 17.eea94d58 [74,73,25]/74 [74,73,25]/74 rbd_header.15d8670238e1f29 0x20 8 WC/0
> >
> > Sat 8 Jul 18:52:01 BST 2017
> > REQUESTS 6 homeless 0
> > 174662831 osd25 17.77737285 [25,74,14]/25 [25,74,14]/25 rbd_data.15d8670238e1f29.00000000000cf9f8 0x400024 1 0'0 set-alloc-hint,write
> > 174662863 osd25 17.7b91a345 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.000000000002571c 0x400024 1 0'0 set-alloc-hint,write
> > 174662887 osd25 17.6c2eaa93 [25,75,14]/25 [25,75,14]/25 rbd_data.158f204238e1f29.0000000000000008 0x400024 1 0'0 set-alloc-hint,write
> > 174662925 osd25 17.32271445 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.0000000000000001 0x400024 1 0'0 set-alloc-hint,write
> > 174663129 osd25 17.32271445 [25,74,14]/25 [25,74,14]/25 rbd_data.1555406238e1f29.0000000000000001 0x400024 1 0'0 set-alloc-hint,write
> > 174664149 osd25 17.b148df13 [25,75,14]/25 [25,75,14]/25 rbd_data.158f204238e1f29.0000000000091205 0x400024 1 0'0 set-alloc-hint,write
> > LINGER REQUESTS
> > 18446462598732840990 osd74 17.145baa0f [74,72,14]/74 [74,72,14]/74 rbd_header.158f204238e1f29 0x20 8 WC/0
> > 18446462598732840991 osd74 17.7b4e2a06 [74,72,25]/74 [74,72,25]/74 rbd_header.1555406238e1f29 0x20 9 WC/0
> > 18446462598732840992 osd74 17.eea94d58 [74,73,25]/74 [74,73,25]/74 rbd_header.15d8670238e1f29 0x20 8 WC/0
> >
> > And it continues on identically until 19:03.
> >
> > I realize at this stage these reports are probably not revealing much more information, so I will report back if I can gather any further information from the OSDs. The problem does seem to be related to load, or at least to the number of RBDs mounted; the host that only has 2 RBDs mounted hardly experiences this problem at all.
> >
> > Nick
>
> Hi Ilya,
>
> I have managed today to capture the kernel logs with debugging turned on, and the ms+osd debug logs from the mentioned OSD. However, this is from a few minutes after the stall starts, not before; the very random nature of the stalls has made it difficult to keep debug logging on for extended periods of time. I hope there is something in here which might give a clue to what is happening, otherwise I will continue trying to capture debug output from before the stall occurs.
>
> The kernel logs and OSD logs are attached as a zip.
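>
> (For context, "debugging turned on" means roughly the following -- a sketch of the mechanism rather than the exact commands used: the libceph messages come from the kernel's dynamic debug facility on the client, and the OSD side is the debug_ms/debug_osd bump to 30 suggested earlier. osd.37 is simply the OSD involved in this capture.)
>
>   # On the client (assumes debugfs is mounted at /sys/kernel/debug):
>   echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control
>
>   # On an admin/mon node, for the OSD in question:
>   ceph tell osd.37 injectargs '--debug_ms 30 --debug_osd 30'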
>
> Sample of the hung requests:
>
> Wed 12 Jul 11:28:01 BST 2017
> REQUESTS 8 homeless 0
> 11457738 osd37 17.be732844 [37,74,14]/37 [37,74,14]/37 rbd_data.15d8670238e1f29.00000000001a2068 0x400024 1 0'0 set-alloc-hint,write
> 11457759 osd37 17.9e3d4404 [37,74,14]/37 [37,74,14]/37 rbd_data.15d8670238e1f29.0000000000118149 0x400024 1 0'0 set-alloc-hint,write
> 11457770 osd37 17.86ec0d14 [37,72,74]/37 [37,72,74]/37 rbd_data.15d8670238e1f29.000000000006665c 0x400024 1 0'0 set-alloc-hint,write
> 11457818 osd37 17.e80ed1c0 [37,75,14]/37 [37,75,14]/37 rbd_data.158f204238e1f29.00000000000d7f1e 0x400014 1 0'0 read
> 11457822 osd37 17.9db0a684 [37,74,14]/37 [37,74,14]/37 rbd_data.15d8670238e1f29.000000000000211c 0x400024 1 0'0 set-alloc-hint,write
> 11457916 osd37 17.1848293b [37,3,73]/37 [37,3,73]/37 rbd_data.158f204238e1f29.00000000000d7f8e 0x400024 1 0'0 set-alloc-hint,write
> 11457967 osd37 17.56b0f4c0 [37,75,14]/37 [37,75,14]/37 rbd_data.158f204238e1f29.0000000000047ffc 0x400014 1 0'0 read
> 11457970 osd37 17.65ad6d40 [37,75,14]/37 [37,75,14]/37 rbd_data.15d8670238e1f29.0000000000060318 0x400024 1 0'0 set-alloc-hint,write
>
> Also a nice hung_task_timeout:
>
> [57154.424300] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [57154.424331] nfsd D 0 1650 2 0x00000000
> [57154.424334] Call Trace:
> [57154.424341] __schedule+0x3c6/0x8c0
> [57154.424344] schedule+0x36/0x80
> [57154.424346] rwsem_down_write_failed+0x230/0x3a0
> [57154.424389] ? xfs_file_buffered_aio_write+0x68/0x270 [xfs]
> [57154.424392] call_rwsem_down_write_failed+0x17/0x30
> [57154.424394] ? call_rwsem_down_write_failed+0x17/0x30
> [57154.424396] down_write+0x2d/0x40
> [57154.424422] xfs_ilock+0xa7/0x110 [xfs]
> [57154.424446] xfs_file_buffered_aio_write+0x68/0x270 [xfs]
> [57154.424449] ? iput+0x8a/0x230
> [57154.424450] ? __check_object_size+0x100/0x19d
> [57154.424474] xfs_file_write_iter+0x103/0x150 [xfs]
> [57154.424482] __do_readv_writev+0x2fb/0x390
> [57154.424484] do_readv_writev+0x8d/0xe0
> [57154.424487] ? security_file_open+0x8a/0x90
> [57154.424488] ? do_dentry_open+0x27a/0x310
> [57154.424511] ? xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
> [57154.424513] vfs_writev+0x3c/0x50
> [57154.424514] ? vfs_writev+0x3c/0x50
> [57154.424525] nfsd_vfs_write+0xc6/0x3a0 [nfsd]
> [57154.424531] nfsd_write+0x144/0x200 [nfsd]
> [57154.424538] nfsd3_proc_write+0xaa/0x140 [nfsd]
> [57154.424549] nfsd_dispatch+0xc8/0x260 [nfsd]
> [57154.424566] svc_process_common+0x374/0x6b0 [sunrpc]
> [57154.424575] svc_process+0xfe/0x1b0 [sunrpc]
> [57154.424581] nfsd+0xe9/0x150 [nfsd]
> [57154.424583] kthread+0x109/0x140
> [57154.424588] ? nfsd_destroy+0x60/0x60 [nfsd]
> [57154.424590] ? kthread_create_on_node+0x70/0x70
> [57154.424592] ret_from_fork+0x2c/0x40
>
> Regards,
> Nick

Just looking into this some more, I have found laggy requests to osd37 in the kernel logs:

Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685559] libceph: req ffff9adad97a1450 tid 11457738 on osd37 is laggy
Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685560] libceph: req ffff9adad0df9040 tid 11457759 on osd37 is laggy
Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685561] libceph: req ffff9adad97a5d70 tid 11457770 on osd37 is laggy
Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685562] libceph: req ffff9adad9036590 tid 11457818 on osd37 is laggy
Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685563] libceph: req ffff9adad9035960 tid 11457822 on osd37 is laggy
Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685564] libceph: req ffff9adad9036db0 tid 11457916 on osd37 is laggy
Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685565] libceph: req ffff9adad90330c0 tid 11457967 on osd37 is laggy
Jul 12 11:25:45 MS-CEPH-Proxy1 kernel: [66691.685565] libceph: req ffff9adad97a5550 tid 11457970 on osd37 is laggy

And I have found where the kernel client prints that message:

https://github.com/ceph/ceph-client/blob/master/net/ceph/osd_client.c#L3073-L3098

Does that suggest that for some reason the TCP connection is getting broken and causing the hang? That for whatever reason it can't re-establish, or takes longer than it should, and thus causes the outage?

Nick
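PS. If it is the TCP session dropping, that should be visible from the client while a stall is in progress; a loop along these lines would log the socket state (including retransmit and timer details) for connections to the OSD host. This is only a sketch: 10.0.0.37 is a placeholder for the address carrying osd.37, which "ceph osd find 37" will show, and the log path is just an example.

  while true; do
      date
      ss -tnoi dst 10.0.0.37
      sleep 5
  done >> /root/osd37-tcp-state.log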

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com