On Thu, Feb 28, 2013 at 09:00:24PM -0800, Sage Weil wrote:
> On Fri, 1 Mar 2013, Chris Dunlop wrote:
>> On Sat, Feb 23, 2013 at 01:02:53PM +1100, Chris Dunlop wrote:
>>> On Fri, Feb 22, 2013 at 05:52:11PM -0800, Sage Weil wrote:
>>>> On Sat, 23 Feb 2013, Chris Dunlop wrote:
>>>>> On Fri, Feb 22, 2013 at 05:30:04PM -0800, Sage Weil wrote:
>>>>>> On Sat, 23 Feb 2013, Chris Dunlop wrote:
>>>>>>> On Fri, Feb 22, 2013 at 04:13:21PM -0800, Sage Weil wrote:
>>>>>>>> On Sat, 23 Feb 2013, Chris Dunlop wrote:
>>>>>>>>> On Fri, Feb 22, 2013 at 03:43:22PM -0800, Sage Weil wrote:
>>>>>>>>>> On Sat, 23 Feb 2013, Chris Dunlop wrote:
>>>>>>>>>>> On Fri, Feb 22, 2013 at 01:57:32PM -0800, Sage Weil wrote:
>>>>>>>>>>>> I just looked at the logs. I can't tell what happened to cause that 10
>>>>>>>>>>>> second delay.. strangely, messages were passing from 0 -> 1, but nothing
>>>>>>>>>>>> came back from 1 -> 0 (although 1 was queuing, if not sending, them).
>>>>>>>>>
>>>>>>>>> Is there any way of telling where they were delayed, i.e. in osd.1's output
>>>>>>>>> queue or osd.0's input queue?
>>>>>>>>
>>>>>>>> Yeah, if you bump it up to 'debug ms = 20'. Be aware that that will
>>>>>>>> generate a lot of logging, though.
>>>>>>>
>>>>>>> I really don't want to load the system with too much logging, but I'm happy
>>>>>>> modifying code... Are there specific interesting debug outputs which I can
>>>>>>> modify so they're output under "ms = 1"?
>>>>>>
>>>>>> I'm basically interested in everything in writer() and write_message(),
>>>>>> and reader() and read_message()...
>>>>>
>>>>> Like this?
>>>>
>>>> Yeah. You could do 2 instead of 1 so you can turn it down. I suspect
>>>> that this is the lion's share of what debug 20 will spam to the log, but
>>>> hopefully the load is manageable!
>>>
>>> Good idea on the '2'. I'll get that installed and wait for it to happen again.
>>
>> FYI...
>>
>> To avoid running out of disk space for the massive logs, I
>> started using logrotate on the ceph logs every two hours, which
>> does a 'service ceph reload' to re-open the log files.
>>
>> In the week since doing that I haven't seen any 'slow requests'
>> at all (the load has stayed the same as before the change),
>> which means the issue with the osds dropping out, then the
>> system not recovering properly, also hasn't happened.
>>
>> That's a bit suspicious, no?
>
> I suspect the logging itself is changing the timing. Let's wait and see
> if we get lucky...

We got "lucky"...

ceph-mon.0.log:

2013-03-08 03:46:44.786682 7fcc62172700 1 -- 192.168.254.132:0/20298 --> 192.168.254.133:6801/23939 -- osd_ping(ping e815 stamp 2013-03-08 03:46:44.786679) v2 -- ?+0 0x765b180 con 0x6ab6160

[no ping_reply logged, then later...]

2013-03-08 03:46:56.211993 7fcc71190700 -1 osd.0 815 heartbeat_check: no reply from osd.1 since 2013-03-08 03:46:35.986327 (cutoff 2013-03-08 03:46:36.211992)

ceph-mon.1.log:

2013-03-08 03:46:44.786848 7fe6f47a4700 1 -- 192.168.254.133:6801/23939 <== osd.0 192.168.254.132:0/20298 178549 ==== osd_ping(ping e815 stamp 2013-03-08 03:46:44.786679) v2 ==== 47+0+0 (1298645350 0 0) 0x98256c0 con 0x7bd2160

2013-03-08 03:46:44.786880 7fe6f47a4700 1 -- 192.168.254.133:6801/23939 --> 192.168.254.132:0/20298 -- osd_ping(ping_reply e815 stamp 2013-03-08 03:46:44.786679) v2 -- ?+0 0x29876c0 con 0x7bd2160

Interestingly, the matching ping_reply from osd.1 never appears in the
osd.0 log, in contrast to the previous incident upthread where the
"missing" ping replies were all seen in a rush (but after osd.1 had been
marked down).
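
If I'm reading the heartbeat_check line right (and assuming the default
osd heartbeat grace of 20 seconds), the cutoff is just "check time minus
grace":

  cutoff 03:46:36.211992  ~=  check time 03:46:56.211993 - 20s
  last reply 03:46:35.986327  <  cutoff

i.e. by that point osd.0 had gone a little over 20 seconds without seeing
a ping_reply from osd.1.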
The missing ping_reply caused osd.1 to get marked down, then it marked
itself up again a bit later ("map e818 wrongly marked me down"). However
the system still hadn't recovered before 07:46:29, when a 'service ceph
restart' was done on the machine holding mon.b5 and osd.1, bringing
things back to life.

Before the restart:

# ceph -s
   health HEALTH_WARN 273 pgs peering; 2 pgs recovery_wait; 273 pgs stuck inactive; 576 pgs stuck unclean; recovery 43/293224 degraded (0.015%)
   monmap e9: 3 mons at {b2=10.200.63.130:6789/0,b4=10.200.63.132:6789/0,b5=10.200.63.133:6789/0}, election epoch 898, quorum 0,1,2 b2,b4,b5
   osdmap e825: 2 osds: 2 up, 2 in
    pgmap v3545580: 576 pgs: 301 active, 2 active+recovery_wait, 273 peering; 560 GB data, 1348 GB used, 2375 GB / 3724 GB avail; 43/293224 degraded (0.015%)
   mdsmap e1: 0/0/1 up

After the restart:

# ceph -s
   health HEALTH_WARN 19 pgs recovering; 24 pgs recovery_wait; 43 pgs stuck unclean; recovery 66/293226 degraded (0.023%)
   monmap e9: 3 mons at {b2=10.200.63.130:6789/0,b4=10.200.63.132:6789/0,b5=10.200.63.133:6789/0}, election epoch 902, quorum 0,1,2 b2,b4,b5
   osdmap e828: 2 osds: 2 up, 2 in
    pgmap v3545603: 576 pgs: 533 active+clean, 24 active+recovery_wait, 19 active+recovering; 560 GB data, 1348 GB used, 2375 GB / 3724 GB avail; 0B/s rd, 8135KB/s wr, 224op/s; 66/293226 degraded (0.023%)
   mdsmap e1: 0/0/1 up

Logs covering 00:00 to 09:00 are in:

https://www.dropbox.com/sh/7nq7gr2u2deorcu/Nvw3FFGiy2

Cheers,

Chris