Re: Mon losing touch with OSDs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



G'day,

It seems there might be two issues here: the first being the delayed
receipt of echo replies causing an seemingly otherwise healthy osd to be
marked down, the second being the lack of recovery once the downed osd is
recognised as up again.

Is it worth my opening tracker reports for this, just so it doesn't get
lost?

Cheers,

Chris

On Wed, Feb 20, 2013 at 01:07:03PM +1100, Chris Dunlop wrote:
> On Tue, Feb 19, 2013 at 02:02:03PM +1100, Chris Dunlop wrote:
>> On Sun, Feb 17, 2013 at 05:44:29PM -0800, Sage Weil wrote:
>>> On Mon, 18 Feb 2013, Chris Dunlop wrote:
>>>> On Sat, Feb 16, 2013 at 09:05:21AM +1100, Chris Dunlop wrote:
>>>>> On Thu, Feb 14, 2013 at 08:57:11PM -0800, Sage Weil wrote:
>>>>>> On Fri, 15 Feb 2013, Chris Dunlop wrote:
>>>>>>> In an otherwise seemingly healthy cluster (ceph 0.56.2), what might cause the
>>>>>>> mons to lose touch with the osds?
>>>>>> 
>>>>>> Can you enable 'debug ms = 1' on the mons and leave them that way, in the 
>>>>>> hopes that this happens again?  It will give us more information to go on.
>>>>> 
>>>>> Debug turned on.
>>>> 
>>>> We haven't experienced the cluster losing touch with the osds completely
>>>> since upgrading from 0.56.2 to 0.56.3, but we did lose touch with osd.1
>>>> for a few seconds before it recovered. See below for logs (reminder: 3
>>>> boxes, b2 is mon-only, b4 is mon+osd.0, b5 is mon+osd.1).
>>> 
>>> Hrm, I don't see any obvious clues.  You could enable 'debug ms = 1' on 
>>> the osds as well.  That will give us more to go on if/when it happens 
>>> again, and should not affect performance significantly.
>> 
>> Done: ceph osd tell '*' injectargs '--debug-ms 1'
>> 
>> Now to wait for it to happen again.
> 
> OK, we got it again. Full logs covering the incident available at:
> 
> https://www.dropbox.com/s/kguzwyjfglv3ypl/ceph-logs.zip
> 
> Archive:  /tmp/ceph-logs.zip
>  Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
> --------  ------  ------- ---- ---------- ----- --------  ----
>    11492  Defl:X     1186  90% 2013-02-20 12:04 c0cba4ae  ceph-mon.b2.log
>  1270789  Defl:X    89278  93% 2013-02-20 12:00 2208d035  ceph-mon.b4.log
>  1375858  Defl:X   104025  92% 2013-02-20 12:05 c64c1dad  ceph-mon.b5.log
>  2020042  Defl:X   215000  89% 2013-02-20 10:40 f74ae4ca  ceph-osd.0.log
>  2075512  Defl:X   224098  89% 2013-02-20 12:05 b454d2ec  ceph-osd.1.log
>   154938  Defl:X    12989  92% 2013-02-20 12:04 d2729b05  ceph.log
> --------          -------  ---                            -------
>  6908631           646576  91%                            6 files
> 
> My naive analysis, based on the log extracts below (best viewed on a wide
> screen!)...
> 
> Osd.0 starts hearing much-delayed ping_replies from osd.1 and tells the mon,
> which marks osd.1 down.
> 
> However the whole time, the osd.1 log indicates that it's receiving and
> responding to each ping from osd.0 in a timely fashion. In contrast, the osd.0
> log indicates it isn't seeing the osd.1 replies for a while, then sees them all
> arrive in a flurry, until they're "delayed" enough to cause osd.0 to tell the
> mon.
> 
> During the time osd.0 is not seeing the osd.1 ping_replies, there's other traffic
> (osd_op, osd_sub_op, osd_sub_op_reply etc.) between osd.0 and osd.1, indicating
> that it's not a network problem.
> 
> The load on both osds during this period was >90% idle and <1% iow.
> 
> Is this pointing to osd.0 experiencing some kind of scheduling or priority
> starvation on the ping thread (assuming the ping is in it's own thread)?
> 
> The next odd thing is that, although the osds are both back by 04:38:50 ("2
> osds: 2 up, 2 in"), the system still wasn't working (see the disk stats for
> both osd.0 and osd.1) and didn't recover until ceph (mon + osd) was restarted
> on one of the boxes at around 05:50 (not shown in the logs, but full logs
> available if needed).
> 
> Prior to the restart:
> 
> # ceph health
> HEALTH_WARN 281 pgs peering; 281 pgs stuck inactive; 576 pgs stuck unclean
> 
> (Sorry, once again didn't get a 'ceph -s' prior to the restart.)
> 
> Chris.
> 
> ----------------------------------------------------------------------
> ceph.log
> ----------------------------------------------------------------------
> 2013-02-20 04:37:51.074128 mon.0 10.200.63.130:6789/0 120771 : [INF] pgmap v3000932: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:37:53.541471 mon.0 10.200.63.130:6789/0 120772 : [INF] pgmap v3000933: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:37:56.063059 mon.0 10.200.63.130:6789/0 120773 : [INF] pgmap v3000934: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:37:58.532763 mon.0 10.200.63.130:6789/0 120774 : [INF] pgmap v3000935: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> <<<< osd.0 sees delayed ping_replies from here >>>>
> 2013-02-20 04:38:01.057939 mon.0 10.200.63.130:6789/0 120775 : [INF] pgmap v3000936: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:03.541404 mon.0 10.200.63.130:6789/0 120776 : [INF] pgmap v3000937: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:06.133004 mon.0 10.200.63.130:6789/0 120777 : [INF] pgmap v3000938: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:08.540471 mon.0 10.200.63.130:6789/0 120778 : [INF] pgmap v3000939: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:11.064003 mon.0 10.200.63.130:6789/0 120779 : [INF] pgmap v3000940: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:13.547845 mon.0 10.200.63.130:6789/0 120780 : [INF] pgmap v3000941: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:16.062892 mon.0 10.200.63.130:6789/0 120781 : [INF] pgmap v3000942: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:18.530804 mon.0 10.200.63.130:6789/0 120782 : [INF] pgmap v3000943: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:21.080347 mon.0 10.200.63.130:6789/0 120783 : [INF] pgmap v3000944: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:23.555523 mon.0 10.200.63.130:6789/0 120784 : [INF] pgmap v3000945: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:26.071449 mon.0 10.200.63.130:6789/0 120785 : [INF] pgmap v3000946: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:28.561133 mon.0 10.200.63.130:6789/0 120786 : [INF] pgmap v3000947: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:31.068101 mon.0 10.200.63.130:6789/0 120787 : [INF] pgmap v3000948: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:33.536022 mon.0 10.200.63.130:6789/0 120788 : [INF] pgmap v3000949: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:36.081591 mon.0 10.200.63.130:6789/0 120789 : [INF] pgmap v3000950: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:38.380909 mon.0 10.200.63.130:6789/0 120790 : [DBG] osd.1 10.200.63.133:6801/21178 reported failed by osd.0 10.200.63.132:6801/18444
> 2013-02-20 04:38:43.372798 mon.0 10.200.63.130:6789/0 120793 : [DBG] osd.1 10.200.63.133:6801/21178 reported failed by osd.0 10.200.63.132:6801/18444
> 2013-02-20 04:38:48.373930 mon.0 10.200.63.130:6789/0 120796 : [DBG] osd.1 10.200.63.133:6801/21178 reported failed by osd.0 10.200.63.132:6801/18444
> 2013-02-20 04:38:48.373990 mon.0 10.200.63.130:6789/0 120797 : [INF] osd.1 10.200.63.133:6801/21178 failed (3 reports from 1 peers after 2013-02-20 04:39:11.373918 >= grace 20.000000)
> 2013-02-20 04:38:48.565717 mon.0 10.200.63.130:6789/0 120798 : [INF] osdmap e791: 2 osds: 1 up, 2 in
> 2013-02-20 04:38:48.670726 mon.0 10.200.63.130:6789/0 120799 : [INF] pgmap v3000955: 576 pgs: 575 active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:49.073328 mon.0 10.200.63.130:6789/0 120800 : [INF] pgmap v3000956: 576 pgs: 271 active+clean, 304 stale+active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:49.654554 mon.0 10.200.63.130:6789/0 120801 : [INF] osdmap e792: 2 osds: 1 up, 2 in
> 2013-02-20 04:38:49.857067 mon.0 10.200.63.130:6789/0 120802 : [INF] pgmap v3000957: 576 pgs: 271 active+clean, 304 stale+active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:50.749644 mon.0 10.200.63.130:6789/0 120803 : [INF] osdmap e793: 2 osds: 2 up, 2 in
> 2013-02-20 04:38:50.749710 mon.0 10.200.63.130:6789/0 120804 : [INF] osd.1 10.200.63.133:6801/21178 boot
> 2013-02-20 04:38:50.850887 mon.0 10.200.63.130:6789/0 120805 : [INF] pgmap v3000958: 576 pgs: 271 active+clean, 304 stale+active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:51.834189 mon.0 10.200.63.130:6789/0 120806 : [INF] osdmap e794: 2 osds: 2 up, 2 in
> 2013-02-20 04:38:51.956560 mon.0 10.200.63.130:6789/0 120807 : [INF] pgmap v3000959: 576 pgs: 271 active+clean, 304 stale+active+clean, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:56.162743 mon.0 10.200.63.130:6789/0 120808 : [INF] pgmap v3000960: 576 pgs: 295 active, 271 active+clean, 9 peering, 1 active+clean+scrubbing; 410 GB data, 842 GB used, 2881 GB / 3724 GB avail
> 2013-02-20 04:38:57.235082 mon.0 10.200.63.130:6789/0 120809 : [INF] pgmap v3000961: 576 pgs: 295 active, 281 peering; 410 GB data, 841 GB used, 2882 GB / 3724 GB avail
> 2013-02-20 04:38:48.660979 osd.1 10.200.63.133:6801/21178 997 : [WRN] map e791 wrongly marked me down
> 2013-02-20 04:39:01.158928 mon.0 10.200.63.130:6789/0 120810 : [INF] pgmap v3000962: 576 pgs: 295 active, 281 peering; 410 GB data, 841 GB used, 2882 GB / 3724 GB avail
> 2013-02-20 04:39:19.111723 osd.0 10.200.63.132:6801/18444 800 : [WRN] 6 slow requests, 6 included below; oldest blocked for > 30.770732 secs
> 2013-02-20 04:39:19.111729 osd.0 10.200.63.132:6801/18444 801 : [WRN] slow request 30.770732 seconds old, received at 2013-02-20 04:38:48.340908: osd_op(client.9971.0:685981 rb.0.1c62.2ae8944a.0000000003aa [write 3878912~4096] 2.c82ee285) v4 currently reached pg
> 2013-02-20 04:39:19.111735 osd.0 10.200.63.132:6801/18444 802 : [WRN] slow request 30.770225 seconds old, received at 2013-02-20 04:38:48.341415: osd_op(client.9971.0:685984 rb.0.1c62.2ae8944a.000000000439 [write 364544~20480] 2.b16f5ace) v4 currently reached pg
> 2013-02-20 04:39:19.111738 osd.0 10.200.63.132:6801/18444 803 : [WRN] slow request 30.456112 seconds old, received at 2013-02-20 04:38:48.655528: osd_op(client.9986.0:178417 broot.rbd [watch 1~0] 2.d30a2f40) v4 currently reached pg
> 2013-02-20 04:39:19.111743 osd.0 10.200.63.132:6801/18444 804 : [WRN] slow request 30.456106 seconds old, received at 2013-02-20 04:38:48.655534: osd_op(client.9989.0:215170 broot-nfs2.rbd [watch 1~0] 2.7802d31e) v4 currently reached pg
> 2013-02-20 04:39:19.111747 osd.0 10.200.63.132:6801/18444 805 : [WRN] slow request 30.455860 seconds old, received at 2013-02-20 04:38:48.655780: osd_op(client.9968.0:302450 dns1.rbd [watch 1~0] 2.383712c1) v4 currently reached pg
> 
> ----------------------------------------------------------------------
> grep osd_ping ceph-osd.0.log
> ----------------------------------------------------------------------
> 2013-02-20 04:37:57.347387 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:37:57.347384) v2 -- ?+0 0xbd248c0 con 0xa2dcdc0
> 2013-02-20 04:37:57.349406 7f1090f23700  1 -- 192.168.254.132:0/18444 <== osd.1 192.168.254.133:6803/21178 79153 ==== osd_ping(ping_reply e790 stamp 2013-02-20 04:37:57.347384) v2 ==== 47+0+0 (2837779695 0 0) 0xbc28a80 con 0xa2dcdc0
> 2013-02-20 04:37:57.847588 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:37:57.847586) v2 -- ?+0 0xa1ea540 con 0xa2dcdc0
> 2013-02-20 04:37:58.050400 7f1090f23700  1 -- 192.168.254.132:0/18444 <== osd.1 192.168.254.133:6803/21178 79154 ==== osd_ping(ping_reply e790 stamp 2013-02-20 04:37:57.847586) v2 ==== 47+0+0 (3920125339 0 0) 0xdb34000 con 0xa2dcdc0
> <<<< osd.0 sees delayed ping_replies from here >>>>
> 2013-02-20 04:37:59.547719 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:37:59.547716) v2 -- ?+0 0x73e5340 con 0xa2dcdc0
> 2013-02-20 04:38:00.047911 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:00.047909) v2 -- ?+0 0x99bbc00 con 0xa2dcdc0
> 2013-02-20 04:38:01.748080 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:01.748077) v2 -- ?+0 0xa1eb6c0 con 0xa2dcdc0
> 2013-02-20 04:38:03.448223 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:03.448220) v2 -- ?+0 0x99bb180 con 0xa2dcdc0
> 2013-02-20 04:38:03.948413 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:03.948411) v2 -- ?+0 0x99ba700 con 0xa2dcdc0
> 2013-02-20 04:38:04.448601 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:04.448598) v2 -- ?+0 0x99ba540 con 0xa2dcdc0
> 2013-02-20 04:38:04.948724 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:04.948720) v2 -- ?+0 0xa1ebdc0 con 0xa2dcdc0
> 2013-02-20 04:38:08.448860 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:08.448856) v2 -- ?+0 0xb676380 con 0xa2dcdc0
> 2013-02-20 04:38:08.949028 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:08.949025) v2 -- ?+0 0xbc28380 con 0xa2dcdc0
> 2013-02-20 04:38:10.649263 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:10.649260) v2 -- ?+0 0x965d340 con 0xa2dcdc0
> 2013-02-20 04:38:11.749458 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:11.749455) v2 -- ?+0 0xa1eafc0 con 0xa2dcdc0
> 2013-02-20 04:38:12.799154 7f1090f23700  1 -- 192.168.254.132:0/18444 <== osd.1 192.168.254.133:6803/21178 79155 ==== osd_ping(ping_reply e790 stamp 2013-02-20 04:37:59.547716) v2 ==== 47+0+0 (1242454262 0 0) 0xbc29880 con 0xa2dcdc0
> 2013-02-20 04:38:12.799459 7f1090f23700  1 -- 192.168.254.132:0/18444 <== osd.1 192.168.254.133:6803/21178 79156 ==== osd_ping(ping_reply e790 stamp 2013-02-20 04:38:00.047909) v2 ==== 47+0+0 (3852750933 0 0) 0xcaff180 con 0xa2dcdc0
> 2013-02-20 04:38:12.799496 7f1090f23700  1 -- 192.168.254.132:0/18444 <== osd.1 192.168.254.133:6803/21178 79157 ==== osd_ping(ping_reply e790 stamp 2013-02-20 04:38:01.748077) v2 ==== 47+0+0 (3672189647 0 0) 0xb677340 con 0xa2dcdc0
> 2013-02-20 04:38:12.799542 7f1090f23700  1 -- 192.168.254.132:0/18444 <== osd.1 192.168.254.133:6803/21178 79158 ==== osd_ping(ping_reply e790 stamp 2013-02-20 04:38:03.448220) v2 ==== 47+0+0 (38366945 0 0) 0xbc28c40 con 0xa2dcdc0
> 2013-02-20 04:38:12.799554 7f1090f23700  1 -- 192.168.254.132:0/18444 <== osd.1 192.168.254.133:6803/21178 79159 ==== osd_ping(ping_reply e790 stamp 2013-02-20 04:38:03.948411) v2 ==== 47+0+0 (83904766 0 0) 0x884ee00 con 0xa2dcdc0
> 2013-02-20 04:38:12.799573 7f1090f23700  1 -- 192.168.254.132:0/18444 <== osd.1 192.168.254.133:6803/21178 79160 ==== osd_ping(ping_reply e790 stamp 2013-02-20 04:38:04.448598) v2 ==== 47+0+0 (2688468082 0 0) 0x10c5c1c0 con 0xa2dcdc0
> 2013-02-20 04:38:12.799667 7f1090f23700  1 -- 192.168.254.132:0/18444 <== osd.1 192.168.254.133:6803/21178 79161 ==== osd_ping(ping_reply e790 stamp 2013-02-20 04:38:04.948720) v2 ==== 47+0+0 (4187258751 0 0) 0xb21a540 con 0xa2dcdc0
> 2013-02-20 04:38:12.799689 7f1090f23700  1 -- 192.168.254.132:0/18444 <== osd.1 192.168.254.133:6803/21178 79162 ==== osd_ping(ping_reply e790 stamp 2013-02-20 04:38:08.448856) v2 ==== 47+0+0 (4176431512 0 0) 0xb21b180 con 0xa2dcdc0
> 2013-02-20 04:38:12.799710 7f1090f23700  1 -- 192.168.254.132:0/18444 <== osd.1 192.168.254.133:6803/21178 79163 ==== osd_ping(ping_reply e790 stamp 2013-02-20 04:38:08.949025) v2 ==== 47+0+0 (2888471344 0 0) 0xb21b340 con 0xa2dcdc0
> 2013-02-20 04:38:12.799728 7f1090f23700  1 -- 192.168.254.132:0/18444 <== osd.1 192.168.254.133:6803/21178 79164 ==== osd_ping(ping_reply e790 stamp 2013-02-20 04:38:10.649260) v2 ==== 47+0+0 (3060931781 0 0) 0xb21aa80 con 0xa2dcdc0
> 2013-02-20 04:38:12.799745 7f1090f23700  1 -- 192.168.254.132:0/18444 <== osd.1 192.168.254.133:6803/21178 79165 ==== osd_ping(ping_reply e790 stamp 2013-02-20 04:38:11.749455) v2 ==== 47+0+0 (2767620502 0 0) 0x8d4e380 con 0xa2dcdc0
> 2013-02-20 04:38:14.049649 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:14.049645) v2 -- ?+0 0xa1ea000 con 0xa2dcdc0
> 2013-02-20 04:38:14.260608 7f1090f23700  1 -- 192.168.254.132:0/18444 <== osd.1 192.168.254.133:6803/21178 79166 ==== osd_ping(ping_reply e790 stamp 2013-02-20 04:38:14.049645) v2 ==== 47+0+0 (462572634 0 0) 0xbc29a40 con 0xa2dcdc0
> 2013-02-20 04:38:15.149828 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:15.149826) v2 -- ?+0 0xac85340 con 0xa2dcdc0
> 2013-02-20 04:38:15.151892 7f1090f23700  1 -- 192.168.254.132:0/18444 <== osd.1 192.168.254.133:6803/21178 79167 ==== osd_ping(ping_reply e790 stamp 2013-02-20 04:38:15.149826) v2 ==== 47+0+0 (2092320694 0 0) 0xdb34380 con 0xa2dcdc0
> 2013-02-20 04:38:21.050059 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:21.050056) v2 -- ?+0 0xb677c00 con 0xa2dcdc0
> 2013-02-20 04:38:25.750198 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:25.750195) v2 -- ?+0 0xbd24000 con 0xa2dcdc0
> 2013-02-20 04:38:28.650370 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:28.650367) v2 -- ?+0 0x7e94a80 con 0xa2dcdc0
> 2013-02-20 04:38:32.150553 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:32.150550) v2 -- ?+0 0xb677500 con 0xa2dcdc0
> 2013-02-20 04:38:34.450740 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:34.450737) v2 -- ?+0 0x9bc7500 con 0xa2dcdc0
> 2013-02-20 04:38:35.369720 7f109af37700 -1 osd.0 790 heartbeat_check: no reply from osd.1 since 2013-02-20 04:38:15.149826 (cutoff 2013-02-20 04:38:15.369719)
> 2013-02-20 04:38:36.369895 7f109af37700 -1 osd.0 790 heartbeat_check: no reply from osd.1 since 2013-02-20 04:38:15.149826 (cutoff 2013-02-20 04:38:16.369894)
> 
> ----------------------------------------------------------------------
> grep osd_ping ceph-osd.1.log
> ----------------------------------------------------------------------
> 2013-02-20 04:37:57.847878 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79154 ==== osd_ping(ping e790 stamp 2013-02-20 04:37:57.847586) v2 ==== 47+0+0 (2625351075 0 0) 0xb441880 con 0xb9a89a0
> 2013-02-20 04:37:57.847957 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:37:57.847586) v2 -- ?+0 0xe5921c0 con 0xb9a89a0
> <<<< osd.0 sees delayed ping_replies from here >>>>
> 2013-02-20 04:37:59.547994 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79155 ==== osd_ping(ping e790 stamp 2013-02-20 04:37:59.547716) v2 ==== 47+0+0 (1071491278 0 0) 0xb440700 con 0xb9a89a0
> 2013-02-20 04:37:59.548066 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:37:59.547716) v2 -- ?+0 0xb441880 con 0xb9a89a0
> 2013-02-20 04:38:00.048174 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79156 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:00.047909) v2 ==== 47+0+0 (2423758957 0 0) 0xc987a40 con 0xb9a89a0
> 2013-02-20 04:38:00.048262 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:00.047909) v2 -- ?+0 0xb440700 con 0xb9a89a0
> 2013-02-20 04:38:01.748248 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79157 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:01.748077) v2 ==== 47+0+0 (2939345655 0 0) 0xb6d96c0 con 0xb9a89a0
> 2013-02-20 04:38:01.748330 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:01.748077) v2 -- ?+0 0xc987a40 con 0xb9a89a0
> 2013-02-20 04:38:03.448435 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79158 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:03.448220) v2 ==== 47+0+0 (2006621913 0 0) 0xb71c540 con 0xb9a89a0
> 2013-02-20 04:38:03.448531 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:03.448220) v2 -- ?+0 0xb6d96c0 con 0xb9a89a0
> 2013-02-20 04:38:04.163566 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79159 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:03.948411) v2 ==== 47+0+0 (1892923590 0 0) 0xc02b6c0 con 0xb9a89a0
> 2013-02-20 04:38:04.163648 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:03.948411) v2 -- ?+0 0xb71c540 con 0xb9a89a0
> 2013-02-20 04:38:04.448837 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79160 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:04.448598) v2 ==== 47+0+0 (3589092426 0 0) 0xc02a8c0 con 0xb9a89a0
> 2013-02-20 04:38:04.448876 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:04.448598) v2 -- ?+0 0xc02b6c0 con 0xb9a89a0
> 2013-02-20 04:38:04.949019 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79161 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:04.948720) v2 ==== 47+0+0 (2353499975 0 0) 0x6fae700 con 0xb9a89a0
> 2013-02-20 04:38:04.949106 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:04.948720) v2 -- ?+0 0xc02a8c0 con 0xb9a89a0
> 2013-02-20 04:38:08.449126 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79162 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:08.448856) v2 ==== 47+0+0 (2369567136 0 0) 0xc02ac40 con 0xb9a89a0
> 2013-02-20 04:38:08.449210 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:08.448856) v2 -- ?+0 0x6fae700 con 0xb9a89a0
> 2013-02-20 04:38:08.949215 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79163 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:08.949025) v2 ==== 47+0+0 (3656999688 0 0) 0xc02ba40 con 0xb9a89a0
> 2013-02-20 04:38:08.949277 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:08.949025) v2 -- ?+0 0xc02ac40 con 0xb9a89a0
> 2013-02-20 04:38:10.649580 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79164 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:10.649260) v2 ==== 47+0+0 (3282169085 0 0) 0xc02a000 con 0xb9a89a0
> 2013-02-20 04:38:10.649647 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:10.649260) v2 -- ?+0 0xc02ba40 con 0xb9a89a0
> 2013-02-20 04:38:11.749750 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79165 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:11.749455) v2 ==== 47+0+0 (3508894126 0 0) 0xe593180 con 0xb9a89a0
> 2013-02-20 04:38:11.749835 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:11.749455) v2 -- ?+0 0xc02a000 con 0xb9a89a0
> 2013-02-20 04:38:14.049868 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79166 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:14.049645) v2 ==== 47+0+0 (1849801826 0 0) 0xc02bdc0 con 0xb9a89a0
> 2013-02-20 04:38:14.049943 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:14.049645) v2 -- ?+0 0xe593180 con 0xb9a89a0
> 2013-02-20 04:38:15.150155 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79167 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:15.149826) v2 ==== 47+0+0 (157661070 0 0) 0xb6d9340 con 0xb9a89a0
> 2013-02-20 04:38:15.150242 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:15.149826) v2 -- ?+0 0xc02bdc0 con 0xb9a89a0
> 2013-02-20 04:38:21.050348 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79168 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:21.050056) v2 ==== 47+0+0 (2307149705 0 0) 0xbf4bc00 con 0xb9a89a0
> 2013-02-20 04:38:21.050433 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:21.050056) v2 -- ?+0 0xb6d9340 con 0xb9a89a0
> 2013-02-20 04:38:25.750415 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79169 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:25.750195) v2 ==== 47+0+0 (3873827452 0 0) 0xe592000 con 0xb9a89a0
> 2013-02-20 04:38:25.750548 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:25.750195) v2 -- ?+0 0xbf4bc00 con 0xb9a89a0
> 2013-02-20 04:38:28.650634 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79170 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:28.650367) v2 ==== 47+0+0 (718534970 0 0) 0x78ff880 con 0xb9a89a0
> 2013-02-20 04:38:28.650713 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:28.650367) v2 -- ?+0 0xe592000 con 0xb9a89a0
> 2013-02-20 04:38:32.150855 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79171 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:32.150550) v2 ==== 47+0+0 (1909552382 0 0) 0x6fae000 con 0xb9a89a0
> 2013-02-20 04:38:32.150939 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:32.150550) v2 -- ?+0 0x78ff880 con 0xb9a89a0
> 2013-02-20 04:38:34.450994 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79172 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:34.450737) v2 ==== 47+0+0 (886776134 0 0) 0xd304700 con 0xb9a89a0
> 2013-02-20 04:38:34.451033 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:34.450737) v2 -- ?+0 0x6fae000 con 0xb9a89a0
> 2013-02-20 04:38:37.351175 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79173 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:37.350908) v2 ==== 47+0+0 (2444468724 0 0) 0xe593340 con 0xb9a89a0
> 2013-02-20 04:38:37.351215 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:37.350908) v2 -- ?+0 0xd304700 con 0xb9a89a0
> 2013-02-20 04:38:41.451417 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79174 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:41.451094) v2 ==== 47+0+0 (4159941099 0 0) 0xb6d8380 con 0xb9a89a0
> 2013-02-20 04:38:41.451477 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:41.451094) v2 -- ?+0 0xe593340 con 0xb9a89a0
> 2013-02-20 04:38:43.751635 7f564b733700  1 -- 192.168.254.133:6803/21178 <== osd.0 192.168.254.132:0/18444 79175 ==== osd_ping(ping e790 stamp 2013-02-20 04:38:43.751289) v2 ==== 47+0+0 (800627449 0 0) 0xbb0b180 con 0xb9a89a0
> 2013-02-20 04:38:43.751675 7f564b733700  1 -- 192.168.254.133:6803/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e790 stamp 2013-02-20 04:38:43.751289) v2 -- ?+0 0xb6d8380 con 0xb9a89a0
> 2013-02-20 04:38:53.151872 7f564b733700  1 -- 192.168.254.133:6802/21178 <== osd.0 192.168.254.132:0/18444 1 ==== osd_ping(ping e794 stamp 2013-02-20 04:38:53.151589) v2 ==== 47+0+0 (3010557341 0 0) 0x78ff880 con 0x10d39340
> 2013-02-20 04:38:53.151908 7f564b733700  1 -- 192.168.254.133:6802/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e794 stamp 2013-02-20 04:38:53.151589) v2 -- ?+0 0xbb0b180 con 0x10d39340
> 2013-02-20 04:38:56.052292 7f564b733700  1 -- 192.168.254.133:6802/21178 <== osd.0 192.168.254.132:0/18444 2 ==== osd_ping(ping e794 stamp 2013-02-20 04:38:56.051788) v2 ==== 47+0+0 (4254185894 0 0) 0xb6d9340 con 0x10d39340
> 2013-02-20 04:38:56.052330 7f564b733700  1 -- 192.168.254.133:6802/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e794 stamp 2013-02-20 04:38:56.051788) v2 -- ?+0 0x78ff880 con 0x10d39340
> 2013-02-20 04:38:57.152438 7f564b733700  1 -- 192.168.254.133:6802/21178 <== osd.0 192.168.254.132:0/18444 3 ==== osd_ping(ping e794 stamp 2013-02-20 04:38:57.152006) v2 ==== 47+0+0 (1503720898 0 0) 0xb6d96c0 con 0x10d39340
> 2013-02-20 04:38:57.152472 7f564b733700  1 -- 192.168.254.133:6802/21178 --> 192.168.254.132:0/18444 -- osd_ping(ping_reply e794 stamp 2013-02-20 04:38:57.152006) v2 -- ?+0 0xb6d9340 con 0x10d39340
> 
> ----------------------------------------------------------------------
> ceph-osd.0.log, showing osd.0 <=> osd.1 activity whilst osd.1 ping_replies aren't being seen
> ----------------------------------------------------------------------
> <<<< osd.0 sees delayed ping_replies from here >>>>
> 2013-02-20 04:37:59.547719 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:37:59.547716) v2 -- ?+0 0x73e5340 con 0xa2dcdc0
> 2013-02-20 04:38:00.047911 7f108bf19700  1 -- 192.168.254.132:0/18444 --> 192.168.254.133:6803/21178 -- osd_ping(ping e790 stamp 2013-02-20 04:38:00.047909) v2 -- ?+0 0x99bbc00 con 0xa2dcdc0
> 2013-02-20 04:38:00.342856 7f108e71e700  1 -- 192.168.254.132:6802/18444 --> osd.1 192.168.254.133:6801/21178 -- osd_sub_op(client.9811.0:498225 2.51 4d55cd51/rb.0.2150.2ae8944a.000000000628/head//2 [] v 790'271511 snapset=0=[]:[] snapc=0=[]) v7 -- ?+8799 0x8f91400
> 2013-02-20 04:38:01.046192 7f1091f25700  1 -- 192.168.254.132:6802/18444 <== osd.1 192.168.254.133:6801/21178 4989786 ==== osd_sub_op(unknown.0.0:0 2.4e 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v7 ==== 955+0+21617430 (245718815 0 180673924) 0xd19d200 con 0x9571080
> 2013-02-20 04:38:01.046297 7f1091f25700  1 -- 192.168.254.132:6802/18444 <== osd.1 192.168.254.133:6801/21178 4989787 ==== osd_sub_op_reply(client.9971.0:685924 2.4e b16f5ace/rb.0.1c62.2ae8944a.000000000439/head//2 [] ondisk, result = 0) v1 ==== 157+0+0 (3427135234 0 0) 0xa9c4f00 con 0x9571080
> 2013-02-20 04:38:01.046352 7f1091f25700  1 -- 192.168.254.132:6802/18444 <== osd.1 192.168.254.133:6801/21178 4989788 ==== osd_sub_op_reply(client.9953.0:367558 2.5a 6d3052da/rbd_data.242f2ae8944a.000000000000002c/head//2 [] ondisk, result = 0) v1 ==== 164+0+0 (2641157471 0 0) 0x86f2280 con 0x9571080
> 2013-02-20 04:38:01.046390 7f1091f25700  1 -- 192.168.254.132:6802/18444 <== osd.1 192.168.254.133:6801/21178 4989789 ==== osd_sub_op_reply(client.9953.0:367559 2.28 8340fa28/rbd_data.242f2ae8944a.0000000000000096/head//2 [] ondisk, result = 0) v1 ==== 164+0+0 (2759328511 0 0) 0x95d2780 con 0x9571080
> 2013-02-20 04:38:01.046743 7f1091f25700  1 -- 192.168.254.132:6802/18444 <== osd.1 192.168.254.133:6801/21178 4989790 ==== osd_sub_op_reply(client.9953.0:367560 2.5a 6678d7da/rbd_data.242f2ae8944a.0000000000000b8b/head//2 [] ondisk, result = 0) v1 ==== 164+0+0 (602117107 0 0) 0xda1fb80 con 0x9571080
> 2013-02-20 04:38:01.047020 7f1091f25700  1 -- 192.168.254.132:6802/18444 <== osd.1 192.168.254.133:6801/21178 4989791 ==== osd_sub_op_reply(client.9811.0:498225 2.51 4d55cd51/rb.0.2150.2ae8944a.000000000628/head//2 [] ondisk, result = 0) v1 ==== 157+0+0 (435514219 0 0) 0xac43400 con 0x9571080
> 2013-02-20 04:38:01.047240 7f1091f25700  1 -- 192.168.254.132:6802/18444 <== osd.1 192.168.254.133:6801/21178 4989792 ==== osd_sub_op(client.9995.0:4402170 2.77 3b454af7/rb.0.1cf8.2ae8944a.000000000e35/head//2 [] v 790'353333 snapset=0=[]:[] snapc=0=[]) v7 ==== 1107+0+8799 (240510986 0 1818934932) 0x8f91400 con 0x9571080
> 2013-02-20 04:38:01.047600 7f108cf1b700  1 -- 192.168.254.132:6802/18444 --> osd.1 192.168.254.133:6801/21178 -- replica scrub(pg: 2.4e,from:0'0,to:0'0,epoch:790,start:6388b3ce//0//-1,end:cff8e3ce//0//-1,chunky:1,deep:0,version:4) v4 -- ?+0 0xb0b3b00
> 2013-02-20 04:38:01.050467 7f108df1d700  1 -- 192.168.254.132:6802/18444 --> osd.1 192.168.254.133:6801/21178 -- osd_sub_op(client.9811.0:498226 2.51 4d55cd51/rb.0.2150.2ae8944a.000000000628/head//2 [] v 790'271512 snapset=0=[]:[] snapc=0=[]) v7 -- ?+4703 0xd19d200
> 2013-02-20 04:38:01.060518 7f1091f25700  1 -- 192.168.254.132:6802/18444 <== osd.1 192.168.254.133:6801/21178 4989793 ==== osd_sub_op(client.9968.0:302420 2.74 2c5fc7f4/rb.0.1c64.74b0dc51.00000000043b/head//2 [] v 790'498230 snapset=0=[]:[] snapc=0=[]) v7 ==== 1107+0+16991 (2821939739 0 1047147434) 0xd19d200 con 0x9571080
> 2013-02-20 04:38:01.072310 7f1091f25700  1 -- 192.168.254.132:6802/18444 <== osd.1 192.168.254.133:6801/21178 4989794 ==== osd_sub_op_reply(client.9811.0:498226 2.51 4d55cd51/rb.0.2150.2ae8944a.000000000628/head//2 [] ondisk, result = 0) v1 ==== 157+0+0 (3589886011 0 0) 0xaf1ef00 con 0x9571080
> 2013-02-20 04:38:01.086276 7f1093f29700  1 -- 192.168.254.132:6802/18444 --> osd.1 192.168.254.133:6801/21178 -- osd_sub_op_reply(client.9995.0:4402170 2.77 3b454af7/rb.0.1cf8.2ae8944a.000000000e35/head//2 [] ondisk, result = 0) v1 -- ?+0 0xc477400
> 2013-02-20 04:38:01.090522 7f1091f25700  1 -- 192.168.254.132:6802/18444 <== osd.1 192.168.254.133:6801/21178 4989795 ==== osd_sub_op(client.9995.0:4402171 2.77 3b454af7/rb.0.1cf8.2ae8944a.000000000e35/head//2 [] v 790'353334 snapset=0=[]:[] snapc=0=[]) v7 ==== 1107+0+4703 (4105904555 0 2838991357) 0xbddea00 con 0x9571080
> 2013-02-20 04:38:01.097246 7f1093f29700  1 -- 192.168.254.132:6802/18444 --> osd.1 192.168.254.133:6801/21178 -- osd_sub_op_reply(client.9968.0:302420 2.74 2c5fc7f4/rb.0.1c64.74b0dc51.00000000043b/head//2 [] ondisk, result = 0) v1 -- ?+0 0x9055400
> 2013-02-20 04:38:01.100259 7f1091f25700  1 -- 192.168.254.132:6802/18444 <== osd.1 192.168.254.133:6801/21178 4989796 ==== osd_sub_op(client.9968.0:302421 2.74 2c5fc7f4/rb.0.1c64.74b0dc51.00000000043b/head//2 [] v 790'498231 snapset=0=[]:[] snapc=0=[]) v7 ==== 1107+0+4703 (3340239323 0 2873835958) 0x810f400 con 0x9571080
> 2013-02-20 04:38:01.110432 7f1093f29700  1 -- 192.168.254.132:6802/18444 --> osd.1 192.168.254.133:6801/21178 -- osd_sub_op_reply(client.9995.0:4402171 2.77 3b454af7/rb.0.1cf8.2ae8944a.000000000e35/head//2 [] ondisk, result = 0) v1 -- ?+0 0xc477680
> 2013-02-20 04:38:01.121128 7f1093f29700  1 -- 192.168.254.132:6802/18444 --> osd.1 192.168.254.133:6801/21178 -- osd_sub_op_reply(client.9968.0:302421 2.74 2c5fc7f4/rb.0.1c64.74b0dc51.00000000043b/head//2 [] ondisk, result = 0) v1 -- ?+0 0x296d680
> 2013-02-20 04:38:01.366189 7f108e71e700  1 -- 192.168.254.132:6802/18444 --> osd.1 192.168.254.133:6801/21178 -- osd_sub_op(client.10001.0:3423766 2.7d f598cffd/rb.0.209f.74b0dc51.000000000a00/head//2 [] v 790'768516 snapset=0=[]:[] snapc=0=[]) v7 -- ?+4703 0x9fe1400
> 2013-02-20 04:38:01.366377 7f108e71e700  1 -- 192.168.254.132:6802/18444 --> osd.1 192.168.254.133:6801/21178 -- osd_sub_op(client.10001.0:3423767 2.7d f598cffd/rb.0.209f.74b0dc51.000000000a00/head//2 [] v 790'768517 snapset=0=[]:[] snapc=0=[]) v7 -- ?+4703 0xb978a00
> 2013-02-20 04:38:01.366519 7f108e71e700  1 -- 192.168.254.132:6802/18444 --> osd.1 192.168.254.133:6801/21178 -- osd_sub_op(client.10001.0:3423768 2.7d f598cffd/rb.0.209f.74b0dc51.000000000a00/head//2 [] v 790'768518 snapset=0=[]:[] snapc=0=[]) v7 -- ?+4703 0x815d200
> 2013-02-20 04:38:01.366728 7f108df1d700  1 -- 192.168.254.132:6802/18444 --> osd.1 192.168.254.133:6801/21178 -- osd_sub_op(client.10001.0:3423770 2.7d f598cffd/rb.0.209f.74b0dc51.000000000a00/head//2 [] v 790'768519 snapset=0=[]:[] snapc=0=[]) v7 -- ?+4703 0xc557e00
> 2013-02-20 04:38:01.367681 7f1091f25700  1 -- 192.168.254.132:6802/18444 <== osd.1 192.168.254.133:6801/21178 4989797 ==== osd_sub_op(client.10001.0:3423769 2.17 37564117/rb.0.209f.74b0dc51.000000001200/head//2 [] v 790'353128 snapset=0=[]:[] snapc=0=[]) v7 ==== 1107+0+4703 (1225650648 0 1162937880) 0xa052800 con 0x9571080
> 2013-02-20 04:38:01.378938 7f1091f25700  1 -- 192.168.254.132:6802/18444 <== osd.1 192.168.254.133:6801/21178 4989798 ==== osd_sub_op_reply(client.10001.0:3423766 2.7d f598cffd/rb.0.209f.74b0dc51.000000000a00/head//2 [] ondisk, result = 0) v1 ==== 157+0+0 (2250676101 0 0) 0xc47768 0 con 0x9571080
> 2013-02-20 04:38:01.379109 7f1091f25700  1 -- 192.168.254.132:6802/18444 <== osd.1 192.168.254.133:6801/21178 4989799 ==== osd_sub_op_reply(client.10001.0:3423767 2.7d f598cffd/rb.0.209f.74b0dc51.000000000a00/head//2 [] ondisk, result = 0) v1 ==== 157+0+0 (2794751629 0 0) 0x7b1aa0 0 con 0x9571080
> 2013-02-20 04:38:01.379162 7f1091f25700  1 -- 192.168.254.132:6802/18444 <== osd.1 192.168.254.133:6801/21178 4989800 ==== osd_sub_op_reply(client.10001.0:3423768 2.7d f598cffd/rb.0.209f.74b0dc51.000000000a00/head//2 [] ondisk, result = 0) v1 ==== 157+0+0 (41013162 0 0) 0x8086a00 con 0x9571080
> 2013-02-20 04:38:01.379396 7f1091f25700  1 -- 192.168.254.132:6802/18444 <== osd.1 192.168.254.133:6801/21178 4989801 ==== osd_sub_op_reply(client.10001.0:3423770 2.7d f598cffd/rb.0.209f.74b0dc51.000000000a00/head//2 [] ondisk, result = 0) v1 ==== 157+0+0 (2912526559 0 0) 0xb05450 0 con 0x9571080
> 
> ----------------------------------------------------------------------
> osd.0 load and disk
> ----------------------------------------------------------------------
>                     load %user %nice  %sys  %iow  %stl %idle  dev rrqm/s wrqm/s    r/s    w/s    rkB/s    wkB/s arq-sz aqu-sz  await  rwait wwait %util
> 2013-02-20-04:37:56  0.6   1.4   0.0   1.4   0.5   0.0  94.1  sdn    0.0    0.0    0.0   19.1     0.00   182.00  19.09   0.59  30.87   0.00  30.9  14.5
> 2013-02-20-04:38:11  0.6   1.2   0.0   1.1   1.0   0.0  93.4  sdn    0.0    0.0    0.0   20.8     0.00   152.00  14.62   0.79  38.14   0.00  38.1  12.5
> 2013-02-20-04:38:26  0.6   1.3   0.0   1.1   0.5   0.0  94.0  sdn    0.0    0.0    0.0   17.4     0.00   170.53  19.60   0.69  39.85   0.00  39.8  14.7
> 2013-02-20-04:38:41  0.6   1.1   0.0   1.2   0.5   0.0  94.3  sdn    0.0    0.2    0.0   21.8     0.00   179.83  16.50   0.61  28.17   0.00  28.2  12.1
> 2013-02-20-04:38:56  0.8   1.7   0.0   1.2   4.3   0.0  90.6  sdn    0.0    0.1    0.2  141.1     1.60  1838.33  26.04  41.68 294.67 283.33 294.7  53.9
> 2013-02-20-04:39:11  0.7   0.3   0.0   0.3   0.3   0.0  98.3  sdn    0.0    0.0    0.0    2.3     0.00    45.20  39.88   0.16  85.00   0.00  85.0   4.8
> 2013-02-20-04:39:26  0.6   0.1   0.0   0.1   0.0   0.0  99.2  sdn    0.0    0.0    0.0    0.0     0.00     0.00   0.00   0.00   0.00   0.00   0.0   0.0
> 2013-02-20-04:39:41  0.5   0.4   0.0   0.3   0.1   0.0  99.4  sdn    0.0    0.0    0.0    0.0     0.00     0.00   0.00   0.00   0.00   0.00   0.0   0.0
> 2013-02-20-04:39:56  0.4   0.2   0.0   0.2   0.1   0.0  99.5  sdn    0.0    0.0    0.0    0.0     0.00     0.00   0.00   0.00   0.00   0.00   0.0   0.0
> 2013-02-20-04:40:11  0.3   0.3   0.0   0.3   0.0   0.0  99.5  sdn    0.0    0.0    0.0    0.0     0.00     0.00   0.00   0.00   0.00   0.00   0.0   0.0
> 
> ----------------------------------------------------------------------
> osd.1 load and disk
> ----------------------------------------------------------------------
>                     load %user %nice  %sys  %iow  %stl %idle  dev rrqm/s wrqm/s    r/s    w/s    rkB/s    wkB/s arq-sz aqu-sz  await  rwait wwait %util
> 2013-02-20-04:37:57  0.8   0.3   0.0   0.2   0.6   0.0  97.7  sdg    0.0    0.1    2.5   19.7    10.40   179.93  17.15   0.97  43.63  38.16  44.3  19.3
> 2013-02-20-04:38:12  1.0   0.3   0.0   0.3   0.7   0.0  97.4  sdg    0.0    0.1    2.8   28.0    11.47   232.20  15.82   0.71  23.20  25.48  23.0  16.1
> 2013-02-20-04:38:27  0.8   0.4   0.0   0.2   0.7   0.0  97.4  sdg    0.0    0.0    2.7   11.3    10.67   102.87  16.30   0.31  22.30  14.25  24.2  11.1
> 2013-02-20-04:38:42  0.7   0.2   0.0   0.2   0.8   0.0  97.6  sdg    0.0    0.0    2.3   22.3     9.07   198.07  16.84   1.15  46.72  31.18  48.3  20.9
> 2013-02-20-04:38:57  0.6   0.6   0.0   0.5   5.2   0.0  92.9  sdg    0.0    0.0   15.8  162.7    63.73  1613.37  18.79  32.51 182.07  24.94 197.3  63.1
> 2013-02-20-04:39:12  0.5   0.0   0.0   0.0   0.1   0.0  99.8  sdg    0.0    0.0    0.0    5.8     0.00    76.87  26.51   0.32  57.82   0.00  57.8   9.2
> 2013-02-20-04:39:27  0.4   0.0   0.0   0.0   0.1   0.0  99.9  sdg    0.0    0.0    0.0    0.0     0.00     0.00   0.00   0.00   0.00   0.00   0.0   0.0
> 2013-02-20-04:39:42  0.6   0.0   0.0   0.0   0.1   0.0  99.8  sdg    0.0    0.0    0.0    0.0     0.00     0.00   0.00   0.00   0.00   0.00   0.0   0.0
> 2013-02-20-04:39:57  0.5   0.0   0.0   0.0   0.1   0.0  99.8  sdg    0.0    0.0    0.0    0.0     0.00     0.00   0.00   0.00   0.00   0.00   0.0   0.0
> 2013-02-20-04:40:12  0.4   0.0   0.0   0.0   0.1   0.0  99.8  sdg    0.0    0.0    0.0    0.0     0.00     0.00   0.00   0.00   0.00   0.00   0.0   0.0
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux