Re: Pgs stuck peering

Hi again,

A follow-up to my last email:

I restarted osd.6 and the system went back to HEALTH_OK.

I examined the logs of osd.6 and osd.12 around the time the problem occurred, and saw the following:

** On osd.12:

2013-02-08 15:47:43.418226 7fa116ffd700 1 heartbeat_map is_healthy 'OSD::op_tp thread 0x7fa114ff9700' had timed out after 15
[repeated many times]
2013-02-08 15:48:01.282623 7fa114ff9700 1 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7fa114ff9700' had timed out after 15
[repeated twice]
2013-02-08 15:48:09.898961 7fa11dffb700 0 log [WRN] : map e3309 wrongly marked me down
2013-02-08 15:49:56.496155 7fa116ffd700 1 heartbeat_map is_healthy 'OSD::op_tp thread 0x7fa114ff9700' had timed out after 15

This pattern repeats itself once more.
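To get a feel for how long the op_tp thread was stuck in the first episode, here is a rough sketch of the arithmetic. It assumes that "had timed out after 15" means the thread had not reset its heartbeat for at least 15 seconds when the first warning was logged, and that the reset_timeout line marks the moment it recovered. Under those assumptions the result is only a lower bound on the stall:

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"
OP_THREAD_TIMEOUT = timedelta(seconds=15)  # the "15" in "had timed out after 15"

# Timestamps copied from the osd.12 log above.
first_warning = datetime.strptime("2013-02-08 15:47:43.418226", FMT)  # first is_healthy warning
recovered = datetime.strptime("2013-02-08 15:48:01.282623", FMT)      # reset_timeout

# Latest moment the thread could still have been making progress;
# the real stall began at or before this point.
stalled_by = first_warning - OP_THREAD_TIMEOUT

# Therefore this is a lower bound on how long the thread was stuck.
stall_length = recovered - stalled_by

print(stalled_by.time())              # 15:47:28.418226
print(stall_length.total_seconds())   # 32.864397
```

So osd.12's op_tp thread was likely unresponsive for more than half a minute before the map in which it was (wrongly, by its own account) marked down.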

Then I have messages like this:

2013-02-08 15:50:59.814871 7fa11c6f7700 0 -- 10.0.0.1:6807/29923 >> 10.0.0.2:6807/10808 pipe(0x7fa0fd11a5d0 sd=36 :41003 s=2 pgs=288 cs=3 l=0).reader got old message 1 <= 41 0x7fa124420da0 osd_map(3319..3322 src has 2819..3322) v3, discarding
2013-02-08 15:50:59.814899 7fa1107e9700 0 -- 10.0.0.1:6807/29923 >> 10.0.0.2:6819/11582 pipe(0x7fa10c7d07d0 sd=37 :37261 s=2 pgs=270 cs=3 l=0).reader got old message 1 <= 51 0x7fa0b0452970 osd_map(3319..3322 src has 2819..3322) v3, discarding
2013-02-08 15:50:59.814946 7fa11c6f7700 0 -- 10.0.0.1:6807/29923 >> 10.0.0.2:6807/10808 pipe(0x7fa0fd11a5d0 sd=36 :41003 s=2 pgs=288 cs=3 l=0).fault with nothing to send, going to standby
2013-02-08 15:50:59.815062 7fa0a32f2700 0 -- 10.0.0.1:6807/29923 >> 10.0.0.2:6801/14104 pipe(0x7fa10d4b82d0 sd=43 :50456 s=2 pgs=240 cs=3 l=0).reader got old message 1 <= 44 0x7fa12c00e2a0 osd_map(3319..3322 src has 2819..3322) v3, discarding
2013-02-08 15:50:59.815109 7fa1107e9700 0 -- 10.0.0.1:6807/29923 >> 10.0.0.2:6819/11582 pipe(0x7fa10c7d07d0 sd=37 :37261 s=2 pgs=270 cs=3 l=0).fault, initiating reconnect


** On osd.6:

2013-02-08 15:48:03.716412 7ffebe2ab700 -1 osd.6 3308 heartbeat_check: no reply from osd.12 since 2013-02-08 15:47:42.725323 (cutoff 2013-02-08 15:47:43.716409)
[repeated 7 times]
2013-02-08 15:50:59.812548 7ffea3fff700 0 osd.6 3322 from dead osd.12, dropping, sharing map
[repeated 7 times]
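The heartbeat_check line on osd.6 also reveals which grace period it was using. Assuming the cutoff is computed as "check time minus grace" (the numbers line up that way), the difference works out to 20 seconds, which I believe is the default `osd heartbeat grace`. A small sketch of that arithmetic, with the timestamps copied from the first heartbeat_check line:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

def ts(s: str) -> datetime:
    """Parse a Ceph log timestamp such as '2013-02-08 15:48:03.716412'."""
    return datetime.strptime(s, FMT)

check_time = ts("2013-02-08 15:48:03.716412")  # when osd.6 logged heartbeat_check
last_reply = ts("2013-02-08 15:47:42.725323")  # "no reply from osd.12 since ..."
cutoff = ts("2013-02-08 15:47:43.716409")      # replies older than this are too old

grace = (check_time - cutoff).total_seconds()     # implied heartbeat grace
silence = (check_time - last_reply).total_seconds()

print(f"implied grace: {grace:.3f} s")            # implied grace: 20.000 s
print(f"osd.12 silent for: {silence:.3f} s")      # osd.12 silent for: 20.991 s
print("last reply older than cutoff:", last_reply < cutoff)  # last reply older than cutoff: True
```

The last reply osd.6 received from osd.12 (15:47:42.725) is less than a second before osd.12's first op_tp timeout warning (15:47:43.418).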

Then I have messages like this:

2013-02-08 15:51:00.126043 7ffe966ed700 0 -- 10.0.0.2:6807/10808 >> 10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=0 pgs=0 cs=0 l=0).accept connect_seq 0 vs existing 65 state standby
2013-02-08 15:51:00.126054 7ffe966ed700 0 -- 10.0.0.2:6807/10808 >> 10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=0 pgs=0 cs=0 l=0).accept peer reset, then tried to connect to us, replacing
2013-02-08 15:51:00.126929 7ffe966ed700 0 -- 10.0.0.2:6807/10808 >> 10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2 pgs=275 cs=1 l=0).reader got old message 1 <= 196030 0x7ffe60001230 pg_info(1 pgs e3319:0.9d) v3, discarding
2013-02-08 15:51:00.127083 7ffe966ed700 0 -- 10.0.0.2:6807/10808 >> 10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2 pgs=275 cs=1 l=0).reader got old message 2 <= 196030 0x7ffe60001230 pg_info(1 pgs e3319:4.99) v3, discarding
2013-02-08 15:51:00.127178 7ffe966ed700 0 -- 10.0.0.2:6807/10808 >> 10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2 pgs=275 cs=1 l=0).reader got old message 3 <= 196030 0x7ffe60001ab0 pg_info(1 pgs e3319:0.85) v3, discarding
2013-02-08 15:51:00.127273 7ffe966ed700 0 -- 10.0.0.2:6807/10808 >> 10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2 pgs=275 cs=1 l=0).reader got old message 4 <= 196030 0x7ffe60001300 pg_info(1 pgs e3319:4.81) v3, discarding
2013-02-08 15:51:33.840234 7ffe966ed700 0 -- 10.0.0.2:6807/10808 >> 10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2 pgs=275 cs=1 l=0).reader got old message 5 <= 196030 0x7ffe60001ca0 osd_map(3319..3323 src has 2822..3323) v3, discarding
2013-02-08 15:51:33.840487 7ffe966ed700 0 -- 10.0.0.2:6807/10808 >> 10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2 pgs=275 cs=1 l=0).reader got old message 6 <= 196030 0x7ffe600010d0 pg_notify(0.12(14),2.10(9),1.11(9),4.e(9) epoch 3323) v4, discarding
2013-02-08 15:51:33.841834 7ffe966ed700 0 -- 10.0.0.2:6807/10808 >> 10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2 pgs=275 cs=1 l=0).reader got old message 7 <= 196030 0x7ffe600010d0 pg_query(0.43,0.85,0.9d,1.42,1.84,1.9c,2.41,2.83,2.9b,4.3f,4.81,4.99 epoch 3323) v2, discarding
2013-02-08 15:51:34.165219 7ffe966ed700 0 -- 10.0.0.2:6807/10808 >> 10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2 pgs=275 cs=1 l=0).reader got old message 8 <= 196030 0x7ffe6001c630 pg_notify(0.12(14),1.11(9),2.10(9),4.e(9) epoch 3323) v4, discarding
2013-02-08 15:51:36.805662 7ffe966ed700 0 -- 10.0.0.2:6807/10808 >> 10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2 pgs=275 cs=1 l=0).reader got old message 9 <= 196030 0x7ffe60021280 pg_log(2.41 epoch 3324 query_epoch 3324) v3, discarding
2013-02-08 15:51:36.805764 7ffe966ed700 0 -- 10.0.0.2:6807/10808 >> 10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2 pgs=275 cs=1 l=0).reader got old message 10 <= 196030 0x7ffe60021280 pg_log(1.42 epoch 3324 query_epoch 3324) v3, discarding
2013-02-08 15:51:39.404585 7ffe966ed700 0 -- 10.0.0.2:6807/10808 >> 10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2 pgs=275 cs=1 l=0).reader got old message 11 <= 196030 0x7ffe60021280 pg_log(1.9c epoch 3324 query_epoch 3324) v3, discarding
2013-02-08 15:51:39.404674 7ffe966ed700 0 -- 10.0.0.2:6807/10808 >> 10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2 pgs=275 cs=1 l=0).reader got old message 12 <= 196030 0x7ffe60021280 pg_log(2.9b epoch 3324 query_epoch 3324) v3, discarding
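My reading of the "got old message N <= M ... discarding" lines is that they come from the messenger's duplicate filter. Each incoming message carries a sequence number, and anything at or below the last sequence number already accepted on that connection is dropped. Below is a simplified, hypothetical model of that rule; it is not Ceph's code, and `Connection`, `in_seq` and `receive` are made-up names. It shows what happens when, after the "accept peer reset ... replacing" line, the peer starts numbering again from 1 while osd.6 still holds 196030 for that connection:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Connection:
    """Hypothetical model of the receive side of one messenger connection."""
    in_seq: int = 0  # highest sequence number accepted so far
    delivered: List[str] = field(default_factory=list)
    discarded: List[Tuple[int, str]] = field(default_factory=list)

    def receive(self, seq: int, payload: str) -> bool:
        # Anything at or below in_seq is treated as a duplicate of a message
        # already received, mirroring "got old message <seq> <= <in_seq>".
        if seq <= self.in_seq:
            self.discarded.append((seq, payload))
            return False
        self.in_seq = seq
        self.delivered.append(payload)
        return True


# State carried over from the old session (value taken from the osd.6 log).
conn = Connection(in_seq=196030)

# A few of the messages from the new session, whose numbering restarted at 1.
incoming = [
    (1, "pg_info(1 pgs e3319:0.9d)"),
    (2, "pg_info(1 pgs e3319:4.99)"),
    (5, "osd_map(3319..3323 src has 2822..3323)"),
    (9, "pg_log(2.41 epoch 3324 query_epoch 3324)"),
]
for seq, payload in incoming:
    conn.receive(seq, payload)

print(len(conn.delivered), len(conn.discarded))  # 0 4
```

In this model none of the peering traffic (pg_info, pg_notify, pg_query, pg_log) from osd.12 would ever be delivered on that connection. That would be consistent with the PGs staying stuck in peering until osd.6 was restarted and the connection state was rebuilt.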


--
Jens Kristian Søgaard, Mermaid Consulting ApS,
jens@xxxxxxxxxxxxxxxxxxxx,
http://www.mermaidconsulting.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com