Re: Pgs stuck peering

On Sat, 9 Feb 2013, Guilhem LETTRON wrote:
> Hi,
> I can confirm, I saw the same with 0.56.2 one week ago.

This sounds like the peering workqueue stuff... we just got to the bottom 
of that, and have a few patches pushed to the next and bobtail branches 
that fix this.  This should be released as 0.56.3 on Monday or Tuesday.  
In the meantime, you can run the autobuilt package from e.g.

	http://ceph.com/docs/master/install/debian/#development-testing-packages

Thanks!
sage
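
For anyone who wants to check whether their own OSD logs show the pattern quoted below (repeated op_tp heartbeat timeouts, then "wrongly marked me down"), here is a rough Python sketch. It is not part of Ceph. It assumes each log entry sits on one line in the format shown in this thread (the excerpts below are re-wrapped by the mail client), and the function name `summarize` is made up for illustration.

```python
import re
import sys
from collections import Counter

# Assumed single-line form of the entries quoted in this thread, e.g.:
#   2013-02-08 15:47:43.418226 7fa116ffd700  1 heartbeat_map is_healthy
#   'OSD::op_tp thread 0x7fa114ff9700' had timed out after 15
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+\s+-?\d+ heartbeat_map "
    r"(?P<kind>is_healthy|reset_timeout) "
    r"'(?P<pool>\S+) thread (?P<thread>0x[0-9a-f]+)' "
    r"had timed out after (?P<secs>\d+)"
)
MARKED_DOWN_RE = re.compile(r"wrongly marked me down")


def summarize(lines):
    """Count heartbeat timeouts per (thread pool, thread id) and
    count 'wrongly marked me down' warnings (hypothetical helper)."""
    timeouts = Counter()
    marked_down = 0
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[(m.group("pool"), m.group("thread"))] += 1
        elif MARKED_DOWN_RE.search(line):
            marked_down += 1
    return timeouts, marked_down


if __name__ == "__main__":
    # Pass the log file path as the first argument; the location of
    # your OSD log depends on your configuration.
    with open(sys.argv[1], errors="replace") as fh:
        counts, downs = summarize(fh)
    for (pool, thread), n in counts.most_common():
        print(f"{pool} thread {thread}: {n} timeout messages")
    print(f"'wrongly marked me down' warnings: {downs}")
```

A single op_tp thread piling up timeout messages shortly before a "wrongly marked me down" warning would match what was reported here, but it only shows the symptom, not the cause.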




> 
> Guilhem Lettron
> Youscribe - www.youscribe.com
> 
> 
> On Sat, Feb 9, 2013 at 10:48 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>       Hi,
> 
>       On 02/08/2013 11:55 PM, Jens Kristian Søgaard wrote:
>             Hi again,
> 
>             A followup to my last email:
> 
>             I restarted osd.6 and the system went back to
>             HEALTH_OK.
> 
> 
> FYI, I saw this with 0.56.2 as well, but didn't report it....
> 
>       I examined the logs of osd.6 and osd.12 around the time
>       the problem
>       occurred, and saw the following:
> 
>       ** On osd.12:
> 
>       2013-02-08 15:47:43.418226 7fa116ffd700  1 heartbeat_map
>       is_healthy
>       'OSD::op_tp thread 0x7fa114ff9700' had timed out after 15
>       [repeated many times]
>       2013-02-08 15:48:01.282623 7fa114ff9700  1 heartbeat_map
>       reset_timeout
>       'OSD::op_tp thread 0x7fa114ff9700' had timed out after 15
>       [repeated twice]
>       2013-02-08 15:48:09.898961 7fa11dffb700  0 log [WRN] : map
>       e3309 wrongly
>       marked me down
>       2013-02-08 15:49:56.496155 7fa116ffd700  1 heartbeat_map
>       is_healthy
>       'OSD::op_tp thread 0x7fa114ff9700' had timed out after 15
> 
>       This pattern repeats itself once more.
> 
> 
> I haven't examined the logs at that point, restarting the OSD fixed
> it, but just wanted to report I saw the same.
> 
> Probably a coincidence, but I saw it on a 12 OSD system as well.
> 
> Wido
> 
>       Then I have messages like this:
> 
>       2013-02-08 15:50:59.814871 7fa11c6f7700  0 --
>       10.0.0.1:6807/29923 >>
>       10.0.0.2:6807/10808 pipe(0x7fa0fd11a5d0 sd=36 :41003 s=2
>       pgs=288 cs=3
>       l=0).reader got old message 1 <= 41 0x7fa124420da0
>       osd_map(3319..3322
>       src has 2819..3322) v3, discarding
>       2013-02-08 15:50:59.814899 7fa1107e9700  0 --
>       10.0.0.1:6807/29923 >>
>       10.0.0.2:6819/11582 pipe(0x7fa10c7d07d0 sd=37 :37261 s=2
>       pgs=270 cs=3
>       l=0).reader got old message 1 <= 51 0x7fa0b0452970
>       osd_map(3319..3322
>       src has 2819..3322) v3, discarding
>       2013-02-08 15:50:59.814946 7fa11c6f7700  0 --
>       10.0.0.1:6807/29923 >>
>       10.0.0.2:6807/10808 pipe(0x7fa0fd11a5d0 sd=36 :41003 s=2
>       pgs=288 cs=3
>       l=0).fault with nothing to send, going to standby
>       2013-02-08 15:50:59.815062 7fa0a32f2700  0 --
>       10.0.0.1:6807/29923 >>
>       10.0.0.2:6801/14104 pipe(0x7fa10d4b82d0 sd=43 :50456 s=2
>       pgs=240 cs=3
>       l=0).reader got old message 1 <= 44 0x7fa12c00e2a0
>       osd_map(3319..3322
>       src has 2819..3322) v3, discarding
>       2013-02-08 15:50:59.815109 7fa1107e9700  0 --
>       10.0.0.1:6807/29923 >>
>       10.0.0.2:6819/11582 pipe(0x7fa10c7d07d0 sd=37 :37261 s=2
>       pgs=270 cs=3
>       l=0).fault, initiating reconnect
> 
> 
>       ** On osd.6:
> 
>       2013-02-08 15:48:03.716412 7ffebe2ab700 -1 osd.6 3308
>       heartbeat_check:
>       no reply from osd.12 since 2013-02-08 15:47:42.725323
>       (cutoff 2013-02-08
>       15:47:43.716409)
>       [repeated 7 times]
>       2013-02-08 15:50:59.812548 7ffea3fff700  0 osd.6 3322 from
>       dead osd.12,
>       dropping, sharing map
>       [repeated 7 times]
> 
>       Then I have messages like this:
> 
>       2013-02-08 15:51:00.126043 7ffe966ed700  0 --
>       10.0.0.2:6807/10808 >>
>       10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=0
>       pgs=0 cs=0
>       l=0).accept connect_seq 0 vs existing 65 state standby
>       2013-02-08 15:51:00.126054 7ffe966ed700  0 --
>       10.0.0.2:6807/10808 >>
>       10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=0
>       pgs=0 cs=0
>       l=0).accept peer reset, then tried to connect to us,
>       replacing
>       2013-02-08 15:51:00.126929 7ffe966ed700  0 --
>       10.0.0.2:6807/10808 >>
>       10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2
>       pgs=275 cs=1
>       l=0).reader got old message 1 <= 196030 0x7ffe60001230
>       pg_info(1 pgs
>       e3319:0.9d) v3, discarding
>       2013-02-08 15:51:00.127083 7ffe966ed700  0 --
>       10.0.0.2:6807/10808 >>
>       10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2
>       pgs=275 cs=1
>       l=0).reader got old message 2 <= 196030 0x7ffe60001230
>       pg_info(1 pgs
>       e3319:4.99) v3, discarding
>       2013-02-08 15:51:00.127178 7ffe966ed700  0 --
>       10.0.0.2:6807/10808 >>
>       10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2
>       pgs=275 cs=1
>       l=0).reader got old message 3 <= 196030 0x7ffe60001ab0
>       pg_info(1 pgs
>       e3319:0.85) v3, discarding
>       2013-02-08 15:51:00.127273 7ffe966ed700  0 --
>       10.0.0.2:6807/10808 >>
>       10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2
>       pgs=275 cs=1
>       l=0).reader got old message 4 <= 196030 0x7ffe60001300
>       pg_info(1 pgs
>       e3319:4.81) v3, discarding
>       2013-02-08 15:51:33.840234 7ffe966ed700  0 --
>       10.0.0.2:6807/10808 >>
>       10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2
>       pgs=275 cs=1
>       l=0).reader got old message 5 <= 196030 0x7ffe60001ca0
>       osd_map(3319..3323 src has 2822..3323) v3, discarding
>       2013-02-08 15:51:33.840487 7ffe966ed700  0 --
>       10.0.0.2:6807/10808 >>
>       10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2
>       pgs=275 cs=1
>       l=0).reader got old message 6 <= 196030 0x7ffe600010d0
>       pg_notify(0.12(14),2.10(9),1.11(9),4.e(9) epoch 3323) v4,
>       discarding
>       2013-02-08 15:51:33.841834 7ffe966ed700  0 --
>       10.0.0.2:6807/10808 >>
>       10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2
>       pgs=275 cs=1
>       l=0).reader got old message 7 <= 196030 0x7ffe600010d0
>       pg_query(0.43,0.85,0.9d,1.42,1.84,1.9c,2.41,2.83,2.9b,4.3f,4.81,4.99
>       epoch 3323) v2, discarding
>       2013-02-08 15:51:34.165219 7ffe966ed700  0 --
>       10.0.0.2:6807/10808 >>
>       10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2
>       pgs=275 cs=1
>       l=0).reader got old message 8 <= 196030 0x7ffe6001c630
>       pg_notify(0.12(14),1.11(9),2.10(9),4.e(9) epoch 3323) v4,
>       discarding
>       2013-02-08 15:51:36.805662 7ffe966ed700  0 --
>       10.0.0.2:6807/10808 >>
>       10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2
>       pgs=275 cs=1
>       l=0).reader got old message 9 <= 196030 0x7ffe60021280
>       pg_log(2.41 epoch
>       3324 query_epoch 3324) v3, discarding
>       2013-02-08 15:51:36.805764 7ffe966ed700  0 --
>       10.0.0.2:6807/10808 >>
>       10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2
>       pgs=275 cs=1
>       l=0).reader got old message 10 <= 196030 0x7ffe60021280
>       pg_log(1.42
>       epoch 3324 query_epoch 3324) v3, discarding
>       2013-02-08 15:51:39.404585 7ffe966ed700  0 --
>       10.0.0.2:6807/10808 >>
>       10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2
>       pgs=275 cs=1
>       l=0).reader got old message 11 <= 196030 0x7ffe60021280
>       pg_log(1.9c
>       epoch 3324 query_epoch 3324) v3, discarding
>       2013-02-08 15:51:39.404674 7ffe966ed700  0 --
>       10.0.0.2:6807/10808 >>
>       10.0.0.1:6804/29923 pipe(0x7ffe6c002820 sd=30 :6807 s=2
>       pgs=275 cs=1
>       l=0).reader got old message 12 <= 196030 0x7ffe60021280
>       pg_log(2.9b
>       epoch 3324 query_epoch 3324) v3, discarding
> 
> 
> 
> 
> --
> Wido den Hollander
> 42on B.V.
> 
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> 
