Can you start up your mds with "debug mds = 20" and "debug ms = 20"?
The "failed to decode message" line is suspicious, but there's not
enough context here for me to be sure, and my pattern-matching isn't
reminding me of any serious bugs.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Thu, Aug 29, 2013 at 3:10 AM, Serge Slipchenko <serge.slipchenko@xxxxxxxxxxxx> wrote:
> Hi,
>
> I upgraded Ceph from Bobtail to Cuttlefish and everything seemed good.
> Then I started to write to cephfs, but at some moment write stalled.
> After that I'm not able to mount either with kernel driver, or with
> custom utility.
>
> ceph -s shows that everything is good.
>
> health HEALTH_OK
> monmap e2: 2 mons at {m01=5.9.118.83:6789/0,m02=5.9.122.115:6789/0},
> election epoch 1320, quorum 0,1 m01,m02
> osdmap e3967: 16 osds: 16 up, 16 in
> pgmap v1315932: 256 pgs: 255 active+clean, 1 active+clean+scrubbing; 215
> GB data, 448 GB used, 38441 GB / 40971 GB avail; 37585KB/s rd, 1op/s
> mdsmap e774: 1/1/1 up {0=m02=up:active}, 1 up:standby
>
> But in the mds.a log I see the following messages:
>
> 2013-08-29 10:06:34.371166 7f49e68aa700 0 -- 5.9.122.115:6807/1077 >>
> 91.193.166.194:0/2272475298 pipe(0x8de3780 sd=74 :6807 s=0 pgs=0 cs=0
> l=0).accept peer addr is really 91.193.166.194:0/2272475298 (socket is
> 91.193.166.194:56649/0)
> 2013-08-29 10:07:38.454659 7f49e68aa700 0 -- 5.9.122.115:6807/1077 >>
> 91.193.166.194:0/2272475298 pipe(0x8de3780 sd=74 :6807 s=2 pgs=2 cs=1
> l=0).fault, server, going to standby
> 2013-08-29 10:23:06.898089 7f49e60a2700 0 -- 5.9.122.115:6807/1077 >>
> 91.193.166.194:0/3930317661 pipe(0x7442c000 sd=78 :6807 s=0 pgs=0 cs=0
> l=0).accept peer addr is really 91.193.166.194:0/3930317661 (socket is
> 91.193.166.194:56272/0)
> 2013-08-29 10:24:07.384136 7f49e60a2700 0 -- 5.9.122.115:6807/1077 >>
> 91.193.166.194:0/3930317661 pipe(0x7442c000 sd=78 :6807 s=2 pgs=2 cs=1
> l=0).fault, server, going to standby
> 2013-08-29 10:30:21.177807 7f49e5c9e700 0 -- 5.9.122.115:6807/1077 >>
> 91.193.166.194:0/1838286378 pipe(0x73bd8a00 sd=80 :6807 s=0 pgs=0 cs=0
> l=0).accept peer addr is really 91.193.166.194:0/1838286378 (socket is
> 91.193.166.194:59069/0)
> 2013-08-29 10:31:21.300004 7f49e5c9e700 0 -- 5.9.122.115:6807/1077 >>
> 91.193.166.194:0/1838286378 pipe(0x73bd8a00 sd=80 :6807 s=2 pgs=2 cs=1
> l=0).fault, server, going to standby
> 2013-08-29 11:17:17.331613 7f040de6b700 0 -- 5.9.122.115:6807/7622 >>
> 91.193.166.194:0/2689145238 pipe(0x13ea780 sd=34 :6807 s=2 pgs=2 cs=1
> l=0).fault with nothing to send, going to standby
> 2013-08-29 11:22:08.137711 7f0411897700 0 log [INF] : closing stale
> session client.76201 91.193.166.194:0/2689145238 after 304.270364
>
> And mds.b outputs a lot of:
>
> 2013-08-29 12:04:58.743938 7fa75604d700 -1 failed to decode message of
> type 23 v2: buffer::end_of_buffer
> 2013-08-29 12:04:58.743969 7fa75604d700 0 -- 5.9.122.115:6800/977 >>
> 144.76.13.103:0/925435369 pipe(0x524e780 sd=39 :6800 s=2 pgs=130763
> cs=12829 l=0).fault with nothing to send, going to standby
> 2013-08-29 12:04:58.744236 7fa755f4c700 0 -- 5.9.122.115:6800/977 >>
> 144.76.13.102:0/2955281877 pipe(0x524e500 sd=37 :6800 s=0 pgs=0 cs=0
> l=0).accept connect_seq 12834 vs existing 12833 state standby
> 2013-08-29 12:04:58.744607 7fa756754700 0 -- 5.9.122.115:6800/977 >>
> 144.76.13.105:0/347604456 pipe(0x52c5a00 sd=38 :6800 s=0 pgs=0 cs=0
> l=0).accept connect_seq 12538 vs existing 12537 state standby
> 2013-08-29 12:04:58.744627 7fa755f4c700 -1 failed to decode message of
> type 23 v2: buffer::end_of_buffer
> 2013-08-29 12:04:58.744671 7fa755f4c700 0 -- 5.9.122.115:6800/977 >>
> 144.76.13.102:0/2955281877 pipe(0x524e500 sd=37 :6800 s=2 pgs=292532
> cs=12835 l=0).fault with nothing to send, going to standby
> 2013-08-29 12:04:58.745006 7fa75614e700 0 -- 5.9.122.115:6800/977 >>
> 144.76.13.103:0/925435369 pipe(0x52c5780 sd=31 :6800 s=0 pgs=0 cs=0
> l=0).accept connect_seq 12830 vs existing 12829 state standby
> 2013-08-29 12:04:58.745102 7fa756754700 -1 failed to decode message of
> type 23 v2: buffer::end_of_buffer
> 2013-08-29 12:04:58.745146 7fa756754700 0 -- 5.9.122.115:6800/977 >>
> 144.76.13.105:0/347604456 pipe(0x52c5a00 sd=38 :6800 s=2 pgs=131368
> cs=12539 l=0).fault with nothing to send, going to standby
>
>
> --
> Kind regards, Serge Slipchenko
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
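
For reference, the two debug settings requested at the top of this reply could go into ceph.conf roughly as follows. This is only a sketch: it assumes the config file is at the usual /etc/ceph/ceph.conf on the MDS hosts and that you want the higher levels on every MDS, not on one named daemon (a [mds.a]-style section would narrow it to a single one).

```ini
# Assumed location: /etc/ceph/ceph.conf on each MDS host.
[mds]
    # Verbose MDS subsystem logging, as requested in the reply.
    debug mds = 20
    # Verbose messenger logging, to see what surrounds the
    # "failed to decode message" lines.
    debug ms = 20
```

The MDS daemons need to be restarted to start up with these levels. Logging at 20 is very verbose, so it is probably worth reverting the change once the stall has been captured.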