On 2012. July 6. 01:33:13 Székelyi Szabolcs wrote:
> On 2012. July 5. 16:12:42 Székelyi Szabolcs wrote:
> > On 2012. July 4. 09:34:04 Gregory Farnum wrote:
> > > Hrm, it looks like the OSD data directory got a little busted somehow.
> > > How did you perform your upgrade? (That is, how did you kill your
> > > daemons, in what order, and when did you bring them back up?)
> >
> > Since it would be hard and long to describe in text, I've collected the
> > relevant log entries, sorted by time, at http://pastebin.com/Ev3M4DQ9 .
> > The short story is that after seeing that the OSDs wouldn't start, I
> > tried to bring down the whole cluster and start it up from scratch. That
> > didn't change anything, so I rebooted the two machines (running all
> > three daemons) to see whether that would help. It didn't, and I gave up.
> >
> > My ceph config is available at http://pastebin.com/KKNjmiWM .
> >
> > Since this is my test cluster, I'm not very concerned about the data on
> > it. But the other one, with the same config, is dying, I think.
> > ceph-fuse is eating around 75% CPU on the sole monitor ("cc") node, and
> > the monitor about 15%. On the other two nodes, the OSD eats around 50%,
> > the MDS 15%, and the monitor another 10%. No Ceph filesystem activity is
> > going on at the moment. Blktrace reports about 1 kB/s of disk traffic on
> > the partition hosting the OSD data dir. The data seems to be accessible
> > at the moment, but I'm afraid that my production cluster will end up in
> > a similar situation after the upgrade, so I don't dare to touch it.
> >
> > Do you have any suggestion what I should check?
>
> Yes, it definitely looks like it's dying. Besides the above symptoms, all
> clients' ceph-fuse processes burn the CPU, there are unreadable files on
> the fs (tar blocks on them indefinitely), and the FUSE clients emit
> messages like
>
> ceph-fuse: 2012-07-05 23:21:41.583692 7f444dfd5700 0 -- client_ip:0/1181
> send_message dropped message ping v1 because of no pipe on con 0x1034000
>
> every 5 seconds. I tried to back up the data on it, but the backup got
> blocked in the middle. Since then I've been unable to get any data out of
> it, not even by killing ceph-fuse and remounting the fs.

So it looks like the recent leap second caused all my troubles... After a
colleague applied the workaround described here[0], the load on the nodes
went back to normal, but the cluster was still sick. For example, after
stopping one of the monitors, the output of `ceph -s` still showed all the
monitors as up & running, whereas it was clear that at least one of them
should have been marked down (there was no ceph-mon process running there).

Finally I stopped the whole cluster (BTW, `ceph stop` as documented here[1]
doesn't work any longer; it replies with something like 'unrecognized
subsystem'), rebooted all the nodes, and everything came up as it should
have.

Cheers,
--
cc

[0] http://www.h-online.com/open/news/item/Leap-second-bug-in-Linux-wastes-electricity-1631462.html
[1] http://ceph.com/docs/master/control/
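
PS: In case it helps anyone else bitten by the leap second bug: as far as I
remember, the workaround in [0] boils down to resetting the system clock on
every affected node. Roughly something like the following (a sketch of the
idea, not an exact transcript of what my colleague ran; the ntp init script
name varies by distro):

  # Stop ntpd so it doesn't interfere with the manual clock reset.
  /etc/init.d/ntp stop
  # Re-setting the clock (to the time it already shows) clears the stuck
  # timer state left behind by the leap second insertion.
  date -s "$(date +"%Y-%m-%d %H:%M:%S")"
  /etc/init.d/ntp start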
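
And since `ceph stop` is gone: on sysvinit-based installs the ceph init
script takes -a to act on every host listed in ceph.conf, so something like
this should stop and restart the whole cluster (again only a sketch; adjust
to your own init setup rather than taking my word for it):

  # Stop every ceph daemon on every host listed in ceph.conf, then start
  # them all again.
  service ceph -a stop
  service ceph -a start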