On 2012. July 5. 16:12:42 Székelyi Szabolcs wrote:
> On 2012. July 4. 09:34:04 Gregory Farnum wrote:
> > Hrm, it looks like the OSD data directory got a little busted somehow. How
> > did you perform your upgrade? (That is, how did you kill your daemons, in
> > what order, and when did you bring them back up.)
>
> Since it would be hard and long to describe in text, I've collected the
> relevant log entries, sorted by time, at http://pastebin.com/Ev3M4DQ9 . The
> short story is that after seeing that the OSDs wouldn't start, I tried to bring
> down the whole cluster and start it up from scratch. It didn't change
> anything, so I rebooted the two machines (running all three daemons), to
> see if it changes anything. It didn't and I gave up.
>
> My ceph config is available at http://pastebin.com/KKNjmiWM .
>
> Since this is my test cluster, I'm not very concerned about the data on it.
> But the other one, with the same config, is dying I think. ceph-fuse is
> eating around 75% CPU on the sole monitor ("cc") node. The monitor about
> 15%. On the other two nodes, the OSD eats around 50%, the MDS 15%, the
> monitor another 10%. No Ceph filesystem activity is going on at the moment.
> Blktrace reports about 1kB/s disk traffic on the partition hosting the OSD
> data dir. The data seems to be accessible at the moment, but I'm afraid
> that my production cluster will end up in a similar situation after
> upgrade, so I don't dare to touch it.
>
> Do you have any suggestion what I should check?

Yes, it definitely looks like it's dying. Besides the above symptoms, ceph-fuse
burns CPU on all clients, there are unreadable files on the fs (tar blocks on
them indefinitely), and the FUSE clients emit messages like

ceph-fuse: 2012-07-05 23:21:41.583692 7f444dfd5700 0 -- client_ip:0/1181 send_message dropped message ping v1 because of no pipe on con 0x1034000

every 5 seconds. I tried to back up the data on it, but the backup got stuck in
the middle.
Since then I'm unable to get any data out of it, not even by killing ceph-fuse
and remounting the fs.

-- 
cc