After getting all the OSDs and MONs updated and running OK, I updated the MDS as usual and rebooted the machine after updating the kernel (we're on 14.04, but it was running an older 4.x kernel, so I took it to 16.04's version). Now the MDS fails to come up. No replay, no nothing.
It boots normally, and then stops while waiting for the journal to recover, just repeating the beacon messages:
2016-04-30 21:21:33.889536 7f9f85da3700 10 mds.beacon.a _send up:replay seq 59
2016-04-30 21:21:33.889576 7f9f85da3700 1 -- 35.8.224.77:6800/31903 --> 35.8.224.132:6789/0 -- mdsbeacon(15227404/a up:replay seq 59 v6030) v6 -- ?+0 0x55a7d0a72000 con 0x55a7d0934600
2016-04-30 21:21:33.890646 7f9f88eaa700 1 -- 35.8.224.77:6800/31903 <== mon.1 35.8.224.132:6789/0 70 ==== mdsbeacon(15227404/a up:replay seq 59 v6030) v6 ==== 125+0+0 (945447566 0 0) 0x55a7d0a74700 con 0x55a7d0934600
2016-04-30 21:21:33.890693 7f9f88eaa700 10 mds.beacon.a handle_mds_beacon up:replay seq 59 rtt 0.001135
The journal never does anything, but upon killing the PID, it shows:
2016-04-30 21:21:40.455902 7f9f83b9d700 4 mds.0.log Journal 300 recovered.
2016-04-30 21:21:40.455929 7f9f83b9d700 0 mds.0.log Journal 300 is in unknown format 4294967295, does this MDS daemon require upgrade?
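One observation on that number, offered as a quick sanity check rather than a diagnosis: 4294967295 is 0xFFFFFFFF, the largest unsigned 32-bit value (that is, -1 reinterpreted as unsigned). My guess, and it is only an assumption, is that the format field was not read from a valid journal header. The small Python sketch below just pulls the number out of the log line and confirms the arithmetic; the helper name journal_format is made up for illustration and is not part of any Ceph tool.

```python
import re

# Log line copied verbatim from the MDS output above.
LINE = ("2016-04-30 21:21:40.455929 7f9f83b9d700 0 mds.0.log Journal 300 is in "
        "unknown format 4294967295, does this MDS daemon require upgrade?")


def journal_format(line):
    """Return the journal format number reported in a log line, or None.

    Hypothetical helper for illustration only; it simply matches the
    'unknown format N' text seen in the message above.
    """
    match = re.search(r"unknown format (\d+)", line)
    return int(match.group(1)) if match else None


fmt = journal_format(LINE)
print(hex(fmt))              # hexadecimal form of the reported format
print(fmt == 2**32 - 1)      # is it the all-ones 32-bit value?
```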
The only reason the MDS got fully rebooted after the upgrades was that some random objects were showing as unfound, yet if I shut down one of the nodes housing those OSDs, the unfound count would drop. Obviously I need to deal with the MDS issue first, haha.
Hopefully someone has some insight as to what can be run to either get it back online as it was, nuke the journal (the on-disk metadata should be OK, since there wasn't any important traffic during the upgrades), or reset it so it'll pull from the metadata pool.
Thanks!