Thanks - I eventually came to the same conclusion, after waiting for the
recoveries to settle down.
Not sure how it happened, but after moving to the 4.4.6 kernel, one of the
nodes running 6 of the OSDs started showing objects as unfound, even though
all copies were valid and other OSDs had good copies of the data.
What was strange was that downing and outing the 'bad' OSDs would drop the
degraded and unfound counts to 0 and recovery would continue; putting the
OSDs back in would increase the unfound count again at random.
Once all the shuffling was done, only 52 data objects and 0 metadata objects
were unfound; I reverted them and had the MDS drop sessions before starting -
it loaded right up and continued along.
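For the archive, the cleanup was roughly along these lines - the ids are
placeholders, and the session drop is just one way of doing it:

# take a suspect OSD down and out so its PGs peer elsewhere
ceph osd down 12
ceph osd out 12

# give up on the remaining unfound objects, rolling back to the
# previous version (run per affected PG)
ceph pg 1.2f mark_unfound_lost revert

# one way to drop client sessions before starting the MDS
cephfs-table-tool all reset session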
I've removed that node from the cluster, and the rest seem ok after reverting to 3.17 kernels.
I'm assuming the primaries for those PGs were on that node, that they somehow
created corrupt objects, and that those got propagated to the other OSDs,
which were then able to declare them actually bad. Not sure how, but it's all
back up and running at least.
I'll most likely be tearing it down and rebuilding from scratch on 16.04 and
10.2.0; it's been running live since 2012 and has stayed online through all
the version changes with very few outages. Wouldn't be a bad time to switch
from the hand-built setup to ceph-deploy :D
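For reference, the ceph-deploy flow for a fresh Jewel cluster is roughly
this (hostnames and devices are placeholders):

ceph-deploy new mon1
ceph-deploy install --release jewel mon1 osd1 osd2 mds1
ceph-deploy mon create-initial
ceph-deploy osd create osd1:sdb osd2:sdb
ceph-deploy mds create mds1
ceph-deploy admin mon1 osd1 osd2 mds1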
Thanks again
On Mon, May 2, 2016 at 5:19 PM, John Spray <jspray@xxxxxxxxxx> wrote:
On Sun, May 1, 2016 at 2:34 AM, Russ <wernerru@xxxxxxx> wrote:
> After getting all the OSDs and MONs updated and running ok, I updated the
> MDS as usual; rebooted the machine after updating the kernel (we're on
> 14.04, but it was running an older 4.x kernel, so took it to 16.04's
> version), the MDS fails to come up. No replay, no nothing.
>
> It boots normally, and then stops while waiting for the journal to recover,
> just repeating the broadcasts:
>
> 2016-04-30 21:21:33.889536 7f9f85da3700 10 mds.beacon.a _send up:replay seq
> 59
> 2016-04-30 21:21:33.889576 7f9f85da3700 1 -- 35.8.224.77:6800/31903 -->
> 35.8.224.132:6789/0 -- mdsbeacon(15227404/a up:replay seq 59 v6030) v6 --
> ?+0 0x55a7d0a72000 con 0x55a7d0934600
> 2016-04-30 21:21:33.890646 7f9f88eaa700 1 -- 35.8.224.77:6800/31903 <==
> mon.1 35.8.224.132:6789/0 70 ==== mdsbeacon(15227404/a up:replay seq 59
> v6030) v6 ==== 125+0+0 (945447566 0 0) 0x55a7d0a74700 con 0x55a7d0934600
> 2016-04-30 21:21:33.890693 7f9f88eaa700 10 mds.beacon.a handle_mds_beacon
> up:replay seq 59 rtt 0.001135
>
> Journal never does anything, but upon killing the pid, it shows:
>
> 2016-04-30 21:21:40.455902 7f9f83b9d700 4 mds.0.log Journal 300 recovered.
> 2016-04-30 21:21:40.455929 7f9f83b9d700 0 mds.0.log Journal 300 is in
> unknown format 4294967295, does this MDS daemon require upgrade?
Hmm, I think this might be misleading: Journaler::shutdown completes any
context waiting for journal recovery with status 0, which causes
MDLog::_recovery_thread to proceed even though the journaler header
hasn't been populated properly (so it has a bogus version).
It seems likely that there is a problem with your RADOS cluster that
is causing the MDS's read operations to stall while it tries to read
its journal. You could confirm this by starting the MDS and then,
while it is stuck, running "ceph daemon mds.<name> objecter_requests"
on the MDS node to see what its outstanding operations are.
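Concretely, with the daemon name from your log, that would be something
along these lines (the follow-up PG query is just one way to chase a
stuck request):

# on the MDS node, while the MDS is sitting in up:replay
ceph daemon mds.a objecter_requests

# if a request is stuck against a particular PG, query that PG
ceph pg <pgid> query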
John
>
> The only reason the MDS got rebooted fully after the upgrades was that
> some random objects were showing as unfound, yet if I shut down one of
> the nodes housing those OSDs, the unfound count would drop. Obviously
> need to deal with the MDS issue first haha.
>
> Hopefully someone has some insight as to what can be run to either get
> it back online as-was, nuke the journal (the metadata on-system should
> be ok, there wasn't any traffic of importance happening during the
> upgrades), or reset it so it'll pull from the metadata pool.
>
> Thanks!