On Mon, Oct 3, 2016 at 6:29 AM, Adam Tygart <mozes@xxxxxxx> wrote:
> I put this in the #ceph-dev on Friday,
>
> (gdb) print info
> $7 = (const MDSMap::mds_info_t &) @0x55555fb1da68: {
> global_id = {<boost::totally_ordered1<mds_gid_t,
> boost::totally_ordered2<mds_gid_t, unsigned long,
> boost::detail::empty_base<mds_gid_t> > >> =
> {<boost::less_than_comparable1<mds_gid_t,
> boost::equality_comparable1<mds_gid_t,
> boost::totally_ordered2<mds_gid_t, unsigned long,
> boost::detail::empty_base<mds_gid_t> > > >> =
> {<boost::equality_comparable1<mds_gid_t,
> boost::totally_ordered2<mds_gid_t, unsigned long,
> boost::detail::empty_base<mds_gid_t> > >> =
> {<boost::totally_ordered2<mds_gid_t, unsigned long,
> boost::detail::empty_base<mds_gid_t> >> =
> {<boost::less_than_comparable2<mds_gid_t, unsigned long,
> boost::equality_comparable2<mds_gid_t, unsigned long,
> boost::detail::empty_base<mds_gid_t> > >> =
> {<boost::equality_comparable2<mds_gid_t, unsigned long,
> boost::detail::empty_base<mds_gid_t> >> =
> {<boost::detail::empty_base<mds_gid_t>> = {<No data fields>}, <No data
> fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No
> data fields>}, <No data fields>}, t = 1055992652}, name = "mormo",
> rank = -1, inc = 0,
> state = MDSMap::STATE_STANDBY, state_seq = 2, addr = {type = 0,
> nonce = 8835, {addr = {ss_family = 2, __ss_align = 0, __ss_padding =
> '\000' <repeats 111 times>}, addr4 = {sin_family = 2, sin_port =
> 36890,
> sin_addr = {s_addr = 50398474}, sin_zero =
> "\000\000\000\000\000\000\000"}, addr6 = {sin6_family = 2, sin6_port =
> 36890, sin6_flowinfo = 50398474, sin6_addr = {__in6_u = {
> __u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0,
> 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id =
> 0}}}, laggy_since = {tv = {tv_sec = 0, tv_nsec = 0}},
> standby_for_rank = 0, standby_for_name = "", standby_for_fscid =
> 328, standby_replay = true, export_targets = std::set with 0 elements,
> mds_features = 1967095022025}
> (gdb) print target_role
> $8 = {rank = 0, fscid = <optimized out>}
>
> It looks like target_role.fscid was somehow optimized out.

Thanks for this, let's switch discussion to the ticket (I think I know what's wrong now).

John

>
> --
> Adam
>
> On Sun, Oct 2, 2016 at 4:26 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> On Sat, Oct 1, 2016 at 7:19 PM, Adam Tygart <mozes@xxxxxxx> wrote:
>>> The wip-fixup-mds-standby-init branch doesn't seem to allow the
>>> ceph-mons to start up correctly. I disabled all mds servers before
>>> starting the monitors up, so it would seem the pending mdsmap update
>>> is in durable storage. Now that the mds servers are down, can we clear
>>> the mdsmap of active and standby servers while initializing the mons?
>>> I would hope that, now that all the versions are in sync, a bad
>>> standby_for_fscid would not be possible with new mds servers starting.
>>
>> Looks like my first guess about the run-time initialization being
>> confused was wrong. :(
>> Given that, we're pretty befuddled. But I commented on irc:
>>
>>>if you've still got a core dump, can you go up a frame (to MDSMonitor::maybe_promote_standby) and check the values of target_role.rank and target_role.fscid, and how that compares to info.standby_for_fscid, info.legacy_client_fscid, and info.standby_for_rank?
>>
>> That might pop up something and isn't accessible in the log you
>> posted. We also can't see an osdmap or dump; if you could either
>> extract and print that or get a log which includes it that might show
>> up something.
>>
>> I don't think we changed the mds<-> protocol or anything in the point
>> releases, so the different package version *shouldn't* matter...right,
>> John? ;)
>> -Greg
>>
>>>
>>> --
>>> Adam
>>>
>>> On Fri, Sep 30, 2016 at 3:49 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>>> On Fri, Sep 30, 2016 at 11:39 AM, Adam Tygart <mozes@xxxxxxx> wrote:
>>>>> Hello all,
>>>>>
>>>>> Not sure if this went through before or not, as I can't check the
>>>>> mailing list archives.
>>>>>
>>>>> I've gotten myself into a bit of a bind. I was prepping to add a new
>>>>> mds node to my ceph cluster. e.g. ceph-deploy mds create mormo
>>>>>
>>>>> Unfortunately, it started the mds server before I was ready. My
>>>>> cluster was running 10.2.1, and the newly deployed mds is 10.2.3.
>>>>>
>>>>> This caused 3 of my 5 monitors to crash. Since I immediately realized
>>>>> the mds was a newer version, I took that opportunity to upgrade my
>>>>> monitors to 10.2.3. Three of the 5 monitors continue to crash. And it
>>>>> looks like they are crashing when trying to apply a pending mdsmap
>>>>> update.
>>>>>
>>>>> The log is available here:
>>>>> http://people.cis.ksu.edu/~mozes/hobbit01.mon-20160930.log.gz
>>>>>
>>>>> I have attempted (making backups of course) to extract the monmap from
>>>>> a working monitor and inserting it into a broken one. No luck, and
>>>>> backup was restored.
>>>>>
>>>>> Since I had 2 working monitors, I backed up the monitor stores,
>>>>> updated the monmaps to remove the broken ones and tried to restart
>>>>> them. I then tried to restart the "working" ones. They then failed in
>>>>> the same way. I've now restored my backups of those monitors.
>>>>>
>>>>> I need to get these monitors back up post-haste.
>>>>>
>>>>> If you've got any ideas, I would be grateful.
>>>>
>>>> I'm not sure but it looks like it's now too late to keep the problem
>>>> out of the durable storage, but if you try again make sure you turn
>>>> off the MDS first.
>>>>
>>>> It sort of looks like you've managed to get a failed MDS with an
>>>> invalid fscid (ie, a cephfs filesystem ID).
>>>>
>>>> ...or maybe just a terrible coding mistake. As mentioned on irc,
>>>> wip-fixup-mds-standby-init should fix it. I've created a ticket as
>>>> well: http://tracker.ceph.com/issues/17466
>>>> -Greg
>>>>
>>>>
>>>>>
>>>>> --
>>>>> Adam
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
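
A side note for anyone reading the gdb dump of info.addr above: the addr4 fields (sin_port = 36890, s_addr = 50398474) are raw network-byte-order values that gdb prints as host-order integers. Below is a minimal Python sketch for turning them into a readable address. It assumes the core dump came from a little-endian host (e.g. x86-64), which the thread does not state; the helper name decode_sockaddr_in is made up for illustration and is not part of Ceph or gdb.

```python
import socket
import struct
from typing import Tuple


def decode_sockaddr_in(sin_port: int, s_addr: int) -> Tuple[str, int]:
    """Decode sockaddr_in fields as printed by gdb.

    ASSUMPTION: the dump was taken on a little-endian host, so gdb's
    integers are the in-memory (network-order) bytes read little-endian.
    """
    # Recover the in-memory bytes of s_addr, then format as dotted quad.
    ip = socket.inet_ntoa(struct.pack("<I", s_addr))
    # Recover the in-memory bytes of sin_port and re-read them big-endian.
    (port,) = struct.unpack(">H", struct.pack("<H", sin_port))
    return ip, port


if __name__ == "__main__":
    ip, port = decode_sockaddr_in(36890, 50398474)
    print(f"{ip}:{port}")  # prints 10.5.1.3:6800
```

Under that assumption the standby MDS "mormo" was listening on 10.5.1.3:6800. The addr6 view in the same dump is just the same union bytes reinterpreted (sin6_port = 36890, sin6_flowinfo = 50398474), so it carries no extra information.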