Re: [Ceph-community] After Mimic upgrade OSD's stuck at booting.

Hey Sage!

Thanks for the great support! You have saved me from a lot of trouble :)

The cause of the failure was that the monitors' epoch number was very
different from the OSDs'. We rebuilt the store.db with
ceph-objectstore-tool. I will post details later.
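
For anyone who hits something similar, here is a rough Python sketch for
spotting this kind of gap. It is not the procedure we used for the rebuild.
It assumes the epoch in question is the osdmap epoch, that the mon side is
read from a "ceph osd stat" line like the one quoted further down in this
thread, and that the OSD side comes from the "newest_map" field in the JSON
printed by "ceph daemon osd.<id> status". The function names are made up for
illustration, and the OSD values in the example are invented.

```python
import json
import re


def mon_osdmap_epoch(osd_stat_line):
    """Return the osdmap epoch from a 'ceph osd stat' line ('...; epoch: e37506')."""
    match = re.search(r"epoch:\s*e(\d+)", osd_stat_line)
    if match is None:
        raise ValueError("no osdmap epoch found in: %r" % osd_stat_line)
    return int(match.group(1))


def osd_newest_map(osd_status_json):
    """Return 'newest_map' from the JSON printed by 'ceph daemon osd.<id> status'."""
    return int(json.loads(osd_status_json)["newest_map"])


def epoch_gap(osd_stat_line, osd_status_json):
    """Mon epoch minus OSD epoch; a large absolute value deserves a closer look."""
    return mon_osdmap_epoch(osd_stat_line) - osd_newest_map(osd_status_json)


if __name__ == "__main__":
    # Mon side: the 'ceph osd stat' line from this thread.
    mon_line = "168 osds: 0 up, 168 in; epoch: e37506"
    # OSD side: invented sample values, NOT taken from the real cluster.
    osd_status = '{"whoami": 0, "state": "booting", "oldest_map": 72000, "newest_map": 72883}'
    print("mon epoch:", mon_osdmap_epoch(mon_line))
    print("gap (mon - osd):", epoch_gap(mon_line, osd_status))
```

Running the same check against several OSDs should show whether the gap is
the same everywhere or limited to a few daemons.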

The root cause of this problem seems to be my distro, Arch Linux. As far
as we can tell, the way Arch builds its packages might have caused this
problem.

Thanks again Sage! Ceph saved my day!
Sage Weil <sage@xxxxxxxxxxxx> wrote on Mon, 24 Sep 2018, 04:43:
>
> Some of the mons only have debug_ms=1 and not debug_mon=20, so I still
> can't find an instance where it has logged the mon processing an osd_boot
> message. Can you set debug_mon=20 and debug_ms=1 on all mons, restart all
> mons, and then restart several OSDs, so that we capture one?
>
> Thanks
> s
>
> On Mon, 24 Sep 2018, by morphin wrote:
>
> > Hi Sage,
> >
> > Thanks for your help!
> >
> > I am really desperate here. It is less than 6 hours until the day starts here.
> > 3mons is the attached log of all the mons (all there).
> > SEKUARK1 is the mon log after restarting osd.0.
> >
> > Hope this helps!
> >
> > Best Regards,
> >
> > ceph -w https://paste.ubuntu.com/p/HkscWdbWWW/
> > ceph osd dump  https://paste.ubuntu.com/p/rXwxZRyNXC/
> >
> > MON's + OSD log:
> > https://www.dropbox.com/sh/g0o2eaw5zh2lccf/AADCz_ClkTl7UCHjVwRIYYiKa?dl=0
> >
> >
> >
> > Sage Weil <sage@xxxxxxxxxxxx> wrote on Mon, 24 Sep 2018, 00:29:
> > >
> > > It looks like the OSD is sending the boot message but the mon is not
> > > marking it up.  Can you attach the output of 'ceph osd dump'?  Also, can
> > > you restart an OSD after the mon debug levels are turned up and then
> > > attach the mon logs?  I don't see it processing any osd_boot messages.
> > >
> > > (And add debug ms = 1 on the mons)
> > >
> > > Thanks!
> > > s
> > >
> > > On Sun, 23 Sep 2018, by morphin wrote:
> > >
> > > > I collected more logs for you.
> > > > I started 2 OSDs (osd8 in DC A and osd156 in DC B) with --debug-osd=20:
> > > >
> > > > OSD8: https://www.dropbox.com/s/5e01f5odtsq3iqi/ceph-osd.8.log?dl=0
> > > > OSD156: https://www.dropbox.com/s/ox7or2uizyiwdo7/ceph-osd.156.log?dl=0
> > > >
> > > > ceph osd stat: 168 osds: 0 up, 168 in; epoch: e37506
> > > > ceph -w: https://paste.ubuntu.com/p/pRhPKvjqJK/
> > > >
> > > > by morphin <morphinwithyou@xxxxxxxxx> wrote on Sun, 23 Sep 2018, 17:25:
> > > > >
> > > > > I tried, but I couldn't find a clear clue in that.
> > > > >
> > > > > OSD: https://paste.ubuntu.com/p/P79fHxTv2G/
> > > > > MON: https://paste.ubuntu.com/p/yRnG9DwWpq/
> > > > > David Conisbee <davidconisbee@xxxxxxxxx> wrote on Sun, 23 Sep 2018, 16:54:
> > > > > >
> > > > > > Have you tried debug osd = 5/5 in your ceph.conf to get more logging?
> > > > > >
> > > > > > On Sun, 23 Sep 2018, 11:41 morph in, <morphinwithyou@xxxxxxxxx> wrote:
> > > > > >>
> > > > > >> Hello again.
> > > > > >>
> > > > > > >> I'm sending a second mail because my problem is very urgent. I'd be
> > > > > > >> very grateful if somebody could help.
> > > > > > >>
> > > > > > >> After the Luminous to Mimic upgrade, when I try to start an OSD it
> > > > > > >> gets stuck at "booting". (I edited the hostnames, so don't worry if
> > > > > > >> they're not identical.)
> > > > > >>
> > > > > >> OSD log: https://paste.ubuntu.com/p/hFhc2dkSqb/
> > > > > >> MON log: https://paste.ubuntu.com/p/F85mYwvP4C/
> > > > > >> MGR log: https://paste.ubuntu.com/p/jYQ5kJstnH/
> > > > > >> CEPH.conf https://paste.ubuntu.com/p/qDwjzdsmGK/
> > > > > >> Telnet OSD to MON: https://paste.ubuntu.com/p/fbn9hTWv8q/
> > > > > >>
> > > > > > >> I upgraded the system in this order:
> > > > > > >>
> > > > > > >> 1- Stop MDS -> OSDs -> MGR -> MON -> servers
> > > > > > >> 2- Upgrade the OS image from 4.14.30-1-lts to 4.14.70-1-lts (Ceph, kernel, etc.)
> > > > > > >> 3- Reboot the servers and restore backups.
> > > > > > >> 4- Start mons; check was OK.
> > > > > > >> 5- Start mgrs; check was OK.
> > > > > > >> 6- Check versions: https://paste.ubuntu.com/p/bxqF9wgDMn/
> > > > > > >> 7- Start OSDs. All the OSDs are stuck at "booting":
> > > > > > >> https://paste.ubuntu.com/p/NY6SP2MBmd/
> > > > > > >> 8- I did not start MDS.
> > > > > >>
> > > > > > >> The above procedure was tested on my test servers. I upgraded 3 test
> > > > > > >> servers in this order, and when I started the OSDs they came up
> > > > > > >> pretty fast without problems. My cluster health was OK. However, in
> > > > > > >> my PROD cluster upgrade the OSDs do start, but they stay stuck in the
> > > > > > >> booting state. The only differences in PROD are the network and the
> > > > > > >> number of OSDs.
> > > > > >>
> > > > > > >> I need a way to debug the OSDs, because they give no clue about
> > > > > > >> what I should do!
> > > > > > >> As you can see, my mons and mgr are working properly, but the OSDs are
> > > > > > >> not. I think this is because they somehow can't talk to the mons.
> > > > > > >> I tried marking all the OSDs "down" and restarting all of them, but
> > > > > > >> nothing changed. I checked network communication between the OSDs and
> > > > > > >> the mons and it seems fine. I'm using 10G LACP with jumbo frames for
> > > > > > >> the cluster network and 10G LACP for the public network, and it was
> > > > > > >> working very well before the upgrade.
> > > > > >>
> > > > > > >> I have checked everything I know of. My last option is to downgrade,
> > > > > > >> and I don't know whether that would solve my problem or not.
> > > > > > >> My time is limited. I have large amounts of data in the data pool, and
> > > > > > >> it needs to be ready on Monday.
> > > > > >>
> > > > > >> Please help me if you can.
> > > > > >>
> > > > > >> Best Regards.
> > > > > > >> morph in <morphinwithyou@xxxxxxxxx> wrote on Sun, 23 Sep 2018, 01:43:
> > > > > >> >
> > > > > > >> > Hello. I upgraded my system from Luminous to Mimic.
> > > > > > >> > I have 168 OSDs in my system. I'm using RAID1 NVMe for journals, and my pool was healthy before the upgrade.
> > > > > > >> > I don't upgrade my system with update tools like apt or pacman. I'm using images, so all my OS installs are the same, and the upgrade was done in maintenance mode with the cluster shut down. I tested this upgrade 3 times on a test cluster with 2 servers and 12 OSDs.
> > > > > > >> > After the upgrade on my prod cluster, I see the OSDs are still at the booting stage.
> > > > > > >> > Before Mimic, it was very fast when I rebooted my cluster.
> > > > > > >> > I followed the Mimic upgrade wiki step by step.
> > > > > > >> > ceph -s: https://paste.ubuntu.com/p/p2spVmqvJZ/
> > > > > > >> > an osd log: https://paste.ubuntu.com/p/PBG66qdHXc/
> > > > > > >> > ceph daemon status: https://paste.ubuntu.com/p/y7cVspr9cN/
> > > > > > >> > 1- Why the hell does "ceph -s" show output like that while the OSDs are booting? It's so stupid and scary. And I didn't even start any MDS.
> > > > > > >> > 2- Why does booting take so long? Is it because of the Mimic upgrade or something else?
> > > > > > >> > 3- Will waiting for the OSDs to boot solve my problem, or should I do something?
> > > > > >> >
> > > > > >> > -----------------------------
> > > > > >> > ceph mon feature ls
> > > > > >> > all features
> > > > > >> > supported: [kraken,luminous,mimic,osdmap-prune]
> > > > > >> > persistent: [kraken,luminous,mimic,osdmap-prune]
> > > > > >> > on current monmap (epoch 10)
> > > > > >> > persistent: [kraken,luminous,mimic,osdmap-prune]
> > > > > >> > required: [kraken,luminous,mimic,osdmap-prune]
> > > > > >> >
> > > > > >> > ------------------------
> > > > > >> > ceph osd versions
> > > > > >> > {
> > > > > >> >     "ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)": 50
> > > > > >> > }
> > > > > >> >
> > > > > > >> > For now I'm leaving my cluster in this state. I will be back in 8 hours. I need a running system on Monday morning.
> > > > > > >> > Please help me.



