Re: After Mimic upgrade, OSDs stuck at booting

Hello again.

I'm sending a second mail because my problem is very urgent. I'd be very
grateful if somebody could help.

After the Luminous to Mimic upgrade, when I try to start an OSD it gets
stuck at "booting". (I edited the hostnames, so don't worry if they're
not identical.)

OSD log: https://paste.ubuntu.com/p/hFhc2dkSqb/
MON log: https://paste.ubuntu.com/p/F85mYwvP4C/
MGR log: https://paste.ubuntu.com/p/jYQ5kJstnH/
CEPH.conf: https://paste.ubuntu.com/p/qDwjzdsmGK/
Telnet OSD to MON: https://paste.ubuntu.com/p/fbn9hTWv8q/

I upgraded the system in this order:

1- Stop MDS -> OSDs -> MGR -> MON -> servers.
2- Upgrade the OS image from 4.14.30-1-lts to 4.14.70-1-lts (Ceph, kernel, etc.).
3- Reboot the servers and restore backups.
4- Start the mons; check was OK.
5- Start the mgrs; check was OK.
6- Check versions: https://paste.ubuntu.com/p/bxqF9wgDMn/
7- Start the OSDs. All the OSDs are stuck at "booting":
https://paste.ubuntu.com/p/NY6SP2MBmd/
8- I did not start MDS.
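Step 6 is easy to get wrong by eye on a large cluster. Below is a minimal sketch of an automated check, assuming the JSON printed by `ceph versions` maps each daemon type ("mon", "mgr", "osd", ...) to a dict of version string -> daemon count, plus an "overall" summary key. The helper names `release_of` and `stale_daemons` are hypothetical, not part of Ceph.

```python
import json
import re

# Picks the release codename out of a version string such as
# "ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)".
RELEASE_RE = re.compile(r"\)\s+(\w+)\s+\(")


def release_of(version_string: str) -> str:
    """Return the release codename (e.g. 'luminous'), or 'unknown'."""
    match = RELEASE_RE.search(version_string)
    return match.group(1) if match else "unknown"


def stale_daemons(versions_json: str, target: str = "mimic") -> dict:
    """Hypothetical helper: count daemons per type NOT reporting `target`.

    Assumes `versions_json` is the output of `ceph versions`, e.g.
    {"mon": {"<version string>": 3}, "osd": {...}, "overall": {...}}.
    """
    data = json.loads(versions_json)
    stale = {}
    for daemon_type, counts in data.items():
        if daemon_type == "overall":
            continue  # "overall" only sums the other sections
        behind = sum(n for version, n in counts.items()
                     if release_of(version) != target)
        if behind:
            stale[daemon_type] = behind
    return stale
```

Fed with mons on Mimic and the OSD line from the quoted `ceph osd versions` output further down (50 OSDs on 12.2.4 luminous), `stale_daemons` reports `{"osd": 50}`, i.e. the monitors still hold Luminous metadata for those OSDs.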

The above procedure was tested on my test servers. I upgraded 3 test
servers in this order, and when I started the OSDs they came up pretty
fast without problems. Cluster health was OK. However, in my PROD
cluster upgrade the OSDs do start, but they get stuck in "booting" status.
The only differences in PROD are the network and the number of OSDs.

I need a way to debug the OSDs, because they don't give any clue about
what I should do!
As you can see, my mons and mgrs are working properly, but the OSDs are not.
I think this is because they somehow can't talk to the MONs.
I tried marking all the OSDs "down" and restarting all the OSDs, but
nothing changed. I checked network communication between the OSDs and
the MONs and it seems fine. I'm using 10G LACP with jumbo frames for the
cluster network and 10G LACP for the public network, and it was working
very well before the upgrade.
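One thing worth comparing while an OSD sits in "booting" is how far its local osdmap lags behind the cluster's current epoch. Below is a minimal sketch, assuming the admin-socket output of `ceph daemon osd.<id> status` is JSON with `whoami`, `state` and `newest_map` fields, and that the cluster epoch is taken from the `epoch` field of `ceph osd dump --format json`. `boot_report` is a hypothetical helper, and a non-zero lag is a hint to look further, not a diagnosis.

```python
import json


def boot_report(status_json: str, cluster_epoch: int) -> str:
    """Hypothetical helper: summarise `ceph daemon osd.N status` output.

    Assumes the admin-socket JSON has 'whoami', 'state' and 'newest_map'
    fields; `cluster_epoch` is the current osdmap epoch as seen by the mons.
    """
    status = json.loads(status_json)
    lag = cluster_epoch - status["newest_map"]  # epochs still to fetch
    return (f"osd.{status['whoami']}: state={status['state']}, "
            f"newest_map={status['newest_map']}, behind by {lag} epochs")
```

Running it for each OSD over a few minutes shows whether `newest_map` is advancing (the OSD is still catching up on maps) or frozen (the OSD is likely not getting maps from the mons at all).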

I have checked everything I know. My last option is to downgrade, and I
don't know whether that would solve my problem or not.
My time is limited. I have a large amount of data in the data pool, and
it needs to be available on Monday.

Please help me if you can.

Best Regards.
On Sun, 23 Sep 2018 at 01:43, morph in <morphinwithyou@xxxxxxxxx>
wrote:
>
> Hello. I upgraded my system from Luminous to Mimic.
> I have 168 OSDs in my system. I'm using RAID1 NVMe for journals, and my pool was healthy before the upgrade.
> I didn't upgrade my system with package tools like apt or pacman. I'm using images, so all my OSes are the same, and the upgrade was done in maintenance mode with the cluster shut down. I tested this upgrade 3 times on a test cluster with 2 servers and 12 OSDs.
> After the upgrade on my prod cluster, I see the OSDs are still at the booting stage.
> Before Mimic, rebooting my cluster was very fast.
> I followed the Mimic upgrade wiki step by step.
> ceph -s : https://paste.ubuntu.com/p/p2spVmqvJZ/
> an osd log: https://paste.ubuntu.com/p/PBG66qdHXc/
> ceph daemon status https://paste.ubuntu.com/p/y7cVspr9cN/
> 1- Why the hell does "ceph -s" look like that if the OSDs are booting? It's so stupid and scary. And I didn't even start any MDS.
> 2- Why does booting take so long? Is it because of the Mimic upgrade or something else?
> 3- Will waiting for the OSDs to boot solve my problem, or should I do something?
>
> -----------------------------
> ceph mon feature ls
> all features
> supported: [kraken,luminous,mimic,osdmap-prune]
> persistent: [kraken,luminous,mimic,osdmap-prune]
> on current monmap (epoch 10)
> persistent: [kraken,luminous,mimic,osdmap-prune]
> required: [kraken,luminous,mimic,osdmap-prune]
>
> ------------------------
> ceph osd versions
> {
>     "ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)": 50
> }
>
> For now I'm leaving my cluster in this state. I'll be back in 8 hours. I need a running system on Monday morning.
> Help me please.
