Re: OSD:s failing out after upgrade to 9.2.0 on Ubuntu 14.04

Hi all,

 

I have started digging into this again. Too much work kept me from looking further, and the machines have been shut down since then anyway. My problem started when I upgraded from Hammer (I am quite certain it was at least 0.94.4) to Infernalis: since then my OSDs can't join the cluster any more. I am running on Ubuntu 14.04. I guess rolling back is not an option?

 

My setup is 3 HP microservers named black, orange and purple. All 3 servers have one mon each and four OSDs.

 

It has been suggested that I download a version released by Sage that might fix my problem, but I am not sure where the archives are.

 

Looking at the mon log I see:

2015-12-05 19:08:49.895681 7f68852a9700 10 mon.black@0(leader).pg v8253363 check_osd_map -- osdmap not readable, waiting

 

This might also give a clue to what is happening: when I try to change the logging level for the OSD I get an error:

ceph tell osd.0 injectargs '--debug-osd 0/5'

Error ENXIO: problem getting command descriptions from osd.0

 

More from the logs, I marked osd.0 as in at Dec  5 19:08:49 CET 2015

 

I hope my data is still OK. I am not in a hurry to get it back and prefer a safe solution to a quick one :)

 

I am very thankful for any help or suggestions. Since all of this has just worked for well over a year, I am quite rusty when it comes to troubleshooting, and my Linux skills are not enough to figure out what has gone wrong.

 

I tried to keep only what is in the logs from when I tried to set osd.0 to in.
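In case it helps anyone trim similar logs, here is a minimal Python sketch of that kind of filtering. It assumes every log line starts with a 26-character timestamp like the ones in this mail; the helper names `parse_ts` and `lines_in_window` are made up for illustration and are not part of Ceph.

```python
from datetime import datetime, timedelta

# Timestamp layout seen at the start of the log lines in this mail.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    try:
        return datetime.strptime(line[:26], TS_FORMAT)
    except ValueError:
        return None


def lines_in_window(lines, center, seconds=1.0):
    """Keep only the lines stamped within `seconds` of `center`."""
    delta = timedelta(seconds=seconds)
    kept = []
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and abs(ts - center) <= delta:
            kept.append(line)
    return kept


# Three lines copied from the logs in this mail.
sample = [
    "2015-12-05 19:08:18.319849 7feef41b5940  0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 4249",
    "2015-12-05 19:08:49.690092 7f6882e2f700  2 mon.black@0(leader).osd e39530  osd.0 IN",
    "2015-12-05 19:09:05.867889 7feef41b5940  0 osd.0 39530 done with init, starting boot process",
]

# The moment osd.0 was marked in, as stated above.
marked_in = datetime(2015, 12, 5, 19, 8, 49)
window = lines_in_window(sample, marked_in, seconds=1.0)
for line in window:
    print(line)
```

Widening `seconds` keeps more context around the event; lines without a leading timestamp are simply dropped.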

 

In the log for the OSD when I start it and try to get it to join I get:

2015-12-05 19:08:18.319849 7feef41b5940  0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 4249

2015-12-05 19:08:18.414650 7feef41b5940  0 filestore(/ceph/osd.0) backend xfs (magic 0x58465342)

2015-12-05 19:08:18.415996 7feef41b5940  0 genericfilestorebackend(/ceph/osd.0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option

2015-12-05 19:08:18.416033 7feef41b5940  0 genericfilestorebackend(/ceph/osd.0) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option

2015-12-05 19:08:18.416090 7feef41b5940  0 genericfilestorebackend(/ceph/osd.0) detect_features: splice is supported

2015-12-05 19:08:18.418819 7feef41b5940  0 genericfilestorebackend(/ceph/osd.0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)

2015-12-05 19:08:18.419136 7feef41b5940  0 xfsfilestorebackend(/ceph/osd.0) detect_features: extsize is supported and your kernel >= 3.5

2015-12-05 19:08:18.527107 7feef41b5940  0 filestore(/ceph/osd.0) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled

2015-12-05 19:08:18.637889 7feef41b5940  1 journal _open /dev/black/journal-osd.0 fd 19: 23622320128 bytes, block size 4096 bytes, directio = 1, aio = 1

2015-12-05 19:08:18.667396 7feef41b5940  1 journal _open /dev/black/journal-osd.0 fd 19: 23622320128 bytes, block size 4096 bytes, directio = 1, aio = 1

2015-12-05 19:08:18.720872 7feef41b5940  1 filestore(/ceph/osd.0) upgrade

2015-12-05 19:08:18.742447 7feef41b5940  0 <cls> cls/cephfs/cls_cephfs.cc:136: loading cephfs_size_scan

2015-12-05 19:08:18.744275 7feef41b5940  0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello

2015-12-05 19:08:18.761847 7feef41b5940  0 osd.0 39530 crush map has features 1107558400, adjusting msgr requires for clients

2015-12-05 19:08:18.761883 7feef41b5940  0 osd.0 39530 crush map has features 1107558400 was 8705, adjusting msgr requires for mons

2015-12-05 19:08:18.761899 7feef41b5940  0 osd.0 39530 crush map has features 1107558400, adjusting msgr requires for osds

2015-12-05 19:08:21.301437 7feef41b5940  0 osd.0 39530 load_pgs

2015-12-05 19:09:04.080813 7feef41b5940  0 osd.0 39530 load_pgs opened 1230 pgs

2015-12-05 19:09:04.118602 7feef41b5940 -1 osd.0 39530 log_to_monitors {default=true}

2015-12-05 19:09:04.135379 7feed5f0e700  0 osd.0 39530 ignoring osdmap until we have initialized

2015-12-05 19:09:05.867889 7feef41b5940  0 osd.0 39530 done with init, starting boot process
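
To put numbers on the timing, a small Python sketch can compare a few of the timestamps from these logs. The values are copied from this mail (the "osd.0 IN" one is from the mon log); the helper name `ts` is only for illustration, and the sketch says nothing about the cause of the problem.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def ts(text):
    """Parse a timestamp in the format used by the log lines in this mail."""
    return datetime.strptime(text, TS_FORMAT)


# Timestamps copied from the OSD log.
load_pgs_start = ts("2015-12-05 19:08:21.301437")  # "load_pgs"
load_pgs_done = ts("2015-12-05 19:09:04.080813")   # "load_pgs opened 1230 pgs"
init_done = ts("2015-12-05 19:09:05.867889")       # "done with init, starting boot process"

# Timestamp copied from the mon log ("osd.0 IN").
marked_in = ts("2015-12-05 19:08:49.690092")

load_pgs_seconds = (load_pgs_done - load_pgs_start).total_seconds()
init_after_in_seconds = (init_done - marked_in).total_seconds()

print(f"load_pgs took {load_pgs_seconds:.3f} s")
print(f"OSD init finished {init_after_in_seconds:.3f} s after osd.0 was marked in")
```

By these timestamps, loading the PGs took roughly 43 seconds, and the OSD only finished its init about 16 seconds after I had marked it in.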

 

In the logs of the mons I get:

2015-12-05 19:08:49.635074 7f688262e700 10 mon.black@0(leader).log v9157752  logging 2015-12-05 19:08:49.633694 mon.2 172.16.0.203:6789/0 27 : audit [INF] from='client.? 172.16.0.201:0/3894299556' entity='client.admin' cmd=[{"prefix": "osd in", "ids": ["0"]}]: dispatch

2015-12-05 19:08:49.635184 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch existing session 0x7f688f218c40 for mon.2 172.16.0.203:6789/0

2015-12-05 19:08:49.635188 7f688262e700 20 mon.black@0(leader) e3  caps allow *

2015-12-05 19:08:49.635196 7f688262e700 20 is_capable service=mon command= read on cap allow *

2015-12-05 19:08:49.635198 7f688262e700 20  allow so far , doing grant allow *

2015-12-05 19:08:49.635199 7f688262e700 20  allow all

2015-12-05 19:08:49.635201 7f688262e700 10 mon.black@0(leader) e3 received forwarded message from client.524843 172.16.0.201:0/3894299556 via mon.2 172.16.0.203:6789/0

2015-12-05 19:08:49.635207 7f688262e700 20 is_capable service=mon command= exec on cap allow *

2015-12-05 19:08:49.635209 7f688262e700 20  allow so far , doing grant allow *

2015-12-05 19:08:49.635210 7f688262e700 20  allow all

2015-12-05 19:08:49.635214 7f688262e700 10 mon.black@0(leader) e3  caps are allow *

2015-12-05 19:08:49.635216 7f688262e700 10 mon.black@0(leader) e3  entity name 'client.admin' type 8

2015-12-05 19:08:49.635218 7f688262e700 10 mon.black@0(leader) e3  mesg 0x7f688fb10800 from 172.16.0.203:6789/0

2015-12-05 19:08:49.635240 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch existing session 0x7f688f21b480 for client.524843 :/0

2015-12-05 19:08:49.635243 7f688262e700 20 mon.black@0(leader) e3  caps allow *

2015-12-05 19:08:49.635301 7f688262e700  0 mon.black@0(leader) e3 handle_command mon_command({"prefix": "osd in", "ids": ["0"]} v 0) v1

2015-12-05 19:08:49.635337 7f688262e700 20 is_capable service=osd command=osd in read write on cap allow *

2015-12-05 19:08:49.635340 7f688262e700 20  allow so far , doing grant allow *

2015-12-05 19:08:49.635341 7f688262e700 20  allow all

2015-12-05 19:08:49.635342 7f688262e700 10 mon.black@0(leader) e3 _allowed_command capable

2015-12-05 19:08:49.635353 7f688262e700  0 log_channel(audit) log [INF] : from='client.524843 :/0' entity='client.admin' cmd=[{"prefix": "osd in", "ids": ["0"]}]: dispatch

2015-12-05 19:08:49.635453 7f688262e700 10 mon.black@0(leader).osd e39530 preprocess_query mon_command({"prefix": "osd in", "ids": ["0"]} v 0) v1 from client.524843 172.16.0.201:0/3894299556

2015-12-05 19:08:49.635502 7f688262e700  7 mon.black@0(leader).osd e39530 prepare_update mon_command({"prefix": "osd in", "ids": ["0"]} v 0) v1 from client.524843 172.16.0.201:0/3894299556

2015-12-05 19:08:49.637872 7f688262e700 10 mon.black@0(leader).osd e39530 should_propose

2015-12-05 19:08:49.637997 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch existing session 0x7f688f218a80 for mon.0 172.16.0.201:6789/0

2015-12-05 19:08:49.638004 7f688262e700 20 mon.black@0(leader) e3  caps allow *

2015-12-05 19:08:49.638034 7f688262e700 10 mon.black@0(leader).log v9157752 preprocess_query log(1 entries) v1 from mon.0 172.16.0.201:6789/0

2015-12-05 19:08:49.638060 7f688262e700 10 mon.black@0(leader).log v9157752 preprocess_log log(1 entries) v1 from mon.0

2015-12-05 19:08:49.638064 7f688262e700 20 is_capable service=log command= write on cap allow *

2015-12-05 19:08:49.638067 7f688262e700 20  allow so far , doing grant allow *

2015-12-05 19:08:49.638069 7f688262e700 20  allow all

2015-12-05 19:08:49.638089 7f688262e700 10 mon.black@0(leader).log v9157752 prepare_update log(1 entries) v1 from mon.0 172.16.0.201:6789/0

2015-12-05 19:08:49.638100 7f688262e700 10 mon.black@0(leader).log v9157752 prepare_log log(1 entries) v1 from mon.0

2015-12-05 19:08:49.638103 7f688262e700 10 mon.black@0(leader).log v9157752  logging 2015-12-05 19:08:49.635355 mon.0 172.16.0.201:6789/0 29 : audit [INF] from='client.524843 :/0' entity='client.admin' cmd=[{"prefix": "osd in", "ids": ["0"]}]: dispatch

2015-12-05 19:08:49.685271 7f6882e2f700 10 mon.black@0(leader).log v9157752 encode_full log v 9157752

2015-12-05 19:08:49.685505 7f6882e2f700 10 mon.black@0(leader).log v9157752 encode_pending v9157753

2015-12-05 19:08:49.690050 7f6882e2f700 10 mon.black@0(leader).osd e39530 encode_pending e 39531

2015-12-05 19:08:49.690092 7f6882e2f700  2 mon.black@0(leader).osd e39530  osd.0 IN

2015-12-05 19:08:49.690698 7f6882e2f700 20 mon.black@0(leader).osd e39530  full_crc 3915856006 inc_crc 3792224963

2015-12-05 19:08:49.691048 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch existing session 0x7f688f2188c0 for mon.1 172.16.0.202:6789/0

2015-12-05 19:08:49.691055 7f688262e700 20 mon.black@0(leader) e3  caps allow *

2015-12-05 19:08:49.691064 7f688262e700 20 is_capable service=mon command= read on cap allow *

2015-12-05 19:08:49.691068 7f688262e700 20  allow so far , doing grant allow *

2015-12-05 19:08:49.691069 7f688262e700 20  allow all

2015-12-05 19:08:49.691071 7f688262e700 20 is_capable service=mon command= exec on cap allow *

2015-12-05 19:08:49.691073 7f688262e700 20  allow so far , doing grant allow *

2015-12-05 19:08:49.691074 7f688262e700 20  allow all

2015-12-05 19:08:49.878891 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch existing session 0x7f688f218c40 for mon.2 172.16.0.203:6789/0

2015-12-05 19:08:49.878904 7f688262e700 20 mon.black@0(leader) e3  caps allow *

2015-12-05 19:08:49.878921 7f688262e700 20 is_capable service=mon command= read on cap allow *

2015-12-05 19:08:49.878926 7f688262e700 20  allow so far , doing grant allow *

2015-12-05 19:08:49.878929 7f688262e700 20  allow all

2015-12-05 19:08:49.878932 7f688262e700 20 is_capable service=mon command= exec on cap allow *

2015-12-05 19:08:49.878935 7f688262e700 20  allow so far , doing grant allow *

2015-12-05 19:08:49.878937 7f688262e700 20  allow all

2015-12-05 19:08:49.881064 7f68852a9700 10 mon.black@0(leader) e3 refresh_from_paxos

2015-12-05 19:08:49.881688 7f68852a9700 10 mon.black@0(leader).log v9157753 update_from_paxos

2015-12-05 19:08:49.881696 7f68852a9700 10 mon.black@0(leader).log v9157753 update_from_paxos version 9157753 summary v 9157752

2015-12-05 19:08:49.881747 7f68852a9700 10 mon.black@0(leader).log v9157753 update_from_paxos latest full 9157752

2015-12-05 19:08:49.881809 7f68852a9700  7 mon.black@0(leader).log v9157753 update_from_paxos applying incremental log 9157753 2015-12-05 19:08:49.633694 mon.2 172.16.0.203:6789/0 27 : audit [INF] from='client.? 172.16.0.201:0/3894299556' entity='client.admin' cmd=[{"prefix": "osd in", "ids": ["0"]}]: dispatch

2015-12-05 19:08:49.881840 7f68852a9700 20 mon.black@0(leader).log v9157753 update_from_paxos logging for channel 'audit' to file '/var/log/ceph/ceph.audit.log'

2015-12-05 19:08:49.881882 7f68852a9700  7 mon.black@0(leader).log v9157753 update_from_paxos applying incremental log 9157753 2015-12-05 19:08:49.635355 mon.0 172.16.0.201:6789/0 29 : audit [INF] from='client.524843 :/0' entity='client.admin' cmd=[{"prefix": "osd in", "ids": ["0"]}]: dispatch

2015-12-05 19:08:49.881901 7f68852a9700 20 mon.black@0(leader).log v9157753 update_from_paxos logging for channel 'audit' to file '/var/log/ceph/ceph.audit.log'

2015-12-05 19:08:49.881925 7f68852a9700 15 mon.black@0(leader).log v9157753 update_from_paxos logging for 1 channels

2015-12-05 19:08:49.881929 7f68852a9700 15 mon.black@0(leader).log v9157753 update_from_paxos channel 'audit' logging 353 bytes

2015-12-05 19:08:49.882845 7f68852a9700 10 mon.black@0(leader).log v9157753 check_subs

2015-12-05 19:08:49.883231 7f68852a9700 10 mon.black@0(leader).auth v14845 update_from_paxos

2015-12-05 19:08:49.883242 7f68852a9700 10 mon.black@0(leader).pg v8253363 map_pg_creates to 0 pgs -- no change

2015-12-05 19:08:49.883247 7f68852a9700 10 mon.black@0(leader).pg v8253363 send_pg_creates to 0 pgs

2015-12-05 19:08:49.883400 7f68852a9700 10 mon.black@0(leader).log v9157753 create_pending v 9157754

2015-12-05 19:08:49.883423 7f68852a9700  7 mon.black@0(leader).log v9157753 _updated_log for mon.2 172.16.0.203:6789/0

2015-12-05 19:08:49.883444 7f68852a9700  2 mon.black@0(leader) e3 send_reply 0x7f688f64ca80 0x7f688f3d5680 log(last 27) v1

2015-12-05 19:08:49.883450 7f68852a9700 15 mon.black@0(leader) e3 send_reply routing reply to 172.16.0.203:6789/0 via 172.16.0.203:6789/0 for request log(1 entries) v1

2015-12-05 19:08:49.883531 7f68852a9700  7 mon.black@0(leader).log v9157753 _updated_log for mon.0 172.16.0.201:6789/0

2015-12-05 19:08:49.883548 7f68852a9700  2 mon.black@0(leader) e3 send_reply 0x7f688f64cb60 0x7f688f3d6940 log(last 29) v1

2015-12-05 19:08:49.886246 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch existing session 0x7f688f218a80 for mon.0 172.16.0.201:6789/0

2015-12-05 19:08:49.886261 7f688262e700 20 mon.black@0(leader) e3  caps allow *

2015-12-05 19:08:49.886323 7f688262e700 20 is_capable service=mon command= read on cap allow *

2015-12-05 19:08:49.886330 7f688262e700 20  allow so far , doing grant allow *

2015-12-05 19:08:49.886332 7f688262e700 20  allow all

2015-12-05 19:08:49.888142 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch existing session 0x7f688f2188c0 for mon.1 172.16.0.202:6789/0

2015-12-05 19:08:49.888346 7f688262e700 20 mon.black@0(leader) e3  caps allow *

2015-12-05 19:08:49.888367 7f688262e700 20 is_capable service=mon command= read on cap allow *

2015-12-05 19:08:49.888373 7f688262e700 20  allow so far , doing grant allow *

2015-12-05 19:08:49.888375 7f688262e700 20  allow all

2015-12-05 19:08:49.888379 7f688262e700 20 is_capable service=mon command= exec on cap allow *

2015-12-05 19:08:49.888382 7f688262e700 20  allow so far , doing grant allow *

2015-12-05 19:08:49.888384 7f688262e700 20  allow all

2015-12-05 19:08:49.888502 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch existing session 0x7f688f218c40 for mon.2 172.16.0.203:6789/0

2015-12-05 19:08:49.888512 7f688262e700 20 mon.black@0(leader) e3  caps allow *

2015-12-05 19:08:49.888524 7f688262e700 20 is_capable service=mon command= read on cap allow *

2015-12-05 19:08:49.888528 7f688262e700 20  allow so far , doing grant allow *

2015-12-05 19:08:49.888530 7f688262e700 20  allow all

2015-12-05 19:08:49.888532 7f688262e700 20 is_capable service=mon command= exec on cap allow *

2015-12-05 19:08:49.888535 7f688262e700 20  allow so far , doing grant allow *

2015-12-05 19:08:49.888537 7f688262e700 20  allow all

2015-12-05 19:08:49.889676 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch existing session 0x7f688f2188c0 for mon.1 172.16.0.202:6789/0

2015-12-05 19:08:49.889688 7f688262e700 20 mon.black@0(leader) e3  caps allow *

2015-12-05 19:08:49.889703 7f688262e700 20 is_capable service=mon command= read on cap allow *

2015-12-05 19:08:49.889709 7f688262e700 20  allow so far , doing grant allow *

2015-12-05 19:08:49.889711 7f688262e700 20  allow all

2015-12-05 19:08:49.889714 7f688262e700 20 is_capable service=mon command= exec on cap allow *

2015-12-05 19:08:49.889717 7f688262e700 20  allow so far , doing grant allow *

2015-12-05 19:08:49.889719 7f688262e700 20  allow all

2015-12-05 19:08:49.890685 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch existing session 0x7f688f218c40 for mon.2 172.16.0.203:6789/0

2015-12-05 19:08:49.890697 7f688262e700 20 mon.black@0(leader) e3  caps allow *

2015-12-05 19:08:49.890712 7f688262e700 20 is_capable service=mon command= read on cap allow *

2015-12-05 19:08:49.890717 7f688262e700 20  allow so far , doing grant allow *

2015-12-05 19:08:49.890720 7f688262e700 20  allow all

2015-12-05 19:08:49.890747 7f688262e700 20 is_capable service=mon command= exec on cap allow *

2015-12-05 19:08:49.890751 7f688262e700 20  allow so far , doing grant allow *

2015-12-05 19:08:49.890753 7f688262e700 20  allow all

2015-12-05 19:08:49.894322 7f68852a9700 10 mon.black@0(leader) e3 refresh_from_paxos

2015-12-05 19:08:49.894834 7f68852a9700 15 mon.black@0(leader).osd e39530 update_from_paxos paxos e 39531, my e 39530

2015-12-05 19:08:49.894899 7f68852a9700  7 mon.black@0(leader).osd e39530 update_from_paxos  applying incremental 39531

2015-12-05 19:08:49.895218 7f68852a9700  1 mon.black@0(leader).osd e39531 e39531: 12 osds: 3 up, 4 in

2015-12-05 19:08:49.895663 7f68852a9700 10 mon.black@0(leader).osd e39531  adding osd.0 to down_pending_out map

2015-12-05 19:08:49.895681 7f68852a9700 10 mon.black@0(leader).pg v8253363 check_osd_map -- osdmap not readable, waiting
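
The osdmap summary line near the end of that excerpt ("12 osds: 3 up, 4 in") is the quickest indicator of cluster state. Below is a minimal Python sketch for pulling the counts out of such a line; the regular expression is my own guess based on this one line, not an official format definition, and `parse_osd_summary` is a made-up helper name.

```python
import re

# Pattern guessed from the summary line in the mon log above.
SUMMARY_RE = re.compile(r"e(\d+): (\d+) osds: (\d+) up, (\d+) in")


def parse_osd_summary(line):
    """Return epoch/total/up/in counts from an osdmap summary line, or None."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    epoch, total, up, in_count = (int(group) for group in match.groups())
    return {"epoch": epoch, "total": total, "up": up, "in": in_count}


# Line copied from the mon log in this mail.
line = (
    "2015-12-05 19:08:49.895218 7f68852a9700  1 "
    "mon.black@0(leader).osd e39531 e39531: 12 osds: 3 up, 4 in"
)

summary = parse_osd_summary(line)
down = summary["total"] - summary["up"]
print(summary)
print(f"{down} of {summary['total']} OSDs are not up")
```

For the line above this reports 9 of the 12 OSDs as not up, which matches what I see.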

From: Claes Sahlström
Sent: 16 November 2015 22:43
To: ceph-users@xxxxxxxxxxxxxx
Subject: RE: OSD:s failing out after upgrade to 9.2.0 on Ubuntu 14.04

 

Yes I upgraded from Hammer 0.94.4.

 

And "ceph-osd --version" gives the correct version, 9.2.0. I think it is a problem with the communication between my OSDs and either the MONs or the other OSDs, or maybe both.

 

I will check out those archives from Sage also…

 

I have probably done something wrong, but I cannot figure out what it is. All my previous upgrades were smooth, and this is quite an old installation I have at home.

 

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Josef Johansson
Sent: 16 November 2015 22:18
To: David Clarke <davidc@xxxxxxxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: OSD:s failing out after upgrade to 9.2.0 on Ubuntu 14.04

 

And if you look through the archives, Sage did release a version of Infernalis that fixed it if you didn't do it that way as well.
