Re: Can't start bluestore OSDs after successfully moving them 12.1.1 ** ERROR: osd init failed: (2) No such file or directory

Update to this -- I tried building a new host with a new OSD on a new disk, and I am having the same issue.
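For completeness, the new OSD was created with bluestore defaults -- roughly this, whether done by hand or driven through ceph-deploy (just a sketch; the device name here stands in for the new disk in that VM):

ceph-disk prepare --bluestore /dev/sdb    # new bluestore OSD on the fresh disk
ceph-disk activate /dev/sdb1              # activate the data partition it created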



I set the OSD debug level to 10 -- the issue looks like it's coming from a mon daemon. I'm still trying to learn enough about the internals of Ceph to understand what's happening here.
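For reference, this is roughly how I turned logging up -- the OSD dies during init, so it went into ceph.conf on the OSD host rather than injectargs (I believe debug ms = 1 was on as well, which is where the messenger lines below come from):

[osd]
    debug osd = 10
    debug ms = 1

...followed by systemctl restart ceph-osd@7 to produce the run below.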

Relevant debug logs (I think):


2017-07-25 14:21:58.889016 7f25a88af700  1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 1 ==== mon_map magic: 0 v1 ==== 541+0+0 (2831459213 0 0) 0x556640ecd900 con 0x556641949800
2017-07-25 14:21:58.889109 7f25a88af700  1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (248727397 0 0) 0x556640ecdb80 con 0x556641949800
2017-07-25 14:21:58.889204 7f25a88af700  1 -- 10.0.15.142:6800/16150 --> 10.0.15.51:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x556640ecd400 con 0
2017-07-25 14:21:58.889966 7f25a88af700  1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (3141870879 0 0) 0x556640ecd400 con 0x556641949800
2017-07-25 14:21:58.890066 7f25a88af700  1 -- 10.0.15.142:6800/16150 --> 10.0.15.51:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- 0x556640ecdb80 con 0
2017-07-25 14:21:58.890759 7f25a88af700  1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 564+0+0 (1715764650 0 0) 0x556640ecdb80 con 0x556641949800
2017-07-25 14:21:58.890871 7f25a88af700  1 -- 10.0.15.142:6800/16150 --> 10.0.15.51:6789/0 -- mon_subscribe({monmap=0+}) v2 -- 0x556640e77680 con 0
2017-07-25 14:21:58.890901 7f25a88af700  1 -- 10.0.15.142:6800/16150 --> 10.0.15.51:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- 0x556640ecd400 con 0
2017-07-25 14:21:58.891494 7f25a88af700  1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 5 ==== mon_map magic: 0 v1 ==== 541+0+0 (2831459213 0 0) 0x556640ecde00 con 0x556641949800
2017-07-25 14:21:58.891555 7f25a88af700  1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 6 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 194+0+0 (1036670921 0 0) 0x556640ece080 con 0x556641949800
2017-07-25 14:21:58.892003 7f25b5e71c80 10 osd.7 0 mon_cmd_maybe_osd_create cmd: {"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["7"]}
2017-07-25 14:21:58.892039 7f25b5e71c80  1 -- 10.0.15.142:6800/16150 --> 10.0.15.51:6789/0 -- mon_command({"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["7"]} v 0) v1 -- 0x556640e78d00 con 0
2017-07-25 14:21:58.894596 7f25a88af700  1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 7 ==== mon_command_ack([{"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["7"]}]=-2 (2) No such file or directory v10406) v1 ==== 133+0+0 (3400959855 0 0) 0x556640ece300 con 0x556641949800
2017-07-25 14:21:58.894797 7f25b5e71c80  1 -- 10.0.15.142:6800/16150 --> 10.0.15.51:6789/0 -- mon_command({"prefix": "osd create", "id": 7, "uuid": "92445e4f-850e-453b-b5ab-569d1414f72d"} v 0) v1 -- 0x556640e79180 con 0
2017-07-25 14:21:58.896301 7f25a88af700  1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 8 ==== mon_command_ack([{"prefix": "osd create", "id": 7, "uuid": "92445e4f-850e-453b-b5ab-569d1414f72d"}]=0  v10406) v1 ==== 115+0+2 (2540205126 0 1371665406) 0x556640ece580 con 0x556641949800
2017-07-25 14:21:58.896473 7f25b5e71c80 10 osd.7 0 mon_cmd_maybe_osd_create cmd: {"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["7"]}
2017-07-25 14:21:58.896516 7f25b5e71c80  1 -- 10.0.15.142:6800/16150 --> 10.0.15.51:6789/0 -- mon_command({"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["7"]} v 0) v1 -- 0x556640e793c0 con 0
2017-07-25 14:21:58.898180 7f25a88af700  1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 9 ==== mon_command_ack([{"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["7"]}]=-2 (2) No such file or directory v10406) v1 ==== 133+0+0 (3400959855 0 0) 0x556640ecd900 con 0x556641949800
2017-07-25 14:21:58.898276 7f25b5e71c80 -1 osd.7 0 mon_cmd_maybe_osd_create fail: '(2) No such file or directory': (2) No such file or directory
2017-07-25 14:21:58.898380 7f25b5e71c80  1 -- 10.0.15.142:6800/16150 >> 10.0.15.51:6789/0 conn(0x556641949800 :-1 s=STATE_OPEN pgs=367879 cs=1 l=1).mark_down
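If I'm reading that right, the mon is NAKing the "osd crush set-device-class" command with ENOENT before the OSD gets any further through init. My next step is to replay the same command by hand from an admin node and see whether it fails the same way outside of OSD startup -- something like this (id 7 taken from the log above):

ceph osd crush set-device-class hdd osd.7    # the same command the OSD sends at startup
ceph osd crush class ls                      # what device classes the mon knows about
ceph osd tree                                # confirm osd.7 actually exists in the map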




On Mon, Jul 24, 2017 at 1:33 PM, Daniel K <sathackr@xxxxxxxxx> wrote:
List -- 

I have a 4-node cluster running on bare metal and need to use the kernel client on 2 of the nodes. Since I've read that you should not run the kernel client on a node that runs an OSD daemon, I decided to move the OSD daemons into a VM on the same machine.

Original host is stor-vm2 (bare metal), new host is stor-vm2a (virtual).

All went well -- I did these steps for each OSD (5 total per host); a rough shell sketch follows the list:

- set up the VM
- installed the OS
- installed Ceph (using ceph-deploy)
- set noout
- stopped the Ceph OSD on the bare metal host
- unmounted /dev/sdb1 from /var/lib/ceph/osd/ceph-0
- added /dev/sdb to the VM
- Ceph detected the OSD and started it automatically
- moved the VM host to the same bucket as the physical host in the crushmap
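The shell side of that, per OSD, was roughly (osd.0 on /dev/sdb shown; the crush move target is from memory):

ceph osd set noout
systemctl stop ceph-osd@0                    # on stor-vm2, the bare metal host
umount /var/lib/ceph/osd/ceph-0
# attached /dev/sdb to the stor-vm2a VM; the OSD was detected and started on its own there
ceph osd crush move stor-vm2a root=default   # put the VM host under the same bucket as the physical host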

I did this for each OSD, and aside from some recovery I/O due to the updated crushmap, all OSDs were up.

I rebooted the physical host, which rebooted the VM, and now the OSDs are refusing to start.

I've tried moving them back to the bare metal host with the same results.

Any ideas?

Here are what seem to be the relevant osd log lines:

2017-07-24 13:21:53.561265 7faf1752fc80  0 osd.10 8854 crush map has features 2200130813952, adjusting msgr requires for clients
2017-07-24 13:21:53.561284 7faf1752fc80  0 osd.10 8854 crush map has features 2200130813952 was 8705, adjusting msgr requires for mons
2017-07-24 13:21:53.561298 7faf1752fc80  0 osd.10 8854 crush map has features 720578140510109696, adjusting msgr requires for osds
2017-07-24 13:21:55.626834 7faf1752fc80  0 osd.10 8854 load_pgs
2017-07-24 13:22:20.970222 7faf1752fc80  0 osd.10 8854 load_pgs opened 536 pgs
2017-07-24 13:22:20.972659 7faf1752fc80  0 osd.10 8854 using weightedpriority op queue with priority op cut off at 64.
2017-07-24 13:22:20.976861 7faf1752fc80 -1 osd.10 8854 log_to_monitors {default=true}
2017-07-24 13:22:20.998233 7faf1752fc80 -1 osd.10 8854 mon_cmd_maybe_osd_create fail: '(2) No such file or directory': (2) No such file or directory
2017-07-24 13:22:20.999165 7faf1752fc80  1 bluestore(/var/lib/ceph/osd/ceph-10) umount
2017-07-24 13:22:21.016146 7faf1752fc80  1 freelist shutdown
2017-07-24 13:22:21.016243 7faf1752fc80  4 rocksdb: [/build/ceph-12.1.1/src/rocksdb/db/db_impl.cc:217] Shutdown: canceling all background work
2017-07-24 13:22:21.020440 7faf1752fc80  4 rocksdb: [/build/ceph-12.1.1/src/rocksdb/db/db_impl.cc:343] Shutdown complete
2017-07-24 13:22:21.274481 7faf1752fc80  1 bluefs umount
2017-07-24 13:22:21.275822 7faf1752fc80  1 bdev(0x558bb1f82d80 /var/lib/ceph/osd/ceph-10/block) close
2017-07-24 13:22:21.485226 7faf1752fc80  1 bdev(0x558bb1f82b40 /var/lib/ceph/osd/ceph-10/block) close
2017-07-24 13:22:21.551009 7faf1752fc80 -1  ** ERROR: osd init failed: (2) No such file or directory
2017-07-24 13:22:21.563567 7faf1752fc80 -1 /build/ceph-12.1.1/src/common/HeartbeatMap.cc: In function 'ceph::HeartbeatMap::~HeartbeatMap()' thread 7faf1752fc80 time 2017-07-24 13:22:21.558275
/build/ceph-12.1.1/src/common/HeartbeatMap.cc: 39: FAILED assert(m_workers.empty())

 ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x558ba6ba6b72]
 2: (()+0xb81cf1) [0x558ba6cc0cf1]
 3: (CephContext::~CephContext()+0x4d9) [0x558ba6ca77b9]
 4: (CephContext::put()+0xe6) [0x558ba6ca7ab6]
 5: (main()+0x563) [0x558ba650df73]
 6: (__libc_start_main()+0xf0) [0x7faf14999830]
 7: (_start()+0x29) [0x558ba6597cf9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
