Re: Bluestores+LVM via ceph-volume in Luminous?


 



On 2018/02/01 11:58 am, Alfredo Deza wrote:
This is the actual command:

/usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
a2ee64a4-b5ba-4ca9-8528-4205f3ad8c99

What that command is trying to do is to tell the monitor about the
newly created OSD. It is easy to replicate this "hanging" problem if
you modify your ceph.conf to point to an invalid IP for the monitors.
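
One way to make such a hang visible instead of waiting on it indefinitely is to run the command under a wall-clock limit. The following is only a minimal Python sketch of that idea; the helper name run_with_timeout and the chosen limit are illustrative assumptions, not part of ceph-volume or the ceph CLI.

```python
import subprocess


def run_with_timeout(cmd, timeout_s, stdin_text=""):
    """Run cmd and return (returncode, stdout), or None if it timed out.

    Hypothetical helper for illustration: a None result corresponds to
    the "hang" described above (the process never returned in time).
    """
    try:
        proc = subprocess.run(
            cmd,
            input=stdin_text,      # sent to the child's stdin
            capture_output=True,
            text=True,
            timeout=timeout_s,     # child is killed once this expires
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.returncode, proc.stdout
```

Any ceph invocation that has to contact the monitors could be passed as cmd (as a list of arguments). A None result would then suggest the client never got a usable answer from the monitors listed in ceph.conf.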



Thank you for confirming that and pointing me in the right direction!

My network configuration appears to be correct, as far as I understand
it: the "public" network is 172.16.238.0/24 and the cluster network is
172.16.239.0/24 (a configuration that works for the other OSDs built with
ceph-ansible/ceph-disk), and I can reach port 6789 on my MON node:

~# ping -c4 172.16.238.11 && ping -c4 172.16.239.11
PING 172.16.238.11 (172.16.238.11) 56(84) bytes of data.
64 bytes from 172.16.238.11: icmp_seq=1 ttl=64 time=0.141 ms
64 bytes from 172.16.238.11: icmp_seq=2 ttl=64 time=0.102 ms
64 bytes from 172.16.238.11: icmp_seq=3 ttl=64 time=0.107 ms
64 bytes from 172.16.238.11: icmp_seq=4 ttl=64 time=0.096 ms

--- 172.16.238.11 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.096/0.111/0.141/0.020 ms
PING 172.16.239.11 (172.16.239.11) 56(84) bytes of data.
64 bytes from 172.16.239.11: icmp_seq=1 ttl=64 time=0.252 ms
64 bytes from 172.16.239.11: icmp_seq=2 ttl=64 time=0.133 ms
64 bytes from 172.16.239.11: icmp_seq=3 ttl=64 time=0.098 ms
64 bytes from 172.16.239.11: icmp_seq=4 ttl=64 time=0.103 ms

--- 172.16.239.11 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2998ms
rtt min/avg/max/mdev = 0.098/0.146/0.252/0.063 ms
~# telnet 172.16.238.11 6789
Trying 172.16.238.11...
Connected to 172.16.238.11.
Escape character is '^]'.
ceph v027???^?^]quit

telnet> quit
Connection closed.


Is there a command you'd recommend to verify connectivity to the MON node from this new OSD node, to help troubleshoot this issue?

You need to make sure you are correlating your network interactions
with the same values Ceph is configured with. Like in my example
before, it is easy to replicate if you have an incorrect IP in your
ceph.conf.

This might be 10.0.0.1 and you are pinging 10.0.1.0 and it works, but
ceph is using the incorrect one :)
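
To test the addresses Ceph is actually configured with, rather than ones typed by hand, the monitor list can be read out of ceph.conf and each entry probed. This is a rough Python sketch under several assumptions: the monitors are listed in a "mon host" (or mon_host) line as IPv4 addresses or hostnames, optionally with ":port", and the function names are made up for illustration. It only checks that a TCP connection opens, which says nothing about authentication or what the monitor replies.

```python
import re
import socket


def mon_endpoints(conf_text, default_port=6789):
    """Return (host, port) pairs from 'mon host' lines in ceph.conf text.

    Simplified, hypothetical parser: ignores section headers and does
    not handle IPv6 or bracketed address lists.
    """
    endpoints = []
    for raw in conf_text.splitlines():
        line = re.split(r"[#;]", raw, maxsplit=1)[0]   # drop comments
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # Spaces, underscores and dashes in option names are equivalent.
        if re.sub(r"[\s_-]+", "_", key.strip().lower()) != "mon_host":
            continue
        for item in re.split(r"[,\s]+", value.strip()):
            if not item:
                continue
            host, _, port = item.partition(":")
            endpoints.append((host, int(port) if port else default_port))
    return endpoints


def can_connect(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port opens in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

On the OSD node this could be used by reading /etc/ceph/ceph.conf (the usual default path, assumed here), passing its text to mon_endpoints, and calling can_connect on each pair. Any address that differs from the one pinged by hand is worth a closer look.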

I don't have a specific command that might get you closer.

I would go through the mon and osd troubleshooting guides

http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/


Thanks again. I'm able to confirm via tcpdump that the osd node is indeed attempting (and reaching) the mon nodes (which respond), but apparently they aren't producing anything of substance back to the osd node (based on strace, et al.):

13:02:48.823279 IP 172.16.238.21.35578 > 172.16.238.11.6789: Flags [.], ack 282, win 219, options [nop,nop,TS val 19431869 ecr 365886589], length 0
13:02:48.823296 IP 172.16.238.21.46962 > 172.16.238.13.6789: Flags [P.], seq 146:179, ack 282, win 219, options [nop,nop,TS val 19431869 ecr 364906985], length 33
13:02:48.823322 IP 172.16.238.21.35578 > 172.16.238.11.6789: Flags [P.], seq 10:146, ack 282, win 219, options [nop,nop,TS val 19431869 ecr 365886589], length 136
13:02:48.823350 IP 172.16.238.21.35578 > 172.16.238.11.6789: Flags [P.], seq 146:179, ack 282, win 219, options [nop,nop,TS val 19431869 ecr 365886589], length 33
13:02:48.823356 IP 172.16.238.13.6789 > 172.16.238.21.46962: Flags [.], ack 179, win 219, options [nop,nop,TS val 364906985 ecr 19431869], length 0
13:02:48.823380 IP 172.16.238.13.6789 > 172.16.238.21.46962: Flags [P.], seq 282:316, ack 179, win 219, options [nop,nop,TS val 364906985 ecr 19431869], length 34
13:02:48.823400 IP 172.16.238.11.6789 > 172.16.238.21.35578: Flags [.], ack 179, win 219, options [nop,nop,TS val 365886589 ecr 19431869], length 0
13:02:48.823423 IP 172.16.238.21.46962 > 172.16.238.13.6789: Flags [P.], seq 179:187, ack 316, win 219, options [nop,nop,TS val 19431869 ecr 364906985], length 8
13:02:48.823428 IP 172.16.238.11.6789 > 172.16.238.21.35578: Flags [P.], seq 282:316, ack 179, win 219, options [nop,nop,TS val 365886589 ecr 19431869], length 34
13:02:48.823449 IP 172.16.238.21.35578 > 172.16.238.11.6789: Flags [P.], seq 179:187, ack 316, win 219, options [nop,nop,TS val 19431869 ecr 365886589], length 8
13:02:48.823478 IP 172.16.238.21.46962 > 172.16.238.13.6789: Flags [P.], seq 187:343, ack 316, win 219, options [nop,nop,TS val 19431869 ecr 364906985], length 156
13:02:48.823483 IP 172.16.238.21.35578 > 172.16.238.11.6789: Flags [P.], seq 187:343, ack 316, win 219, options [nop,nop,TS val 19431869 ecr 365886589], length 156
13:02:48.823519 IP 172.16.238.13.6789 > 172.16.238.21.46962: Flags [.], ack 343, win 227, options [nop,nop,TS val 364906985 ecr 19431869], length 0
13:02:48.823535 IP 172.16.238.11.6789 > 172.16.238.21.35578: Flags [.], ack 343, win 227, options [nop,nop,TS val 365886589 ecr 19431869], length 0
13:02:48.823569 IP 172.16.238.13.6789 > 172.16.238.21.46962: Flags [P.], seq 316:325, ack 343, win 227, options [nop,nop,TS val 364906985 ecr 19431869], length 9
13:02:48.823612 IP 172.16.238.11.6789 > 172.16.238.21.35578: Flags [P.], seq 316:325, ack 343, win 227, options [nop,nop,TS val 365886589 ecr 19431869], length 9
13:02:48.823711 IP 172.16.238.13.6789 > 172.16.238.21.46962: Flags [P.], seq 325:433, ack 343, win 227, options [nop,nop,TS val 364906985 ecr 19431869], length 108
13:02:48.823736 IP 172.16.238.21.46962 > 172.16.238.13.6789: Flags [.], ack 433, win 219, options [nop,nop,TS val 19431869 ecr 364906985], length 0
13:02:48.823764 IP 172.16.238.11.6789 > 172.16.238.21.35578: Flags [P.], seq 325:433, ack 343, win 227, options [nop,nop,TS val 365886589 ecr 19431869], length 108
13:02:48.823839 IP 172.16.238.21.35578 > 172.16.238.11.6789: Flags [.], ack 433, win 219, options [nop,nop,TS val 19431869 ecr 365886589], length 0
13:02:48.823891 IP 172.16.238.21.46962 > 172.16.238.13.6789: Flags [P.], seq 343:480, ack 433, win 219, options [nop,nop,TS val 19431869 ecr 364906985], length 137
13:02:48.823970 IP 172.16.238.21.35578 > 172.16.238.11.6789: Flags [P.], seq 343:480, ack 433, win 219, options [nop,nop,TS val 19431869 ecr 365886589], length 137
13:02:48.824249 IP 172.16.238.13.6789 > 172.16.238.21.46962: Flags [P.], seq 433:730, ack 480, win 235, options [nop,nop,TS val 364906985 ecr 19431869], length 297
13:02:48.824347 IP 172.16.238.11.6789 > 172.16.238.21.35578: Flags [P.], seq 433:730, ack 480, win 235, options [nop,nop,TS val 365886589 ecr 19431869], length 297
13:02:48.824423 IP 172.16.238.21.46962 > 172.16.238.13.6789: Flags [P.], seq 480:766, ack 730, win 227, options [nop,nop,TS val 19431869 ecr 364906985], length 286
13:02:48.824536 IP 172.16.238.21.35578 > 172.16.238.11.6789: Flags [P.], seq 480:766, ack 730, win 227, options [nop,nop,TS val 19431869 ecr 365886589], length 286
13:02:48.824970 IP 172.16.238.13.6789 > 172.16.238.21.46962: Flags [P.], seq 730:1401, ack 766, win 244, options [nop,nop,TS val 364906985 ecr 19431869], length 671
13:02:48.825004 IP 172.16.238.11.6789 > 172.16.238.21.35578: Flags [P.], seq 730:1401, ack 766, win 244, options [nop,nop,TS val 365886589 ecr 19431869], length 671
13:02:48.825166 IP 172.16.238.21.35578 > 172.16.238.11.6789: Flags [F.], seq 766, ack 1401, win 238, options [nop,nop,TS val 19431869 ecr 365886589], length 0
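
A capture like this is easier to judge when the payload bytes are tallied per direction. In the excerpt, several MON-to-OSD records carry non-zero lengths (297 and 671 bytes, for example). The following Python sketch sums the "length" field for each source/destination pair. It assumes the default tcpdump text format for IPv4 shown above, and the function name is purely illustrative.

```python
import re
from collections import Counter

# Matches "SRC.SPORT > DST.DPORT: ... length N" for IPv4 records.
# "[^>]*?" keeps a match from running into the next record's "src > dst".
PACKET_RE = re.compile(
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.\d+ > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.\d+: [^>]*?length (?P<length>\d+)"
)


def payload_by_direction(capture_text):
    """Sum TCP payload bytes per (source IP, destination IP) pair."""
    totals = Counter()
    for match in PACKET_RE.finditer(capture_text):
        totals[(match["src"], match["dst"])] += int(match["length"])
    return dict(totals)
```

Feeding the saved tcpdump text to payload_by_direction gives one total per direction. If the monitor-to-OSD totals are clearly non-zero, the monitors are sending data back, and the next question is what those replies contain rather than whether they arrive.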

172.16.238.21 is the OSD, 172.16.238.11 and 172.16.238.12 are MONs. Perhaps then, this is a bug within ceph? I wonder if there are verbose logs on the mon that might show something or perhaps I can trace something there?

In any case, I'm going to go through the troubleshooting guides and see if they're any help here. I otherwise may try ceph-ansible/stable-3.0 with ceph-disk (since this "worked" with the other OSDs, meaning ceph-ansible was able to complete without this "hang") before I just tear it all down and try to build out manually.



--
Andre Goree
-=-=-=-=-=-
Email     - andre at drenet.net
Website   - http://blog.drenet.net
PGP key   - http://www.drenet.net/pubkey.html
-=-=-=-=-=-


