Re: cephadm Pacific bootstrap hangs waiting for mon

Hi Matthew,

I don't know if it will be helpful, but I had the same problem on Debian 10, and the solution was to install docker from docker.io and not from the Debian package (too old).
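
To see which Docker version is installed and which package provides it, something like the sketch below may help. The `version_ge` helper and the `MIN_VERSION` value are hypothetical, chosen only for illustration; neither this thread nor the cephadm output states a minimum Docker version.

```shell
# Sketch: report the installed Docker version and the package that owns the binary.
# MIN_VERSION is a hypothetical threshold for illustration, not a documented requirement.
MIN_VERSION="${MIN_VERSION:-20.10.0}"

# version_ge A B: succeeds if version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v docker >/dev/null 2>&1; then
    # Server version as reported by the daemon; empty if the daemon is unreachable.
    installed="$(docker version --format '{{.Server.Version}}' 2>/dev/null)"
    echo "docker server version: ${installed:-unknown}"
    # Show which Debian package, if any, owns the docker binary.
    dpkg -S "$(command -v docker)" 2>/dev/null \
        || echo "docker binary is not owned by a dpkg package"
    if [ -n "$installed" ] && ! version_ge "$installed" "$MIN_VERSION"; then
        echo "docker $installed is older than $MIN_VERSION"
    fi
else
    echo "docker not found in PATH"
fi
```

If the owning package is the distribution's own, that matches the situation I described above.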

Arnaud

----- Original Message -----
From: "Matthew Pounsett" <matt@xxxxxxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxx>
Sent: Monday, 30 August 2021 17:34:32
Subject: cephadm Pacific bootstrap hangs waiting for mon

I'm just getting started with Pacific, and I've run into this problem
trying to get bootstrapped. cephadm is waiting for the mon to start,
and waiting, and waiting ... Checking docker ps, it looks like the mon
is running, but I guess it's never finishing its startup tasks? I
waited about 30 minutes the first time. I killed cephadm and restarted,
and I seem to have the same problem; I let it run overnight and got
some additional output that doesn't actually help me much. Details
pasted below.

What additional things should I be doing to try to troubleshoot this?

In case it's useful reference info, the mon IP I've given is on our
"admin" VLAN, which is reachable from all hosts on our network. The
cluster network subnet I supplied is the 10G VLAN, reachable only by
the servers in the ceph cluster I'm building. The mon IP is
reachable on the local host.
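
How the mon IP relates to the two subnets can be checked with a little arithmetic. This is a minimal sketch for IPv4 only; the helper names `ip_to_int` and `ip_in_cidr` are hypothetical, and the addresses are the ones from the command and output pasted below.

```shell
# ip_to_int A.B.C.D: print the IPv4 address as a single integer.
# The function body runs in a subshell so the IFS change stays local.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# ip_in_cidr IP NET/PREFIX: succeed if IP lies inside NET/PREFIX.
ip_in_cidr() (
    ip="$(ip_to_int "$1")"
    net="$(ip_to_int "${2%/*}")"
    prefix="${2#*/}"
    # Build the netmask by shifting, then truncate to 32 bits.
    mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
)

mon_ip=192.168.1.192
for cidr in 192.168.1.0/24 192.168.0.0/24; do
    if ip_in_cidr "$mon_ip" "$cidr"; then
        echo "$mon_ip is inside $cidr"
    else
        echo "$mon_ip is outside $cidr"
    fi
done
```

The mon IP falls inside 192.168.1.0/24 and outside the 192.168.0.0/24 cluster network. That in itself should not be a problem, since the cluster network carries OSD replication traffic while clients and mons talk over the public network.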

% sudo cephadm bootstrap --allow-fqdn-hostname --mon-ip 192.168.1.192 \
    --cluster-network 192.168.0.0/24
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit systemd-timesyncd.service is enabled and running
Repeating the final host check...
podman|docker (/usr/bin/docker) is present
systemctl is present
lvcreate is present
Unit systemd-timesyncd.service is enabled and running
Host looks OK
Cluster fsid: fb45c7b2-0911-11ec-9731-bc97e15d6534
Verifying IP 192.168.1.192 port 3300 ...
Verifying IP 192.168.1.192 port 6789 ...
Mon IP `192.168.1.192` is in CIDR network `192.168.1.0/24`
Pulling container image docker.io/ceph/ceph:v16...
Ceph version: ceph version 16.2.5
(0883bdea7337b95e4b611c768c0279868462204a) pacific (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
Non-zero exit code 1 from /usr/bin/docker run --rm --ipc=host
--stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e
CONTAINER_IMAGE=docker.io/ceph/ceph:v16 -e
NODE_NAME=cmgmt01.example.net -e CEPH_USE_RANDOM_NONCE=1 -v
/var/lib/ceph/fb45c7b2-0911-11ec-9731-bc97e15d6534/mon.cmgmt01.example.net:/var/lib/ceph/mon/ceph-cmgmt01.example.net:z
-v /tmp/ceph-tmp8q3oxeg3:/etc/ceph/ceph.client.admin.keyring:z -v
/tmp/ceph-tmp4_69yc31:/etc/ceph/ceph.conf:z docker.io/ceph/ceph:v16
status
/usr/bin/ceph: stderr 2021-08-29T21:47:23.263+0000 7f2aeaa37700  0
monclient(hunting): authenticate timed out after 300
/usr/bin/ceph: stderr 2021-08-29T21:52:23.262+0000 7f2aeaa37700  0
monclient(hunting): authenticate timed out after 300
/usr/bin/ceph: stderr 2021-08-29T21:57:23.266+0000 7f2aeaa37700  0
monclient(hunting): authenticate timed out after 300
/usr/bin/ceph: stderr 2021-08-29T22:02:23.265+0000 7f2aeaa37700  0
monclient(hunting): authenticate timed out after 300
/usr/bin/ceph: stderr 2021-08-29T22:07:23.268+0000 7f2aeaa37700  0
monclient(hunting): authenticate timed out after 300
/usr/bin/ceph: stderr 2021-08-29T22:12:23.268+0000 7f2aeaa37700  0
monclient(hunting): authenticate timed out after 300
/usr/bin/ceph: stderr 2021-08-29T22:17:23.271+0000 7f2aeaa37700  0
monclient(hunting): authenticate timed out after 300
/usr/bin/ceph: stderr 2021-08-29T22:22:23.266+0000 7f2aeaa37700  0
monclient(hunting): authenticate timed out after 300
/usr/bin/ceph: stderr 2021-08-29T22:27:23.270+0000 7f2aeaa37700  0
monclient(hunting): authenticate timed out after 300
/usr/bin/ceph: stderr 2021-08-29T22:32:23.273+0000 7f2aeaa37700  0
monclient(hunting): authenticate timed out after 300
/usr/bin/ceph: stderr [errno 110] RADOS timed out (error connecting to
the cluster)
mon not available, waiting (1/15)...
[ repeats ... ]

The log contains identical info.  The only extra I see is a note at
the end about releasing locks, which I'm sure is expected and of no
additional help.

2021-08-30 11:03:02,801 DEBUG Releasing lock 140656683483824 on
/run/cephadm/fb45c7b2-0911-11ec-9731-bc97e15d6534.lock
2021-08-30 11:03:02,801 DEBUG Lock 140656683483824 released on
/run/cephadm/fb45c7b2-0911-11ec-9731-bc97e15d6534.lock
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx