Re: cephadm Pacific bootstrap hangs waiting for mon

On Tue, 31 Aug 2021 at 03:24, Arnaud MARTEL
<arnaud.martel@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Matthew,
>
> I don't know if it will be helpful, but I had the same problem on Debian 10, and the solution was to install docker from docker.io and not from the Debian package (too old).
>

Ah, that makes sense.  Thanks!
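
For anyone who finds this thread later, here is a rough sketch of how one might check whether the installed Docker is older than some chosen minimum before running bootstrap. The function name version_at_least and the MIN_DOCKER value are my own placeholders, not anything cephadm defines or any official Ceph requirement; the comparison assumes GNU sort with -V.

```shell
#!/bin/sh
# Succeeds when version $1 is at least version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Placeholder threshold, NOT an official Ceph requirement; adjust as needed.
MIN_DOCKER="${MIN_DOCKER:-20.10.0}"

# Only query Docker when the client is installed and the daemon answers.
if command -v docker >/dev/null 2>&1 &&
   have=$(docker version --format '{{.Server.Version}}' 2>/dev/null) &&
   [ -n "$have" ]; then
    if version_at_least "$have" "$MIN_DOCKER"; then
        echo "docker $have is at least $MIN_DOCKER"
    else
        echo "docker $have is older than $MIN_DOCKER"
    fi
fi
```

Setting MIN_DOCKER in the environment overrides the placeholder. If the docker client is missing or the daemon does not respond, the script prints nothing.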

> Arnaud
>
> ----- Original message -----
> From: "Matthew Pounsett" <matt@xxxxxxxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxx>
> Sent: Monday, 30 August 2021 17:34:32
> Subject:  cephadm Pacific bootstrap hangs waiting for mon
>
> I'm just getting started with Pacific, and I've run into this problem
> trying to get bootstrapped.  cephadm is waiting for the mon to start,
> and waiting, and waiting ...  Checking docker ps, it looks like the
> mon container is running, but I guess it's never finishing its startup
> tasks?  I waited about 30 minutes the first time.  I killed cephadm and
> restarted, and I seem to have the same problem; I let it run overnight
> and got some additional output that doesn't actually help me much.
> Details pasted below.
>
> What additional things should I be doing to try to troubleshoot this?
>
> In case it's useful reference info, the mon IP I've given is on our
> "admin" VLAN which is reachable from all hosts on our network.  The
> cluster network subnet I supplied is the 10G VLAN reachable only by
> the servers in the ceph cluster I'm building.  The IP supplied is
> reachable on the local host.
>
> % sudo cephadm bootstrap --allow-fqdn-hostname --mon-ip 192.168.1.192
> --cluster-network 192.168.0.0/24
> Verifying podman|docker is present...
> Verifying lvm2 is present...
> Verifying time synchronization is in place...
> Unit systemd-timesyncd.service is enabled and running
> Repeating the final host check...
> podman|docker (/usr/bin/docker) is present
> systemctl is present
> lvcreate is present
> Unit systemd-timesyncd.service is enabled and running
> Host looks OK
> Cluster fsid: fb45c7b2-0911-11ec-9731-bc97e15d6534
> Verifying IP 192.168.1.192 port 3300 ...
> Verifying IP 192.168.1.192 port 6789 ...
> Mon IP `192.168.1.192` is in CIDR network `192.168.1.0/24`
> Pulling container image docker.io/ceph/ceph:v16...
> Ceph version: ceph version 16.2.5
> (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable)
> Extracting ceph user uid/gid from container image...
> Creating initial keys...
> Creating initial monmap...
> Creating mon...
> Waiting for mon to start...
> Waiting for mon...
> Non-zero exit code 1 from /usr/bin/docker run --rm --ipc=host
> --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e
> CONTAINER_IMAGE=docker.io/ceph/ceph:v16 -e
> NODE_NAME=cmgmt01.example.net -e CEPH_USE_RANDOM_NONCE=1 -v
> /var/lib/ceph/fb45c7b2-0911-11ec-9731-bc97e15d6534/mon.cmgmt01.example.net:/var/lib/ceph/mon/ceph-cmgmt01.example.net:z
> -v /tmp/ceph-tmp8q3oxeg3:/etc/ceph/ceph.client.admin.keyring:z -v
> /tmp/ceph-tmp4_69yc31:/etc/ceph/ceph.conf:z docker.io/ceph/ceph:v16
> status
> /usr/bin/ceph: stderr 2021-08-29T21:47:23.263+0000 7f2aeaa37700  0
> monclient(hunting): authenticate timed out after 300
> /usr/bin/ceph: stderr 2021-08-29T21:52:23.262+0000 7f2aeaa37700  0
> monclient(hunting): authenticate timed out after 300
> /usr/bin/ceph: stderr 2021-08-29T21:57:23.266+0000 7f2aeaa37700  0
> monclient(hunting): authenticate timed out after 300
> /usr/bin/ceph: stderr 2021-08-29T22:02:23.265+0000 7f2aeaa37700  0
> monclient(hunting): authenticate timed out after 300
> /usr/bin/ceph: stderr 2021-08-29T22:07:23.268+0000 7f2aeaa37700  0
> monclient(hunting): authenticate timed out after 300
> /usr/bin/ceph: stderr 2021-08-29T22:12:23.268+0000 7f2aeaa37700  0
> monclient(hunting): authenticate timed out after 300
> /usr/bin/ceph: stderr 2021-08-29T22:17:23.271+0000 7f2aeaa37700  0
> monclient(hunting): authenticate timed out after 300
> /usr/bin/ceph: stderr 2021-08-29T22:22:23.266+0000 7f2aeaa37700  0
> monclient(hunting): authenticate timed out after 300
> /usr/bin/ceph: stderr 2021-08-29T22:27:23.270+0000 7f2aeaa37700  0
> monclient(hunting): authenticate timed out after 300
> /usr/bin/ceph: stderr 2021-08-29T22:32:23.273+0000 7f2aeaa37700  0
> monclient(hunting): authenticate timed out after 300
> /usr/bin/ceph: stderr [errno 110] RADOS timed out (error connecting to
> the cluster)
> mon not available, waiting (1/15)...
> [ repeats ... ]
>
> The log contains identical info.  The only extra I see is a note at
> the end about releasing locks, which I'm sure is expected and of no
> additional help.
>
> 2021-08-30 11:03:02,801 DEBUG Releasing lock 140656683483824 on
> /run/cephadm/fb45c7b2-0911-11ec-9731-bc97e15d6534.lock
> 2021-08-30 11:03:02,801 DEBUG Lock 140656683483824 released on
> /run/cephadm/fb45c7b2-0911-11ec-9731-bc97e15d6534.lock
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx