cephadm bootstrap failed with docker

farhad kh <farhad.khedriyan@xxxxxxxxx> · Fri, 7 Mar 2025 21:25:27 +0330

Hi, when I try to deploy a cluster with cephadm version 19.2.1 using Docker
version 28.0.1, I get this error:
-------
# cephadm --image opkbhfpsksp0101.p.fnst/ceph/ceph:v19.2.1 bootstrap \
    --mon-ip 10.248.35.143 --registry-json /root/reg.json \
    --allow-fqdn-hostname --initial-dashboard-user admin \
    --initial-dashboard-password P@ssw0rd1404 --dashboard-password-noupdate \
    --ssh-user cephadmin
Verifying ssh connectivity using standard pubkey authentication ...
Adding key to cephadmin@localhost authorized_keys...
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
docker (/bin/docker) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: 15d9eaee-fbe0-11ef-ad63-005056a83619
Verifying IP 10.248.35.143 port 3300 ...
Verifying IP 10.248.35.143 port 6789 ...
Mon IP `10.248.35.143` is in CIDR network `10.248.35.0/24`
Mon IP `10.248.35.143` is in CIDR network `10.248.35.0/24`
Internal network (--cluster-network) has not been provided, OSD replication
will default to the public_network
Logging into custom registry.
Pulling custom registry login info from /root/reg.json.
Pulling container image opkbhfpsksp0101.p.fnst/ceph/ceph:v19.2.1...
Non-zero exit code 125 from /bin/docker run --rm --ipc=host
--stop-signal=SIGTERM --ulimit nofile=1048576 --net=host --entrypoint ceph
--init -e CONTAINER_IMAGE=opkbhfpsksp0101.p.fnst/ceph/ceph:v19.2.1 -e
NODE_NAME=opcpmfpsksa0403 opkbhfpsksp0101.p.fnst/ceph/ceph:v19.2.1 --version
ceph: stderr docker: Error response from daemon: failed to create task for
container: failed to create shim task: OCI runtime create failed: runc
create failed: unable to start container process: can't copy bootstrap data
to pipe: write init-p: broken pipe: unknown
ceph: stderr
ceph: stderr Run 'docker run --help' for more information
RuntimeError: Failed command: /bin/docker run --rm --ipc=host
--stop-signal=SIGTERM --ulimit nofile=1048576 --net=host --entrypoint ceph
--init -e CONTAINER_IMAGE=opkbhfpsksp0101.p.fnst/ceph/ceph:v19.2.1 -e
NODE_NAME=opcpmfpsksa0403 opkbhfpsksp0101.p.fnst/ceph/ceph:v19.2.1 --version

        ***************
        Cephadm hit an issue during cluster installation. Current cluster
files will be deleted automatically.
        To disable this behaviour you can pass the --no-cleanup-on-failure
flag. In case of any previous
        broken installation, users must use the following command to
completely delete the broken cluster:

        > cephadm rm-cluster --force --zap-osds --fsid <fsid>

        for more information please refer to
https://docs.ceph.com/en/latest/cephadm/operations/#purging-a-cluster
        ***************

Deleting cluster with fsid: 15d9eaee-fbe0-11ef-ad63-005056a83619
Traceback (most recent call last):
  File "/tmp/tmpfe1bt8s9.cephadm.build/app/__main__.py", line 2628, in
_rollback
  File "/tmp/tmpfe1bt8s9.cephadm.build/app/__main__.py", line 446, in
_default_image
  File "/tmp/tmpfe1bt8s9.cephadm.build/app/__main__.py", line 2763, in
command_bootstrap
  File "/tmp/tmpfe1bt8s9.cephadm.build/app/cephadmlib/container_types.py",
line 429, in run
  File "/tmp/tmpfe1bt8s9.cephadm.build/app/cephadmlib/call_wrappers.py",
line 310, in call_throws
RuntimeError: Failed command: /bin/docker run --rm --ipc=host
--stop-signal=SIGTERM --ulimit nofile=1048576 --net=host --entrypoint ceph
--init -e CONTAINER_IMAGE=opkbhfpsksp0101.p.fnst/ceph/ceph:v19.2.1 -e
NODE_NAME=opcpmfpsksa0403 opkbhfpsksp0101.p.fnst/ceph/ceph:v19.2.1 --version

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/tmp/tmpfe1bt8s9.cephadm.build/app/__main__.py", line 5581, in
<module>
  File "/tmp/tmpfe1bt8s9.cephadm.build/app/__main__.py", line 5569, in main
  File "/tmp/tmpfe1bt8s9.cephadm.build/app/__main__.py", line 2657, in
_rollback
  File "/tmp/tmpfe1bt8s9.cephadm.build/app/__main__.py", line 4391, in
_rm_cluster
  File "/tmp/tmpfe1bt8s9.cephadm.build/app/__main__.py", line 4317, in
get_ceph_cluster_count
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/ceph'
-----
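
The step that fails in the output above is a plain "docker run" of the ceph
image with "--version", so it can be repeated by hand without cephadm. Below is
a minimal sketch for doing that with some options removed, to see whether any
single option makes a difference. The helper name "try" and the variables
IMAGE and RUN are made up for this example, the two "-e" environment variables
from the original command are left out, and I do not know which option (if
any) is involved. By default it only prints the commands; set RUN=1 to
actually execute them.

```shell
#!/bin/sh
# Image name taken from the bootstrap output; override with IMAGE=... if needed.
IMAGE="${IMAGE:-opkbhfpsksp0101.p.fnst/ceph/ceph:v19.2.1}"

# Hypothetical helper: print one docker invocation, and run it only when RUN=1.
try() {
    label=$1
    shift
    echo "== $label: docker run --rm $* --entrypoint ceph $IMAGE --version"
    if [ "${RUN:-0}" = "1" ]; then
        docker run --rm "$@" --entrypoint ceph "$IMAGE" --version
        echo "exit code: $?"
    fi
}

# Same options as in the failing command (the -e variables are omitted).
try "cephadm options" --ipc=host --stop-signal=SIGTERM --ulimit nofile=1048576 --net=host --init
# Variants with one option dropped at a time.
try "without --init" --ipc=host --stop-signal=SIGTERM --ulimit nofile=1048576 --net=host
try "without --ulimit" --ipc=host --stop-signal=SIGTERM --net=host --init
# No extra options at all.
try "bare"
```

If the "bare" variant fails in the same way, the problem is probably not
specific to the options cephadm passes.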

To investigate, I checked everything I could think of, including the docker,
containerd and runc versions, and the only thing I found is this log in
journalctl -u docker:

-------
Mar 02 06:23:00 opcpmfpsksa0403 systemd[1]: Starting Docker Application
Container Engine...
Mar 02 06:23:00 opcpmfpsksa0403 dockerd[18363]:
time="2025-03-02T06:23:00.533468075Z" level=info msg="Starting up"
Mar 02 06:23:00 opcpmfpsksa0403 dockerd[18363]:
time="2025-03-02T06:23:00.535018237Z" level=info msg="OTEL tracing is not
configured, using no-op tracer provider"
Mar 02 06:23:00 opcpmfpsksa0403 dockerd[18363]:
time="2025-03-02T06:23:00.566901059Z" level=info msg="[graphdriver] using
prior storage driver: overlay2"
Mar 02 06:23:00 opcpmfpsksa0403 dockerd[18363]:
time="2025-03-02T06:23:00.567653709Z" level=info msg="Loading containers:
start."
Mar 02 06:23:01 opcpmfpsksa0403 dockerd[18363]:
time="2025-03-02T06:23:01.325500958Z" level=info msg="Loading containers:
done."
Mar 02 06:23:01 opcpmfpsksa0403 dockerd[18363]:
time="2025-03-02T06:23:01.344752401Z" level=warning msg="Not using native
diff for overlay2, this may cause degraded performance for building images:
kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" storage-driver=overlay2
Mar 02 06:23:01 opcpmfpsksa0403 dockerd[18363]:
time="2025-03-02T06:23:01.344892780Z" level=info msg="Docker daemon"
commit=bbd0a17 containerd-snapshotter=false storage-driver=overlay2
version=28.0.1
Mar 02 06:23:01 opcpmfpsksa0403 dockerd[18363]:
time="2025-03-02T06:23:01.344968676Z" level=info msg="Initializing buildkit"
Mar 02 06:23:01 opcpmfpsksa0403 dockerd[18363]:
time="2025-03-02T06:23:01.393298555Z" level=info msg="Completed buildkit
initialization"
Mar 02 06:23:01 opcpmfpsksa0403 dockerd[18363]:
time="2025-03-02T06:23:01.401141531Z" level=info msg="Daemon has completed
initialization"
Mar 02 06:23:01 opcpmfpsksa0403 dockerd[18363]:
time="2025-03-02T06:23:01.401227515Z" level=info msg="API listen on
/run/docker.sock"
Mar 02 06:23:01 opcpmfpsksa0403 systemd[1]: Started Docker Application
Container Engine.
Mar 08 05:42:50 opcpmfpsksa0403 dockerd[18363]:
time="2025-03-08T05:42:50.468402014Z" level=error msg="copy stream failed"
error="reading from a closed fifo" stream=stderr
Mar 08 05:42:50 opcpmfpsksa0403 dockerd[18363]:
time="2025-03-08T05:42:50.468433232Z" level=error msg="copy stream failed"
error="reading from a closed fifo" stream=stdout
Mar 08 05:42:51 opcpmfpsksa0403 dockerd[18363]:
time="2025-03-08T05:42:51.103965419Z" level=error msg="Handler for POST
/v1.48/containers/40b1295b9067eb570d01b1509c59593f29e7ad61fb61e8ed4a82166441d52d53/start
returned error: failed to create task for container: failed to create shim
task: OCI runtime create failed: runc create failed: unable to start
container process: can't copy bootstrap data to pipe: write init-p: broken
pipe: unknown"
--------
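
Most of the journal above is routine startup output. As a small sketch, the
error and warning entries can be pulled out with a filter like the one below.
The function name "docker_errors" is made up for this example, and the demo
input is two entries from the journal above joined back onto single lines.

```shell
#!/bin/sh
# Keep only error- and warning-level dockerd entries from journal text on stdin.
docker_errors() {
    grep -E 'level=(error|warning)'
}

# Demo on two entries from the journal above; only the second one is printed.
docker_errors <<'EOF'
Mar 02 06:23:00 opcpmfpsksa0403 dockerd[18363]: time="2025-03-02T06:23:00.533468075Z" level=info msg="Starting up"
Mar 08 05:42:50 opcpmfpsksa0403 dockerd[18363]: time="2025-03-08T05:42:50.468402014Z" level=error msg="copy stream failed" error="reading from a closed fifo" stream=stderr
EOF

# On the host itself (not run here):
#   journalctl -u docker --no-pager --since "2025-03-08 05:40" | docker_errors
```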
We use Oracle Linux 9.5 with kernel 5.15.0-305.176.4.el9uek.x86_64. We have
searched everywhere but can't work out what is happening. Does anybody know
what is going wrong, or how to resolve it?
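
In case exact versions help, here is a small sketch that prints the version of
each part of the container stack together with the running kernel. The
function name "report_versions" is made up for this example, and it assumes
docker, containerd and runc are on the PATH under those names.

```shell
#!/bin/sh
# Print the first line of "--version" for each tool, or note that it is missing,
# followed by the running kernel release.
report_versions() {
    for tool in docker containerd runc; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s: %s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
        else
            printf '%s: not found in PATH\n' "$tool"
        fi
    done
    printf 'kernel: %s\n' "$(uname -r)"
}

report_versions
```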
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx