I'm using ceph-deploy 1.3.2 with ceph 0.72.1. ceph-deploy disk zap fails and exits with an error, but then succeeds on retry. This is repeatable as I go through each of the OSD disks in my cluster. See output below.
I am guessing the first attempt changes something about the initial state of the disk that then allows the second run to complete, but if the disk can be put into a state where the zap completes, why doesn't the first run just do that?
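For what it's worth, sgdisk's exit status 3 appears to mean "non-GPT disk detected and no -g option", which matches the "Non-GPT disk; not saving changes. Use -g to override." line in the output below: these disks seem to have a stale backup GPT at the end but no valid main GPT or protective MBR at the start, so the first run's --zap-all destroys the old structures but sgdisk then refuses to write the fresh table, while the second run starts from a blank disk and succeeds. Assuming that's the mechanism, one way to make the first zap succeed would be to zero both GPT regions with dd beforehand. A rough sketch, run as root on the OSD host (/dev/sdX is a placeholder, and the 34-sector figure assumes the standard GPT layout of protective MBR, header, and 128 partition entries):

    # Zero the protective MBR, primary GPT header, and partition entries
    # (the first 34 512-byte sectors under the standard GPT layout).
    dd if=/dev/zero of=/dev/sdX bs=512 count=34

    # Zero the backup GPT at the end of the disk; blockdev --getsz reports
    # the device size in 512-byte sectors regardless of physical sector size.
    dd if=/dev/zero of=/dev/sdX bs=512 count=34 \
        seek=$(( $(blockdev --getsz /dev/sdX) - 34 ))

With the disk truly blank, a single ceph-deploy disk zap run should have nothing to trip over.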
The main negative effect is that this causes a compact command like ceph-deploy disk zap joceph0{1,2,3,4}:/dev/sd{b,c,d,e,f} to fail and exit without running through all the targets (a retry-loop workaround is sketched after the output below). I did not encounter this with the previous release of ceph and ceph-deploy (dumpling and 1.2.7?), but I can't say for sure that my disks were in the same initial state when running ceph-deploy on that release. Is this a bug, or expected behavior?

ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk zap joceph02:/dev/sdc
[ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk zap joceph02:/dev/sdc
[ceph_deploy.osd][DEBUG ] zapping /dev/sdc on joceph02
[joceph02][DEBUG ] connected to host: joceph02
[joceph02][DEBUG ] detect platform information from remote host
[joceph02][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring
[joceph02][DEBUG ] zeroing last few blocks of device
[joceph02][INFO ] Running command: sudo sgdisk --zap-all --clear --mbrtogpt -- /dev/sdc
[joceph02][ERROR ] Caution: invalid main GPT header, but valid backup; regenerating main header
[joceph02][ERROR ] from backup!
[joceph02][ERROR ]
[joceph02][ERROR ] Warning! Main partition table CRC mismatch! Loaded backup partition table
[joceph02][ERROR ] instead of main partition table!
[joceph02][ERROR ]
[joceph02][ERROR ] Warning! One or more CRCs don't match. You should repair the disk!
[joceph02][ERROR ]
[joceph02][ERROR ] Invalid partition data!
[joceph02][DEBUG ] Caution! After loading partitions, the CRC doesn't check out!
[joceph02][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
[joceph02][DEBUG ] other utilities.
[joceph02][DEBUG ] Information: Creating fresh partition table; will override earlier problems!
[joceph02][DEBUG ] Non-GPT disk; not saving changes. Use -g to override.
[joceph02][ERROR ] Traceback (most recent call last):
[joceph02][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/process.py", line 68, in run
[joceph02][ERROR ]     reporting(conn, result, timeout)
[joceph02][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/log.py", line 13, in reporting
[joceph02][ERROR ]     received = result.receive(timeout)
[joceph02][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/lib/execnet/gateway_base.py", line 455, in receive
[joceph02][ERROR ]     raise self._getremoteerror() or EOFError()
[joceph02][ERROR ] RemoteError: Traceback (most recent call last):
[joceph02][ERROR ]   File "<string>", line 806, in executetask
[joceph02][ERROR ]   File "", line 35, in _remote_run
[joceph02][ERROR ] RuntimeError: command returned non-zero exit status: 3
[joceph02][ERROR ]
[joceph02][ERROR ]
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: sgdisk --zap-all --clear --mbrtogpt -- /dev/sdc

ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk zap joceph02:/dev/sdc
[ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk zap joceph02:/dev/sdc
[ceph_deploy.osd][DEBUG ] zapping /dev/sdc on joceph02
[joceph02][DEBUG ] connected to host: joceph02
[joceph02][DEBUG ] detect platform information from remote host
[joceph02][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring
[joceph02][DEBUG ] zeroing last few blocks of device
[joceph02][INFO ] Running command: sudo sgdisk --zap-all --clear --mbrtogpt -- /dev/sdc
[joceph02][DEBUG ] Creating new GPT entries.
[joceph02][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
[joceph02][DEBUG ] other utilities.
[joceph02][DEBUG ] The operation has completed successfully.
ceph@joceph-admin01:/etc/ceph$

Here's some additional output with a disk list executed in between zaps:

ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk list joceph02
[ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk list joceph02
[joceph02][DEBUG ] connected to host: joceph02
[joceph02][DEBUG ] detect platform information from remote host
[joceph02][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring
[ceph_deploy.osd][DEBUG ] Listing disks on joceph02...
[joceph02][INFO ] Running command: sudo ceph-disk list
[joceph02][DEBUG ] /dev/sda :
[joceph02][DEBUG ] /dev/sda1 other, ext4, mounted on /
[joceph02][DEBUG ] /dev/sda2 other
[joceph02][DEBUG ] /dev/sda5 swap, swap
[joceph02][DEBUG ] /dev/sdb other, unknown
[joceph02][DEBUG ] /dev/sdc other, unknown
[joceph02][DEBUG ] /dev/sdd :
[joceph02][DEBUG ] /dev/sdd1 other
[joceph02][DEBUG ] /dev/sde :
[joceph02][DEBUG ] /dev/sde1 other
[joceph02][DEBUG ] /dev/sdf :
[joceph02][DEBUG ] /dev/sdf1 other

ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk zap joceph02:/dev/sdd
[ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk zap joceph02:/dev/sdd
[ceph_deploy.osd][DEBUG ] zapping /dev/sdd on joceph02
[joceph02][DEBUG ] connected to host: joceph02
[joceph02][DEBUG ] detect platform information from remote host
[joceph02][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring
[joceph02][DEBUG ] zeroing last few blocks of device
[joceph02][INFO ] Running command: sudo sgdisk --zap-all --clear --mbrtogpt -- /dev/sdd
[joceph02][ERROR ] Caution: invalid main GPT header, but valid backup; regenerating main header
[joceph02][ERROR ] from backup!
[joceph02][ERROR ]
[joceph02][ERROR ] Warning! Main partition table CRC mismatch! Loaded backup partition table
[joceph02][ERROR ] instead of main partition table!
[joceph02][ERROR ]
[joceph02][ERROR ] Warning! One or more CRCs don't match. You should repair the disk!
[joceph02][ERROR ]
[joceph02][ERROR ] Invalid partition data!
[joceph02][DEBUG ] Caution! After loading partitions, the CRC doesn't check out!
[joceph02][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
[joceph02][DEBUG ] other utilities.
[joceph02][DEBUG ] Information: Creating fresh partition table; will override earlier problems!
[joceph02][DEBUG ] Non-GPT disk; not saving changes. Use -g to override.
[joceph02][ERROR ] Traceback (most recent call last):
[joceph02][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/process.py", line 68, in run
[joceph02][ERROR ]     reporting(conn, result, timeout)
[joceph02][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/log.py", line 13, in reporting
[joceph02][ERROR ]     received = result.receive(timeout)
[joceph02][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/lib/execnet/gateway_base.py", line 455, in receive
[joceph02][ERROR ]     raise self._getremoteerror() or EOFError()
[joceph02][ERROR ] RemoteError: Traceback (most recent call last):
[joceph02][ERROR ]   File "<string>", line 806, in executetask
[joceph02][ERROR ]   File "", line 35, in _remote_run
[joceph02][ERROR ] RuntimeError: command returned non-zero exit status: 3
[joceph02][ERROR ]
[joceph02][ERROR ]
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: sgdisk --zap-all --clear --mbrtogpt -- /dev/sdd

ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk list joceph02
[ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk list joceph02
[joceph02][DEBUG ] connected to host: joceph02
[joceph02][DEBUG ] detect platform information from remote host
[joceph02][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring
[ceph_deploy.osd][DEBUG ] Listing disks on joceph02...
[joceph02][INFO ] Running command: sudo ceph-disk list
[joceph02][DEBUG ] /dev/sda :
[joceph02][DEBUG ] /dev/sda1 other, ext4, mounted on /
[joceph02][DEBUG ] /dev/sda2 other
[joceph02][DEBUG ] /dev/sda5 swap, swap
[joceph02][DEBUG ] /dev/sdb other, unknown
[joceph02][DEBUG ] /dev/sdc other, unknown
[joceph02][DEBUG ] /dev/sdd other, unknown
[joceph02][DEBUG ] /dev/sde :
[joceph02][DEBUG ] /dev/sde1 other
[joceph02][DEBUG ] /dev/sdf :
[joceph02][DEBUG ] /dev/sdf1 other

ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk zap joceph02:/dev/sdd
[ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk zap joceph02:/dev/sdd
[ceph_deploy.osd][DEBUG ] zapping /dev/sdd on joceph02
[joceph02][DEBUG ] connected to host: joceph02
[joceph02][DEBUG ] detect platform information from remote host
[joceph02][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring
[joceph02][DEBUG ] zeroing last few blocks of device
[joceph02][INFO ] Running command: sudo sgdisk --zap-all --clear --mbrtogpt -- /dev/sdd
[joceph02][DEBUG ] Creating new GPT entries.
[joceph02][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
[joceph02][DEBUG ] other utilities.
[joceph02][DEBUG ] The operation has completed successfully.

ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk list joceph02
[ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk list joceph02
[joceph02][DEBUG ] connected to host: joceph02
[joceph02][DEBUG ] detect platform information from remote host
[joceph02][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring
[ceph_deploy.osd][DEBUG ] Listing disks on joceph02...
[joceph02][INFO ] Running command: sudo ceph-disk list
[joceph02][DEBUG ] /dev/sda :
[joceph02][DEBUG ] /dev/sda1 other, ext4, mounted on /
[joceph02][DEBUG ] /dev/sda2 other
[joceph02][DEBUG ] /dev/sda5 swap, swap
[joceph02][DEBUG ] /dev/sdb other, unknown
[joceph02][DEBUG ] /dev/sdc other, unknown
[joceph02][DEBUG ] /dev/sdd other, unknown
[joceph02][DEBUG ] /dev/sde :
[joceph02][DEBUG ] /dev/sde1 other
[joceph02][DEBUG ] /dev/sdf :
[joceph02][DEBUG ] /dev/sdf1 other
ceph@joceph-admin01:/etc/ceph$
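In the meantime, since the second zap reliably succeeds here, the multi-target abort can be worked around by looping over the targets and retrying each one once instead of using the brace-expanded form. A rough sketch (untested as written; the hostnames and device letters just match my cluster's layout):

    for host in joceph01 joceph02 joceph03 joceph04; do
        for dev in b c d e f; do
            target="$host:/dev/sd$dev"
            # The first zap may die with sgdisk exit status 3; the retry
            # then sees a blank disk and completes.
            ceph-deploy disk zap "$target" || ceph-deploy disk zap "$target"
        done
    done

That keeps one half-wiped disk from aborting the whole run, though it's obviously a band-aid rather than a fix.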