On Fri, Nov 15, 2013 at 2:53 PM, Gruher, Joseph R <joseph.r.gruher@xxxxxxxxx> wrote:
> Using ceph-deploy 1.3.2 with ceph 0.72.1. ceph-deploy disk zap will fail
> and exit with an error, but then on retry it will succeed. This is
> repeatable as I go through each of the OSD disks in my cluster. See output
> below.
>
> I am guessing the first attempt changes something about the initial state
> of the disk which then allows the second run to complete, but if the disk
> can be changed to where the zap will complete, why doesn't the first run
> just do that?
>
> The main negative effect is that this causes a compact command like
> ceph-deploy disk zap joceph0{1,2,3,4}:/dev/sd{b,c,d,e,f} to fail and exit
> without running through all the targets.
>
> I did not encounter this in the previous release of ceph and ceph-deploy
> (dumpling and 1.2.7?) but I can't say for sure my disks were in the same
> initial state when running ceph-deploy on that release.

There have been no changes in ceph-deploy related to OSDs, so this is indeed
not expected behavior and it sounds like something else changed. Is it at all
possible (just to make sure) for you to try to reproduce this with an older
ceph-deploy? I really don't think ceph-deploy is causing this, but making
sure would not hurt. In the meantime, there is a manual workaround sketched
below, after your quoted output.

> Would this be a bug, or expected behavior?
>
> ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk zap joceph02:/dev/sdc
> [ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk zap joceph02:/dev/sdc
> [ceph_deploy.osd][DEBUG ] zapping /dev/sdc on joceph02
> [joceph02][DEBUG ] connected to host: joceph02
> [joceph02][DEBUG ] detect platform information from remote host
> [joceph02][DEBUG ] detect machine type
> [ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring
> [joceph02][DEBUG ] zeroing last few blocks of device
> [joceph02][INFO ] Running command: sudo sgdisk --zap-all --clear --mbrtogpt -- /dev/sdc
> [joceph02][ERROR ] Caution: invalid main GPT header, but valid backup; regenerating main header
> [joceph02][ERROR ] from backup!
> [joceph02][ERROR ]
> [joceph02][ERROR ] Warning! Main partition table CRC mismatch! Loaded backup partition table
> [joceph02][ERROR ] instead of main partition table!
> [joceph02][ERROR ]
> [joceph02][ERROR ] Warning! One or more CRCs don't match. You should repair the disk!
> [joceph02][ERROR ]
> [joceph02][ERROR ] Invalid partition data!
> [joceph02][DEBUG ] Caution! After loading partitions, the CRC doesn't check out!
> [joceph02][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
> [joceph02][DEBUG ] other utilities.
> [joceph02][DEBUG ] Information: Creating fresh partition table; will override earlier problems!
> [joceph02][DEBUG ] Non-GPT disk; not saving changes. Use -g to override.
> [joceph02][ERROR ] Traceback (most recent call last):
> [joceph02][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/process.py", line 68, in run
> [joceph02][ERROR ]     reporting(conn, result, timeout)
> [joceph02][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/log.py", line 13, in reporting
> [joceph02][ERROR ]     received = result.receive(timeout)
> [joceph02][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/lib/execnet/gateway_base.py", line 455, in receive
> [joceph02][ERROR ]     raise self._getremoteerror() or EOFError()
> [joceph02][ERROR ] RemoteError: Traceback (most recent call last):
> [joceph02][ERROR ]   File "<string>", line 806, in executetask
> [joceph02][ERROR ]   File "", line 35, in _remote_run
> [joceph02][ERROR ] RuntimeError: command returned non-zero exit status: 3
> [joceph02][ERROR ]
> [joceph02][ERROR ]
> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: sgdisk --zap-all --clear --mbrtogpt -- /dev/sdc
>
> ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk zap joceph02:/dev/sdc
> [ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk zap joceph02:/dev/sdc
> [ceph_deploy.osd][DEBUG ] zapping /dev/sdc on joceph02
> [joceph02][DEBUG ] connected to host: joceph02
> [joceph02][DEBUG ] detect platform information from remote host
> [joceph02][DEBUG ] detect machine type
> [ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring
> [joceph02][DEBUG ] zeroing last few blocks of device
> [joceph02][INFO ] Running command: sudo sgdisk --zap-all --clear --mbrtogpt -- /dev/sdc
> [joceph02][DEBUG ] Creating new GPT entries.
> [joceph02][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
> [joceph02][DEBUG ] other utilities.
> [joceph02][DEBUG ] The operation has completed successfully.
> ceph@joceph-admin01:/etc/ceph$
>
> Here's some additional output with a disk list executed in between zaps:
>
> ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk list joceph02
> [ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk list joceph02
> [joceph02][DEBUG ] connected to host: joceph02
> [joceph02][DEBUG ] detect platform information from remote host
> [joceph02][DEBUG ] detect machine type
> [ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring
> [ceph_deploy.osd][DEBUG ] Listing disks on joceph02...
> [joceph02][INFO ] Running command: sudo ceph-disk list
> [joceph02][DEBUG ] /dev/sda :
> [joceph02][DEBUG ]  /dev/sda1 other, ext4, mounted on /
> [joceph02][DEBUG ]  /dev/sda2 other
> [joceph02][DEBUG ]  /dev/sda5 swap, swap
> [joceph02][DEBUG ] /dev/sdb other, unknown
> [joceph02][DEBUG ] /dev/sdc other, unknown
> [joceph02][DEBUG ] /dev/sdd :
> [joceph02][DEBUG ]  /dev/sdd1 other
> [joceph02][DEBUG ] /dev/sde :
> [joceph02][DEBUG ]  /dev/sde1 other
> [joceph02][DEBUG ] /dev/sdf :
> [joceph02][DEBUG ]  /dev/sdf1 other
> ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk zap joceph02:/dev/sdd
> [ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk zap joceph02:/dev/sdd
> [ceph_deploy.osd][DEBUG ] zapping /dev/sdd on joceph02
> [joceph02][DEBUG ] connected to host: joceph02
> [joceph02][DEBUG ] detect platform information from remote host
> [joceph02][DEBUG ] detect machine type
> [ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring
> [joceph02][DEBUG ] zeroing last few blocks of device
> [joceph02][INFO ] Running command: sudo sgdisk --zap-all --clear --mbrtogpt -- /dev/sdd
> [joceph02][ERROR ] Caution: invalid main GPT header, but valid backup; regenerating main header
> [joceph02][ERROR ] from backup!
> [joceph02][ERROR ]
> [joceph02][ERROR ] Warning! Main partition table CRC mismatch! Loaded backup partition table
> [joceph02][ERROR ] instead of main partition table!
> [joceph02][ERROR ]
> [joceph02][ERROR ] Warning! One or more CRCs don't match. You should repair the disk!
> [joceph02][ERROR ]
> [joceph02][ERROR ] Invalid partition data!
> [joceph02][DEBUG ] Caution! After loading partitions, the CRC doesn't check out!
> [joceph02][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
> [joceph02][DEBUG ] other utilities.
> [joceph02][DEBUG ] Information: Creating fresh partition table; will override earlier problems!
> [joceph02][DEBUG ] Non-GPT disk; not saving changes. Use -g to override.
> [joceph02][ERROR ] Traceback (most recent call last):
> [joceph02][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/process.py", line 68, in run
> [joceph02][ERROR ]     reporting(conn, result, timeout)
> [joceph02][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/log.py", line 13, in reporting
> [joceph02][ERROR ]     received = result.receive(timeout)
> [joceph02][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/lib/execnet/gateway_base.py", line 455, in receive
> [joceph02][ERROR ]     raise self._getremoteerror() or EOFError()
> [joceph02][ERROR ] RemoteError: Traceback (most recent call last):
> [joceph02][ERROR ]   File "<string>", line 806, in executetask
> [joceph02][ERROR ]   File "", line 35, in _remote_run
> [joceph02][ERROR ] RuntimeError: command returned non-zero exit status: 3
> [joceph02][ERROR ]
> [joceph02][ERROR ]
> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: sgdisk --zap-all --clear --mbrtogpt -- /dev/sdd
>
> ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk list joceph02
> [ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk list joceph02
> [joceph02][DEBUG ] connected to host: joceph02
> [joceph02][DEBUG ] detect platform information from remote host
> [joceph02][DEBUG ] detect machine type
> [ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring
> [ceph_deploy.osd][DEBUG ] Listing disks on joceph02...
> [joceph02][INFO ] Running command: sudo ceph-disk list
> [joceph02][DEBUG ] /dev/sda :
> [joceph02][DEBUG ]  /dev/sda1 other, ext4, mounted on /
> [joceph02][DEBUG ]  /dev/sda2 other
> [joceph02][DEBUG ]  /dev/sda5 swap, swap
> [joceph02][DEBUG ] /dev/sdb other, unknown
> [joceph02][DEBUG ] /dev/sdc other, unknown
> [joceph02][DEBUG ] /dev/sdd other, unknown
> [joceph02][DEBUG ] /dev/sde :
> [joceph02][DEBUG ]  /dev/sde1 other
> [joceph02][DEBUG ] /dev/sdf :
> [joceph02][DEBUG ]  /dev/sdf1 other
> ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk zap joceph02:/dev/sdd
> [ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk zap joceph02:/dev/sdd
> [ceph_deploy.osd][DEBUG ] zapping /dev/sdd on joceph02
> [joceph02][DEBUG ] connected to host: joceph02
> [joceph02][DEBUG ] detect platform information from remote host
> [joceph02][DEBUG ] detect machine type
> [ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring
> [joceph02][DEBUG ] zeroing last few blocks of device
> [joceph02][INFO ] Running command: sudo sgdisk --zap-all --clear --mbrtogpt -- /dev/sdd
> [joceph02][DEBUG ] Creating new GPT entries.
> [joceph02][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
> [joceph02][DEBUG ] other utilities.
> [joceph02][DEBUG ] The operation has completed successfully.
> ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk list joceph02
> [ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk list joceph02
> [joceph02][DEBUG ] connected to host: joceph02
> [joceph02][DEBUG ] detect platform information from remote host
> [joceph02][DEBUG ] detect machine type
> [ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring
> [ceph_deploy.osd][DEBUG ] Listing disks on joceph02...
> [joceph02][INFO ] Running command: sudo ceph-disk list
> [joceph02][DEBUG ] /dev/sda :
> [joceph02][DEBUG ]  /dev/sda1 other, ext4, mounted on /
> [joceph02][DEBUG ]  /dev/sda2 other
> [joceph02][DEBUG ]  /dev/sda5 swap, swap
> [joceph02][DEBUG ] /dev/sdb other, unknown
> [joceph02][DEBUG ] /dev/sdc other, unknown
> [joceph02][DEBUG ] /dev/sdd other, unknown
> [joceph02][DEBUG ] /dev/sde :
> [joceph02][DEBUG ]  /dev/sde1 other
> [joceph02][DEBUG ] /dev/sdf :
> [joceph02][DEBUG ]  /dev/sdf1 other
> ceph@joceph-admin01:/etc/ceph$
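As for the workaround I mentioned above: until this is sorted out, you should
be able to pre-clean a disk by hand so the batch zap runs to completion. A
minimal sketch, assuming /dev/sdX stands in for one of your OSD disks and the
disk holds nothing you want to keep:

    # Wipe the primary GPT (protective MBR + header + table, LBA 0-33)
    # and the backup GPT in the last few sectors of the disk.
    sudo dd if=/dev/zero of=/dev/sdX bs=512 count=34
    sudo dd if=/dev/zero of=/dev/sdX bs=512 count=34 \
        seek=$(($(sudo blockdev --getsz /dev/sdX) - 34))

    # Or, as a stopgap, just retry the exact command ceph-deploy runs,
    # since the second pass reliably succeeds on your disks:
    sudo sgdisk --zap-all --clear --mbrtogpt -- /dev/sdX || \
        sudo sgdisk --zap-all --clear --mbrtogpt -- /dev/sdX

The dd variant removes the stale backup GPT that sgdisk seems to be tripping
over, so the first zap should then behave the way the second one does now.
And if your ceph-deploy came from PyPI rather than the distro packages,
something like "sudo pip install ceph-deploy==1.2.7" should get you the older
version to test with.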