I had more or less the same problem. This is most likely a synchronization issue. I have been deploying 16 OSDs, each running exactly the same hardware and software. The issue appeared randomly, with no obvious correlation to anything else. The dirty workaround was to put time.sleep(5) before invoking partprobe.
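A rough sketch of that workaround, extended with a retry loop. This is not the actual ceph-disk code: the function name probe_with_retry, its parameters, and the retry count are all made up for illustration, and the 5-second delay is simply the value that happened to work here.

```python
import subprocess
import time


def probe_with_retry(dev, attempts=5, delay=5.0,
                     run=subprocess.check_call, sleep=time.sleep):
    """Hypothetical helper: settle udev, wait, then run partprobe.

    Retries when a command exits non-zero (for example with
    "Device or resource busy"). Returns the attempt number that succeeded.
    """
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            run(['udevadm', 'settle'])
            # The delay gives pending udev/kernel activity time to finish
            # before the partition table is re-read.
            sleep(delay)
            run(['partprobe', dev])
            run(['udevadm', 'settle'])
            return attempt
        except subprocess.CalledProcessError as e:
            last_error = e
    # Every attempt failed: surface the last failure to the caller.
    raise last_error
```

Called as probe_with_retry('/dev/sdr'), it raises subprocess.CalledProcessError only after every attempt has failed, so a transient "busy" state does not abort the whole prepare run.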
On 16 December 2015 at 07:17, Matt Taylor <mtaylor@xxxxxxxxxx> wrote:
Hi all,
After recently upgrading to CentOS 7.2 and installing a new Ceph cluster using Infernalis v9.2.0, I have noticed that disks are failing to prepare.
I have observed the same behaviour on multiple Ceph servers when preparing disks. All the servers are identical.
Disks zap fine; however, when running 'ceph-deploy disk prepare', we encounter the following error:
[ceph_deploy.cli][INFO ] Invoked (1.5.30): /usr/bin/ceph-deploy disk prepare kvsrv02:/dev/sdr
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] disk : [('kvsrv02', '/dev/sdr', None)]
[ceph_deploy.cli][INFO ] dmcrypt : False
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] subcommand : prepare
[ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f1d54a4a7a0>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] fs_type : xfs
[ceph_deploy.cli][INFO ] func : <function disk at 0x7f1d54a3bc08>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] zap_disk : False
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks kvsrv02:/dev/sdr:
[kvsrv02][DEBUG ] connection detected need for sudo
[kvsrv02][DEBUG ] connected to host: kvsrv02
[kvsrv02][DEBUG ] detect platform information from remote host
[kvsrv02][DEBUG ] detect machine type
[kvsrv02][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.2.1511 Core
[ceph_deploy.osd][DEBUG ] Deploying osd to kvsrv02
[kvsrv02][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.osd][DEBUG ] Preparing host kvsrv02 disk /dev/sdr journal None activate False
[kvsrv02][INFO ] Running command: sudo ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdr
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 --cluster ceph
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 --cluster ceph
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 --cluster ceph
[kvsrv02][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdr uuid path is /sys/dev/block/65:16/dm/uuid
[kvsrv02][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdr uuid path is /sys/dev/block/65:16/dm/uuid
[kvsrv02][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdr uuid path is /sys/dev/block/65:16/dm/uuid
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_type
[kvsrv02][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdr uuid path is /sys/dev/block/65:16/dm/uuid
[kvsrv02][WARNIN] INFO:ceph-disk:Will colocate journal with data on /dev/sdr
[kvsrv02][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdr uuid path is /sys/dev/block/65:16/dm/uuid
[kvsrv02][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdr uuid path is /sys/dev/block/65:16/dm/uuid
[kvsrv02][WARNIN] DEBUG:ceph-disk:Creating journal partition num 2 size 5120 on /dev/sdr
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk --new=2:0:5120M --change-name=2:ceph journal --partition-guid=2:7058473f-5c4a-4566-9a11-95cae71e5086 --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdr
[kvsrv02][DEBUG ] The operation has completed successfully.
[kvsrv02][WARNIN] DEBUG:ceph-disk:Calling partprobe on prepared device /dev/sdr
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdr
[kvsrv02][WARNIN] Error: Error informing the kernel about modifications to partition /dev/sdr2 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/sdr2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[kvsrv02][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[kvsrv02][WARNIN] Traceback (most recent call last):
[kvsrv02][WARNIN] File "/sbin/ceph-disk", line 3576, in <module>
[kvsrv02][WARNIN] main(sys.argv[1:])
[kvsrv02][WARNIN] File "/sbin/ceph-disk", line 3530, in main
[kvsrv02][WARNIN] args.func(args)
[kvsrv02][WARNIN] File "/sbin/ceph-disk", line 1863, in main_prepare
[kvsrv02][WARNIN] luks=luks
[kvsrv02][WARNIN] File "/sbin/ceph-disk", line 1465, in prepare_journal
[kvsrv02][WARNIN] return prepare_journal_dev(data, journal, journal_size, journal_uuid, journal_dm_keypath, cryptsetup_parameters, luks)
[kvsrv02][WARNIN] File "/sbin/ceph-disk", line 1419, in prepare_journal_dev
[kvsrv02][WARNIN] raise Error(e)
[kvsrv02][WARNIN] __main__.Error: Error: Command '['/sbin/partprobe', '/dev/sdr']' returned non-zero exit status 1
[kvsrv02][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdr
[ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
I dug a bit more into this and tried further troubleshooting: rebooting (as suggested by partprobe's error), running dd if=/dev/zero of=/dev/sdr bs=512 count=1 followed by a reboot, using an older kernel, and a few other things my colleague and I came up with. Nothing really worked!
So I decided to test using 'partx -a' instead of partprobe, and also to disregard the command's exit code, just to see if it would make any difference. Here's the change I made to '/usr/sbin/ceph-disk':
[root@kvsrv02 ~]# diff -u /usr/sbin/ceph-disk_orig /usr/sbin/ceph-disk
--- /usr/sbin/ceph-disk_orig 2015-12-16 05:05:19.636866273 +0000
+++ /usr/sbin/ceph-disk 2015-12-16 05:27:07.905817825 +0000
@@ -35,6 +35,7 @@
import shlex
import pwd
import grp
+from subprocess import call
"""
Prepare:
@@ -1218,7 +1219,8 @@
"""
LOG.debug('Calling partprobe on %s device %s', description, dev)
command_check_call(['udevadm', 'settle'])
- command_check_call(['partprobe', dev])
+ call(['partx', '-a', dev])
+ #command_check_call(['partprobe', dev])
command_check_call(['udevadm', 'settle'])
def zap(dev):
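A variation on the patch above, as a sketch only: it still calls 'partx -a' and still ignores a failure, but it logs the exit status so the failure is at least visible. The function name update_partition_table, the logger name, and the injectable call/check_call parameters are assumptions for illustration and do not exist in ceph-disk.

```python
import logging
import subprocess

LOG = logging.getLogger('ceph-disk-sketch')  # hypothetical logger name


def update_partition_table(dev, call=subprocess.call,
                           check_call=subprocess.check_call):
    """Hypothetical stand-in for the patched partprobe step.

    Runs `partx -a` between two `udevadm settle` calls and returns the
    partx exit status instead of raising when it is non-zero.
    """
    check_call(['udevadm', 'settle'])
    rc = call(['partx', '-a', dev])
    if rc != 0:
        # partx -a exits non-zero when it could not add a partition,
        # e.g. because the kernel already knows about it.
        LOG.warning('partx -a %s exited with status %d; continuing', dev, rc)
    check_call(['udevadm', 'settle'])
    return rc
```

Returning the status rather than discarding it lets the caller decide whether a non-zero result matters for a given device.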
I tested, and the disk prepared fine and the OSD came online:
[ceph_deploy.cli][INFO ] Invoked (1.5.30): /usr/bin/ceph-deploy --overwrite-conf osd prepare kvsrv02:/dev/sdr
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] disk : [('kvsrv02', '/dev/sdr', None)]
[ceph_deploy.cli][INFO ] dmcrypt : False
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] overwrite_conf : True
[ceph_deploy.cli][INFO ] subcommand : prepare
[ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fb83bf76758>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] fs_type : xfs
[ceph_deploy.cli][INFO ] func : <function osd at 0x7fb83bf65b90>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] zap_disk : False
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks kvsrv02:/dev/sdr:
[kvsrv02][DEBUG ] connection detected need for sudo
[kvsrv02][DEBUG ] connected to host: kvsrv02
[kvsrv02][DEBUG ] detect platform information from remote host
[kvsrv02][DEBUG ] detect machine type
[kvsrv02][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.2.1511 Core
[ceph_deploy.osd][DEBUG ] Deploying osd to kvsrv02
[kvsrv02][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.osd][DEBUG ] Preparing host kvsrv02 disk /dev/sdr journal None activate False
[kvsrv02][INFO ] Running command: sudo ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdr
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 --cluster ceph
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 --cluster ceph
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 --cluster ceph
[kvsrv02][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdr uuid path is /sys/dev/block/65:16/dm/uuid
[kvsrv02][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdr uuid path is /sys/dev/block/65:16/dm/uuid
[kvsrv02][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdr uuid path is /sys/dev/block/65:16/dm/uuid
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_type
[kvsrv02][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdr uuid path is /sys/dev/block/65:16/dm/uuid
[kvsrv02][WARNIN] INFO:ceph-disk:Will colocate journal with data on /dev/sdr
[kvsrv02][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdr uuid path is /sys/dev/block/65:16/dm/uuid
[kvsrv02][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdr uuid path is /sys/dev/block/65:16/dm/uuid
[kvsrv02][WARNIN] DEBUG:ceph-disk:Creating journal partition num 2 size 5120 on /dev/sdr
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk --new=2:0:5120M --change-name=2:ceph journal --partition-guid=2:b05454c0-e1f5-4cab-8bf5-2e64d19a804b --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdr
[kvsrv02][DEBUG ] Creating new GPT entries.
[kvsrv02][DEBUG ] The operation has completed successfully.
[kvsrv02][WARNIN] DEBUG:ceph-disk:Calling partprobe on prepared device /dev/sdr
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle
[kvsrv02][WARNIN] partx: /dev/sdr: error adding partition 2
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle
[kvsrv02][WARNIN] DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/b05454c0-e1f5-4cab-8bf5-2e64d19a804b
[kvsrv02][WARNIN] DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/b05454c0-e1f5-4cab-8bf5-2e64d19a804b
[kvsrv02][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdr uuid path is /sys/dev/block/65:16/dm/uuid
[kvsrv02][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdr uuid path is /sys/dev/block/65:16/dm/uuid
[kvsrv02][WARNIN] DEBUG:ceph-disk:Creating osd partition on /dev/sdr
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:faa6f47f-3360-417b-9843-cf9fb3b1bedc --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdr
[kvsrv02][DEBUG ] The operation has completed successfully.
[kvsrv02][WARNIN] DEBUG:ceph-disk:Calling partprobe on created device /dev/sdr
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle
[kvsrv02][WARNIN] partx: /dev/sdr: error adding partitions 1-2
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle
[kvsrv02][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdr uuid path is /sys/dev/block/65:16/dm/uuid
[kvsrv02][WARNIN] DEBUG:ceph-disk:Creating xfs fs on /dev/sdr1
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdr1
[kvsrv02][DEBUG ] meta-data="" isize=2048 agcount=4, agsize=24091717 blks
[kvsrv02][DEBUG ] = sectsz=4096 attr=2, projid32bit=1
[kvsrv02][DEBUG ] = crc=0 finobt=0
[kvsrv02][DEBUG ] data = "" blocks=96366865, imaxpct=25
[kvsrv02][DEBUG ] = sunit=0 swidth=0 blks
[kvsrv02][DEBUG ] naming =version 2 bsize=4096 ascii-ci=0 ftype=0
[kvsrv02][DEBUG ] log =internal log bsize=4096 blocks=47054, version=2
[kvsrv02][DEBUG ] = sectsz=4096 sunit=1 blks, lazy-count=1
[kvsrv02][DEBUG ] realtime =none extsz=4096 blocks=0, rtextents=0
[kvsrv02][WARNIN] DEBUG:ceph-disk:Mounting /dev/sdr1 on /var/lib/ceph/tmp/mnt.FASof5 with options noatime,inode64
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/mount -t xfs -o noatime,inode64 -- /dev/sdr1 /var/lib/ceph/tmp/mnt.FASof5
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /sbin/restorecon /var/lib/ceph/tmp/mnt.FASof5
[kvsrv02][WARNIN] DEBUG:ceph-disk:Preparing osd data dir /var/lib/ceph/tmp/mnt.FASof5
[kvsrv02][WARNIN] DEBUG:ceph-disk:Creating symlink /var/lib/ceph/tmp/mnt.FASof5/journal -> /dev/disk/by-partuuid/b05454c0-e1f5-4cab-8bf5-2e64d19a804b
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /sbin/restorecon -R /var/lib/ceph/tmp/mnt.FASof5/ceph_fsid.126649.tmp
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.FASof5/ceph_fsid.126649.tmp
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /sbin/restorecon -R /var/lib/ceph/tmp/mnt.FASof5/fsid.126649.tmp
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.FASof5/fsid.126649.tmp
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /sbin/restorecon -R /var/lib/ceph/tmp/mnt.FASof5/journal_uuid.126649.tmp
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.FASof5/journal_uuid.126649.tmp
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /sbin/restorecon -R /var/lib/ceph/tmp/mnt.FASof5/magic.126649.tmp
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.FASof5/magic.126649.tmp
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /sbin/restorecon -R /var/lib/ceph/tmp/mnt.FASof5
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.FASof5
[kvsrv02][WARNIN] DEBUG:ceph-disk:Unmounting /var/lib/ceph/tmp/mnt.FASof5
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.FASof5
[kvsrv02][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdr uuid path is /sys/dev/block/65:16/dm/uuid
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/sdr
[kvsrv02][DEBUG ] Warning: The kernel is still using the old partition table.
[kvsrv02][DEBUG ] The new table will be used at the next reboot.
[kvsrv02][DEBUG ] The operation has completed successfully.
[kvsrv02][WARNIN] DEBUG:ceph-disk:Calling partprobe on prepared device /dev/sdr
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle
[kvsrv02][WARNIN] partx: /dev/sdr: error adding partitions 1-2
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle
[kvsrv02][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm trigger --action="" --sysname-match sdr1
[kvsrv02][INFO ] checking OSD status...
[kvsrv02][INFO ] Running command: sudo ceph --cluster=ceph osd stat --format=json
[kvsrv02][WARNIN] there are 6 OSDs down
[kvsrv02][WARNIN] there are 6 OSDs out
[ceph_deploy.osd][DEBUG ] Host kvsrv02 is now ready for osd use.
I am not 100% sure whether this is a problem with Ceph v9.2.0 or related to the recent update to CentOS 7.2.
Has anyone else encountered a similar problem?
Also, should I be posting this on the ceph-devel mailing list, or is here OK?
Thanks!
Regards,
Matthew Taylor.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Mykola