Strange CRUSH / ceph-deploy issue

Trying to add a batch of OSDs to my cluster (Jewel 10.2.6, Ubuntu 16.04).

Two new nodes (ceph01, ceph02), 10 OSDs per node.

I am trying to steer the OSDs into a separate CRUSH root, with crush_location set per OSD in ceph.conf (a quick note on how I verify placement follows the config):
[osd.34]
crush_location = "host=ceph01 rack=ssd.rack2 root=ssd"

[osd.35]
crush_location = "host=ceph01 rack=ssd.rack2 root=ssd"

[osd.36]
crush_location = "host=ceph01 rack=ssd.rack2 root=ssd"

[osd.37]
crush_location = "host=ceph01 rack=ssd.rack2 root=ssd"

[osd.38]
crush_location = "host=ceph01 rack=ssd.rack2 root=ssd"

[osd.39]
crush_location = "host=ceph01 rack=ssd.rack2 root=ssd"

[osd.40]
crush_location = "host=ceph01 rack=ssd.rack2 root=ssd"

[osd.41]
crush_location = "host=ceph01 rack=ssd.rack2 root=ssd"

[osd.42]
crush_location = "host=ceph01 rack=ssd.rack2 root=ssd"

[osd.43]
crush_location = "host=ceph01 rack=ssd.rack2 root=ssd”

[osd.44]
crush_location = "host=ceph02 rack=ssd.rack2 root=ssd"

[osd.45]
crush_location = "host=ceph02 rack=ssd.rack2 root=ssd"

[osd.46]
crush_location = "host=ceph02 rack=ssd.rack2 root=ssd"

[osd.47]
crush_location = "host=ceph02 rack=ssd.rack2 root=ssd"

[osd.48]
crush_location = "host=ceph02 rack=ssd.rack2 root=ssd"

[osd.49]
crush_location = "host=ceph02 rack=ssd.rack2 root=ssd"

[osd.50]
crush_location = "host=ceph02 rack=ssd.rack2 root=ssd"

[osd.51]
crush_location = "host=ceph02 rack=ssd.rack2 root=ssd"

[osd.52]
crush_location = "host=ceph02 rack=ssd.rack2 root=ssd"

[osd.53]
crush_location = "host=ceph02 rack=ssd.rack2 root=ssd”

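For reference, this is roughly how I check where each OSD lands after it starts (osd.34 is just an example id):

$ ceph osd find 34      # shows the ip and crush_location the cluster has recorded for this OSD
$ ceph osd tree         # the OSD should appear under root ssd -> rack ssd.rack2 -> host ceph01
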
Adding ceph01 and its OSDs went without a hitch.
However, ceph02 is getting completely lost: its OSDs come up with zero weight, dumped at the bottom of the OSD tree at the root level.

$ ceph osd tree
ID  WEIGHT    TYPE NAME                     UP/DOWN REWEIGHT PRIMARY-AFFINITY
-13  34.91394 root ssd
-11  34.91394     rack ssd.rack2
-14  17.45697         host ceph00
 24   1.74570             osd.24                 up  1.00000          1.00000
 25   1.74570             osd.25                 up  1.00000          1.00000
 26   1.74570             osd.26                 up  1.00000          1.00000
 27   1.74570             osd.27                 up  1.00000          1.00000
 28   1.74570             osd.28                 up  1.00000          1.00000
 29   1.74570             osd.29                 up  1.00000          1.00000
 30   1.74570             osd.30                 up  1.00000          1.00000
 31   1.74570             osd.31                 up  1.00000          1.00000
 32   1.74570             osd.32                 up  1.00000          1.00000
 33   1.74570             osd.33                 up  1.00000          1.00000
-15  17.45697         host ceph01
 34   1.74570             osd.34                 up  1.00000          1.00000
 35   1.74570             osd.35                 up  1.00000          1.00000
 36   1.74570             osd.36                 up  1.00000          1.00000
 37   1.74570             osd.37                 up  1.00000          1.00000
 38   1.74570             osd.38                 up  1.00000          1.00000
 39   1.74570             osd.39                 up  1.00000          1.00000
 40   1.74570             osd.40                 up  1.00000          1.00000
 41   1.74570             osd.41                 up  1.00000          1.00000
 42   1.74570             osd.42                 up  1.00000          1.00000
 43   1.74570             osd.43                 up  1.00000          1.00000
-16         0         host ceph02
-10         0 rack default.rack2
-12         0     chassis default.rack2.U16
 -1 174.51584 root default
 -2  21.81029     host node24
  0   7.27010         osd.0                      up  1.00000          1.00000
  8   7.27010         osd.8                      up  1.00000          1.00000
 16   7.27010         osd.16                     up  1.00000          1.00000
 -3  21.81029     host node25
  1   7.27010         osd.1                      up  1.00000          1.00000
  9   7.27010         osd.9                      up  1.00000          1.00000
 17   7.27010         osd.17                     up  1.00000          1.00000
 -4  21.81987     host node26
 10   7.27010         osd.10                     up  1.00000          1.00000
 18   7.27489         osd.18                     up  1.00000          1.00000
  2   7.27489         osd.2                      up  1.00000          1.00000
 -5  21.81508     host node27
  3   7.27010         osd.3                      up  1.00000          1.00000
 11   7.27010         osd.11                     up  1.00000          1.00000
 19   7.27489         osd.19                     up  1.00000          1.00000
 -6  21.81508     host node28
  4   7.27010         osd.4                      up  1.00000          1.00000
 12   7.27010         osd.12                     up  1.00000          1.00000
 20   7.27489         osd.20                     up  1.00000          1.00000
 -7  21.81508     host node29
  5   7.27010         osd.5                      up  1.00000          1.00000
 13   7.27010         osd.13                     up  1.00000          1.00000
 21   7.27489         osd.21                     up  1.00000          1.00000
 -8  21.81508     host node30
  6   7.27010         osd.6                      up  1.00000          1.00000
 14   7.27010         osd.14                     up  1.00000          1.00000
 22   7.27489         osd.22                     up  1.00000          1.00000
 -9  21.81508     host node31
  7   7.27010         osd.7                      up  1.00000          1.00000
 15   7.27010         osd.15                     up  1.00000          1.00000
 23   7.27489         osd.23                     up  1.00000          1.00000
 44         0 osd.44                           down  1.00000          1.00000
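
One thing I still want to rule out is whether the [osd.44] section is actually present in /etc/ceph/ceph.conf on ceph02 itself (e.g. a stale conf pushed by ceph-deploy). A quick check on the node, assuming the default conf path:

$ ceph-conf --cluster=ceph --name=osd.44 --lookup crush_location    # run on ceph02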

I manually added ceph02 to the CRUSH map with the CLI:
$ ceph osd crush add-bucket ceph02 host
$ ceph osd crush move ceph02 root=ssd
$ ceph osd crush move ceph02 rack=ssd.rack2

That still didn't make a difference (before that, host ceph02 wasn't even being added to the CRUSH map at all).
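
Worst case, I assume I can force the placement and weight by hand once the OSD is up, with something like this (untested sketch; weight copied from the other SSD OSDs):

$ ceph osd crush create-or-move osd.44 1.74570 root=ssd rack=ssd.rack2 host=ceph02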

This is the output of ceph-deploy (1.5.37) when trying to prepare osd.44 on ceph02:
$ ceph-deploy --username root osd prepare ceph02:sda:/dev/nvme0n1p4
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/maint/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.37): /usr/bin/ceph-deploy --username root osd prepare ceph02:sda:/dev/nvme0n1p4
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : root
[ceph_deploy.cli][INFO  ]  disk                          : [('ceph02', '/dev/sda', '/dev/nvme0n1p4')]
[ceph_deploy.cli][INFO  ]  dmcrypt                       : False
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  bluestore                     : None
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  subcommand                    : prepare
[ceph_deploy.cli][INFO  ]  dmcrypt_key_dir               : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f0834a09248>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  fs_type                       : xfs
[ceph_deploy.cli][INFO  ]  func                          : <function osd at 0x7f0834e6d398>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.cli][INFO  ]  zap_disk                      : False
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph02:/dev/sda:/dev/nvme0n1p4
[ceph02][DEBUG ] connected to host: root@ceph02
[ceph02][DEBUG ] detect platform information from remote host
[ceph02][DEBUG ] detect machine type
[ceph02][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 16.04 xenial
[ceph_deploy.osd][DEBUG ] Deploying osd to ceph02
[ceph02][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.osd][DEBUG ] Preparing host ceph02 disk /dev/sda journal /dev/nvme0n1p4 activate False
[ceph02][DEBUG ] find the location of an executable
[ceph02][INFO  ] Running command: /usr/sbin/ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sda /dev/nvme0n1p4
[ceph02][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[ceph02][WARNIN] command: Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 --cluster ceph --setuser ceph --setgroup ceph
[ceph02][WARNIN] command: Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 --cluster ceph --setuser ceph --setgroup ceph
[ceph02][WARNIN] command: Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 --cluster ceph --setuser ceph --setgroup ceph
[ceph02][WARNIN] get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
[ceph02][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[ceph02][WARNIN] get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
[ceph02][WARNIN] get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
[ceph02][WARNIN] get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
[ceph02][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[ceph02][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
[ceph02][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[ceph02][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[ceph02][WARNIN] get_dm_uuid: get_dm_uuid /dev/nvme0n1p4 uuid path is /sys/dev/block/259:13/dm/uuid
[ceph02][WARNIN] prepare_device: Journal /dev/nvme0n1p4 is a partition
[ceph02][WARNIN] get_dm_uuid: get_dm_uuid /dev/nvme0n1p4 uuid path is /sys/dev/block/259:13/dm/uuid
[ceph02][WARNIN] prepare_device: OSD will not be hot-swappable if journal is not the same device as the osd data
[ceph02][WARNIN] command: Running command: /sbin/blkid -o udev -p /dev/nvme0n1p4
[ceph02][WARNIN] prepare_device: Journal /dev/nvme0n1p4 was not prepared with ceph-disk. Symlinking directly.
[ceph02][WARNIN] get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
[ceph02][WARNIN] set_data_partition: Creating osd partition on /dev/sda
[ceph02][WARNIN] get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
[ceph02][WARNIN] ptype_tobe_for_name: name = data
[ceph02][WARNIN] get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
[ceph02][WARNIN] create_partition: Creating data partition num 1 size 0 on /dev/sda
[ceph02][WARNIN] command_check_call: Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:9e26d63f-cc60-4c41-93ef-c936a657b643 --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/sda
[ceph02][DEBUG ] Setting name!
[ceph02][DEBUG ] partNum is 0
[ceph02][WARNIN] update_partition: Calling partprobe on created device /dev/sda
[ceph02][DEBUG ] REALLY setting name!
[ceph02][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[ceph02][DEBUG ] The operation has completed successfully.
[ceph02][WARNIN] command: Running command: /usr/bin/flock -s /dev/sda /sbin/partprobe /dev/sda
[ceph02][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[ceph02][WARNIN] get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
[ceph02][WARNIN] get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
[ceph02][WARNIN] get_dm_uuid: get_dm_uuid /dev/sda1 uuid path is /sys/dev/block/8:1/dm/uuid
[ceph02][WARNIN] populate_data_path_device: Creating xfs fs on /dev/sda1
[ceph02][WARNIN] command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sda1
[ceph02][DEBUG ] meta-data=/dev/sda1       isize=2048   agcount=4, agsize=117210837 blks
[ceph02][DEBUG ]          =                       sectsz=512   attr=2, projid32bit=1
[ceph02][DEBUG ]          =                       crc=1        finobt=1, sparse=0
[ceph02][DEBUG ] data     =                       bsize=4096   blocks=468843345, imaxpct=5
[ceph02][DEBUG ]          =                       sunit=0      swidth=0 blks
[ceph02][DEBUG ] naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
[ceph02][DEBUG ] log      =internal log           bsize=4096   blocks=228927, version=2
[ceph02][DEBUG ]          =                       sectsz=512   sunit=0 blks, lazy-count=1
[ceph02][DEBUG ] realtime =none                   extsz=4096   blocks=0, rtextents=0
[ceph02][WARNIN] mount: Mounting /dev/sda1 on /var/lib/ceph/tmp/mnt.9Zpt6h with options noatime,inode64
[ceph02][WARNIN] command_check_call: Running command: /bin/mount -t xfs -o noatime,inode64 -- /dev/sda1 /var/lib/ceph/tmp/mnt.9Zpt6h
[ceph02][WARNIN] populate_data_path: Preparing osd data dir /var/lib/ceph/tmp/mnt.9Zpt6h
[ceph02][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.9Zpt6h/ceph_fsid.28789.tmp
[ceph02][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.9Zpt6h/fsid.28789.tmp
[ceph02][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.9Zpt6h/magic.28789.tmp
[ceph02][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.9Zpt6h/journal_uuid.28789.tmp
[ceph02][WARNIN] adjust_symlink: Creating symlink /var/lib/ceph/tmp/mnt.9Zpt6h/journal -> /dev/nvme0n1p4
[ceph02][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.9Zpt6h
[ceph02][WARNIN] unmount: Unmounting /var/lib/ceph/tmp/mnt.9Zpt6h
[ceph02][WARNIN] command_check_call: Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.9Zpt6h
[ceph02][WARNIN] get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
[ceph02][WARNIN] command_check_call: Running command: /sbin/sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/sda
[ceph02][DEBUG ] Warning: The kernel is still using the old partition table.
[ceph02][DEBUG ] The new table will be used at the next reboot or after you
[ceph02][DEBUG ] run partprobe(8) or kpartx(8)
[ceph02][DEBUG ] The operation has completed successfully.
[ceph02][WARNIN] update_partition: Calling partprobe on prepared device /dev/sda
[ceph02][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[ceph02][WARNIN] command: Running command: /usr/bin/flock -s /dev/sda /sbin/partprobe /dev/sda
[ceph02][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[ceph02][WARNIN] command_check_call: Running command: /sbin/udevadm trigger --action=add --sysname-match sda1
[ceph02][INFO  ] checking OSD status...
[ceph02][DEBUG ] find the location of an executable
[ceph02][INFO  ] Running command: /usr/bin/ceph --cluster=ceph osd stat --format=json
[ceph02][WARNIN] there is 1 OSD down
[ceph_deploy.osd][DEBUG ] Host ceph02 is now ready for osd use.
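
For completeness, the activate step I would run next looks roughly like this (sketch; /dev/sda1 is the data partition created by the prepare above):

$ ceph-deploy --username root osd activate ceph02:/dev/sda1:/dev/nvme0n1p4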

Hoping someone checking their email over the weekend can easily spot whatever I have overlooked.
It's just very odd to see this work without issue on the first node and fail on the second, when both were configured and deployed identically.

Appreciate any help.

Reed
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
