On Thu, Aug 23, 2018 at 9:56 AM, Hervé Ballans <herve.ballans@xxxxxxxxxxxxx> wrote:
> Le 23/08/2018 à 15:20, Alfredo Deza a écrit :
>
> Thanks Alfredo for your reply. I'm using the latest version of Luminous
> (12.2.7) and ceph-deploy (2.0.1).
> I have no problem creating my OSDs; that works perfectly.
> My issue only concerns the device names of the NVMe partitions, which
> change after a reboot when there is more than one NVMe device on the
> OSD node.
>
> ceph-volume is pretty resilient to partition changes because it stores
> the PARTUUID of the partition in LVM, and it queries
> it each time at boot. Note that for bluestore there is no mounting
> whatsoever. Have you created partitions with a PARTUUID on the nvme
> devices for block.db?
>
>
> Here is how I created my BlueStore OSDs (on the first OSD node):
>
> 1) On the OSD node node-osd0, I first created block partitions on the NVMe
> device (PM1725a 800GB), like this:
>
> # parted /dev/nvme0n1 mklabel gpt
>
> # echo "1 0 10
> 2 10 20
> 3 20 30
> 4 30 40
> 5 40 50
> 6 50 60
> 7 60 70
> 8 70 80
> 9 80 90
> 10 90 100" | while read num beg end; do parted /dev/nvme0n1 mkpart $num $beg% $end%; done
>
> Extract of cat /proc/partitions:
>
> 259  2  781412184  nvme1n1
> 259  3  781412184  nvme0n1
> 259  5   78140416  nvme0n1p1
> 259  6   78141440  nvme0n1p2
> 259  7   78140416  nvme0n1p3
> 259  8   78141440  nvme0n1p4
> 259  9   78141440  nvme0n1p5
> 259 10   78141440  nvme0n1p6
> 259 11   78140416  nvme0n1p7
> 259 12   78141440  nvme0n1p8
> 259 13   78141440  nvme0n1p9
> 259 15   78140416  nvme0n1p10
>
> 2) Then, from the admin node, I created my first 10 OSDs like this:
>
> echo "/dev/sda /dev/nvme0n1p1
> /dev/sdb /dev/nvme0n1p2
> /dev/sdc /dev/nvme0n1p3
> /dev/sdd /dev/nvme0n1p4
> /dev/sde /dev/nvme0n1p5
> /dev/sdf /dev/nvme0n1p6
> /dev/sdg /dev/nvme0n1p7
> /dev/sdh /dev/nvme0n1p8
> /dev/sdi /dev/nvme0n1p9
> /dev/sdj /dev/nvme0n1p10" | while read hdd db; do ceph-deploy osd create \
>   --debug --bluestore --data $hdd --block-db $db node-osd0; done
>
> Do you mean that, at this stage, I should pass the PARTUUID paths directly
> as the value of --block-db (i.e. replace /dev/nvme0n1p1 with its PARTUUID)?

No, this all looks correct. How do ceph-volume.log and
ceph-volume-systemd.log look when booting up the OSDs that aren't
coming up? Anything useful in there?

>
> So far I have created 60 OSDs like that. The Ceph cluster is HEALTH_OK and
> all OSDs are up and in. But I'm not yet in production and there is only
> test data on it, so I can destroy everything and rebuild my OSDs.
> Is that what you advise me to do, taking care to specify the PARTUUID for
> the block.db instead of the device names?
>
>
> For instance, if I have two NVMe devices, the first time the first device
> shows up as /dev/nvme0n1 and the second as /dev/nvme1n1. After a node
> restart, these names can be swapped, i.e. the first device becomes
> /dev/nvme1n1 and the second /dev/nvme0n1! The result is that the OSDs no
> longer find their metadata and do not start up...
>
> This sounds very odd. Could you clarify where block and block.db are?
> Also useful here would be to take a look at
> /var/log/ceph/ceph-volume-systemd.log and ceph-volume.log to
> see how ceph-volume is trying to get this OSD up and running.
>
> Also useful would be to check `ceph-volume lvm list` to verify that,
> regardless of the name change, it recognizes the correct partition
> mapped to the OSD.
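As an aside, the PARTUUID mapping that ceph-volume relies on can also be
checked by hand, for example like this (a minimal sketch using the partition
layout above; the exact LVM tag names, e.g. ceph.db_uuid / ceph.db_device,
are my assumption about what ceph-volume records):

# blkid -s PARTUUID -o value /dev/nvme0n1p1
# ls -l /dev/disk/by-partuuid/
# lvs -o lv_name,lv_tags

The first two commands show the PARTUUIDs and their persistent
/dev/disk/by-partuuid symlinks, which do not depend on the nvme0n1/nvme1n1
enumeration order; the last one lists the tags ceph-volume stored on each
data LV, which should reference those same UUIDs rather than the kernel
device names. If the UUIDs match before and after a reboot, the device name
swap by itself should be harmless for activation.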
>
> Oops!
>
> # ceph-volume lvm list
> --> KeyError: 'devices'

Can you re-run this like:

CEPH_VOLUME_DEBUG=1 ceph-volume lvm list

And paste the output? I think this has been fixed since then, but I want to
double-check.

>
> Thank you again,
> Hervé
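For completeness, one way to capture the requested output in one place might
look like the following (a sketch only; the output file name and the grep
pattern are arbitrary choices):

# CEPH_VOLUME_DEBUG=1 ceph-volume lvm list 2>&1 | tee /tmp/ceph-volume-lvm-list.txt
# grep -iE 'nvme|traceback|error' /var/log/ceph/ceph-volume.log /var/log/ceph/ceph-volume-systemd.log

The first command re-runs the failing listing with debug tracing enabled
while keeping a copy of the output; the second pulls the boot-time activation
attempts out of the two logs mentioned earlier in the thread.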