Re: Shared WAL/DB device partition for multiple OSDs?

On 23/08/2018 at 15:20, Alfredo Deza wrote:
Thanks Alfredo for your reply. I'm using the latest version of Luminous
(12.2.7) and ceph-deploy (2.0.1).
I have no problem creating my OSDs; that works perfectly.
My issue only concerns the mount names of the NVMe partitions, which change
after a reboot when there is more than one NVMe device on the OSD node.
ceph-volume is pretty resilient to partition changes because it stores
the PARTUUID of the partition in LVM, and it queries
it each time at boot. Note that for bluestore there is no mounting
whatsoever. Have you created partitions with a PARTUUID on the NVMe
devices for block.db?
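
If I'm not mistaken, parted with a GPT label gives every partition its own
partition GUID, which is what shows up as the PARTUUID. A quick way to check
on the node would be something like this (the device name below is only an
example, and I haven't pasted the output here):

# blkid -s PARTUUID -o value /dev/nvme0n1p1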

Here is how I created my BlueStore OSDs (on the first OSD node):
 
1) On the OSD node node-osd0, I first created the block.db partitions on the NVMe device (PM1725a 800GB), like this:

# parted /dev/nvme0n1 mklabel gpt

# echo "1 0 10
2 10 20
3 20 30
4 30 40
5 40 50
6 50 60
7 60 70
8 70 80
9 80 90
10 90 100" | while read num beg end; do parted /dev/nvme0n1 mkpart $num $beg% $end%; done


Extract of cat /proc/partitions:

 259        2  781412184 nvme1n1
 259        3  781412184 nvme0n1
 259        5   78140416 nvme0n1p1
 259        6   78141440 nvme0n1p2
 259        7   78140416 nvme0n1p3
 259        8   78141440 nvme0n1p4
 259        9   78141440 nvme0n1p5
 259       10   78141440 nvme0n1p6
 259       11   78140416 nvme0n1p7
 259       12   78141440 nvme0n1p8
 259       13   78141440 nvme0n1p9
 259       15   78140416 nvme0n1p10
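
(The PARTUUIDs don't appear in /proc/partitions; if it's useful, they can be
listed with something like the following, output not included here:)

# lsblk -o NAME,SIZE,PARTUUID /dev/nvme0n1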


2) Then, from the admin node, I created my first 10 OSDs like this:

echo "/dev/sda /dev/nvme0n1p1
/dev/sdb /dev/nvme0n1p2
/dev/sdc /dev/nvme0n1p3
/dev/sdd /dev/nvme0n1p4
/dev/sde /dev/nvme0n1p5
/dev/sdf /dev/nvme0n1p6
/dev/sdg /dev/nvme0n1p7
/dev/sdh /dev/nvme0n1p8
/dev/sdi /dev/nvme0n1p9
/dev/sdj /dev/nvme0n1p10" | while read hdd db; do ceph-deploy osd create --debug --bluestore --data $hdd --block-db $db node-osd0; done


Do you mean that, at this stage, I should directly use the PARTUUID paths as the value of --block-db (i.e. replace /dev/nvme0n1p1 with its PARTUUID), is that right?

So far I have created 60 OSDs this way. The Ceph cluster is HEALTH_OK and all OSDs are up and in. But I'm not yet in production and there is only test data on it, so I can destroy everything and rebuild my OSDs.
Is that what you advise me to do, taking care to specify the PARTUUID for the block.db instead of the device names?
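
For example (this is only my understanding of your suggestion, with a
placeholder instead of a real UUID), I would look up each partition's PARTUUID
and pass the stable /dev/disk/by-partuuid path to --block-db:

# blkid -s PARTUUID -o value /dev/nvme0n1p1
# ceph-deploy osd create --debug --bluestore --data /dev/sda --block-db /dev/disk/by-partuuid/<PARTUUID-of-nvme0n1p1> node-osd0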


For instance, if I have two NVMe devices, the first time the first device
is mounted with the name /dev/nvme0n1 and the second device with the name
/dev/nvme1n1. After a node restart, these names can be swapped, that is, the
first device is named /dev/nvme1n1 and the second one /dev/nvme0n1! The result
is that the OSDs no longer find their metadata and do not start up...
This sounds very odd. Could you clarify where block and block.db are?
Also useful here would be to take a look at
/var/log/ceph/ceph-volume-systemd.log and ceph-volume.log to
see how ceph-volume is trying to get this OSD up and running.

Also useful would be to check `ceph-volume lvm list` to verify that,
regardless of the name change, it recognizes the correct partition
mapped to the OSD.

Oops!

# ceph-volume lvm list
-->  KeyError: 'devices'
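
(In the meantime, as far as I understand, ceph-volume also records this
metadata as LVM tags on the data LV, e.g. ceph.db_device / ceph.db_uuid, so I
suppose it can be inspected directly with lvs; just a sketch, output not
included:)

# lvs -o lv_name,vg_name,lv_tags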

Thank you again,
Hervé

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
