Hello list.
Today, while redeploying an OSD, I noticed that the links to the DB/WAL devices point to the partitions themselves, not to the partition UUIDs as they did before.
I think this changed with the latest ceph-deploy.
I'm using 12.2.2 on my mon/osd nodes.
ceph-deploy is 2.0.1 on admin node.
All nodes use Ubuntu 16.04.
Here's what I'm talking about.
Consider these two OSDs on one node: osd.4 is an "old" OSD, while osd.6 is a "new" one.
$ df
...
tmpfs 7.9G 24K 7.9G 1% /var/lib/ceph/osd/ceph-6
/dev/sda1 94M 5.4M 89M 6% /var/lib/ceph/osd/ceph-4
$ ll /var/lib/ceph/osd/ceph-4
...
lrwxrwxrwx 1 ceph ceph 58 Feb 20 2018 block -> /dev/disk/by-partuuid/21d7a19b-b520-4fa9-b291-2cd4a215b67b
lrwxrwxrwx 1 ceph ceph 58 Feb 20 2018 block.db -> /dev/disk/by-partuuid/ad56b072-28dd-4b27-85f2-06067546a0f2
$ ll /var/lib/ceph/osd/ceph-6
...
lrwxrwxrwx 1 root root 9 Nov 7 14:16 block.db -> /dev/sdb3
lrwxrwxrwx 1 root root 9 Nov 7 14:16 block.wal -> /dev/sdb4
I'm using an external SSD for DB/WAL, and newly created OSDs now point to partition names.
The following two ceph-deploy commands behave the same: both create an OSD whose links point to partitions, not UUIDs:
$ ceph-deploy osd create --bluestore --data /dev/sdc --block-db /dev/sdb1 --block-wal /dev/sdb2 ceph-osd2
$ ceph-deploy osd create --bluestore --data /dev/sdc --block-db /dev/disk/by-partuuid/e703a35b-de8d-46e0-9fb2-4a4fd0b49c91 --block-wal /dev/disk/by-partuuid/d2db7017-4d57-45f8-95ff-0131b3b2de7d ceph-osd2
In either case, the resulting OSD has partition links:
$ ll /var/lib/ceph/osd/ceph-0
...
lrwxrwxrwx 1 root root 9 Jan 21 16:46 block.db -> /dev/sdb1
lrwxrwxrwx 1 root root 9 Jan 21 16:46 block.wal -> /dev/sdb2
I first noticed this change when I updated ceph-deploy, but didn't pay attention back then. Today it hit me: if, for whatever reason, my DB/WAL SSD changes its name (on a node reboot, for example), my OSDs will not find their DB/WAL devices and will not start. At the very least this would require manual intervention, and possibly lead to worse scenarios if you use more than one DB/WAL device on a single OSD node.
Is there something wrong with my OSD deployment scheme? Why was the link naming changed from UUIDs to plain partition names? Or am I imagining this threat, and Ceph can compensate in such a case?
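To see which OSDs on a node are exposed to the risk described above, a small audit helper can walk each OSD data directory and flag block symlinks whose target is a bare kernel device name. This is a minimal sketch under stated assumptions: `check_osd_links` is a hypothetical helper, not part of Ceph or ceph-deploy, and it assumes only /dev/sd*, /dev/hd*, /dev/vd* and /dev/nvme* targets are name-unstable. Anything else (for example /dev/disk/by-partuuid/... or an LVM volume path) is just reported as "other", not verified as safe.

```shell
#!/bin/sh
# check_osd_links: hypothetical helper, not part of Ceph or ceph-deploy.
# For each block, block.db and block.wal symlink in an OSD data directory,
# print whether the stored target is a bare kernel device name (which may
# change between reboots) or something else.
check_osd_links() {
    osd_dir=$1
    for link in "$osd_dir/block" "$osd_dir/block.db" "$osd_dir/block.wal"; do
        # Skip entries that are absent or are not symlinks.
        [ -L "$link" ] || continue
        # readlink without -f prints the stored target without resolving it.
        target=$(readlink "$link")
        name=$(basename "$link")
        case $target in
            /dev/sd*|/dev/hd*|/dev/vd*|/dev/nvme*)
                echo "kernel-name: $name -> $target" ;;
            *)
                echo "other: $name -> $target" ;;
        esac
    done
}

# Example: audit every OSD data directory on this node (read-only).
for dir in /var/lib/ceph/osd/ceph-*; do
    # If the glob matched nothing, it stays literal and is skipped here.
    [ -d "$dir" ] || continue
    echo "== $dir"
    check_osd_links "$dir"
done
```

The helper only reads symlinks and changes nothing, so it can be run on each OSD node before a planned reboot.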
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com