Re: migrate wal/db to block device

Hi Igor,

The immediate answer is to use "ceph-volume lvm zap" on the db LV after running the migrate. But for the longer term I think the "lvm zap" should be included in the "lvm migrate" process.

I.e. this works to migrate a separate wal/db to the block device:

#
# WARNING! DO NOT ZAP AFTER STARTING THE OSD!!
#
$ cephadm ceph-volume lvm list "${osd}" > ~/"osd.${osd}.list"
$ systemctl stop "${osd_service}"
$ cephadm shell --fsid "${fsid}" --name "osd.${osd}" -- \
  ceph-volume lvm migrate --osd-id "${osd}" --osd-fsid "${osd_fsid}" \
  --from db wal --target "${vg_lv}"
$ cephadm shell --fsid "${fsid}" --name "osd.${osd}" -- \
  ceph-volume lvm zap "${db_lv}"
$ systemctl start "${osd_service}"
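
For reference, the shell variables used above could be populated along these lines. This is only a sketch, not part of the official procedure: I'm assuming cephadm passes the --format flag through to ceph-volume, and the JSON field names (type, vg_name, lv_name, lv_path, tags) are what I recall "ceph-volume lvm list --format json" printing - check them against your own output before relying on them:

$ osd=25
$ fsid=$(ceph fsid)   # or copy the fsid from "cephadm ls"
$ osd_service="ceph-${fsid}@osd.${osd}.service"
$ cephadm ceph-volume lvm list "${osd}" --format json > ~/"osd.${osd}.json"
#
# osd fsid and the migrate target (vg/lv) come from the "block" entry;
# the db LV to migrate away from (and later zap) comes from the "db" entry.
#
$ osd_fsid=$(jq -r ".\"${osd}\"[] | select(.type == \"block\") | .tags[\"ceph.osd_fsid\"]" ~/"osd.${osd}.json")
$ vg_lv=$(jq -r ".\"${osd}\"[] | select(.type == \"block\") | \"\(.vg_name)/\(.lv_name)\"" ~/"osd.${osd}.json")
$ db_lv=$(jq -r ".\"${osd}\"[] | select(.type == \"db\") | .lv_path" ~/"osd.${osd}.json")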

WARNING! If you don't do the zap before starting the osd, the osd will start up still using the db on the LV. If you then stop the osd, zap the LV and start the osd again, you'll be running on the copy of the db that was migrated to the block device, which will be missing any updates made while the osd was running on the LV in the meantime. I don't know what problems that might cause. In this situation I've restored the LV tags (i.e. all tags on the db LV, plus the db_device and db_uuid tags on the block LV) using the info from ~/osd.${osd}.list (otherwise the migrate fails!) and then gone through the migrate process again.
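
For the record, restoring those tags looks something like this with lvchange. Treat it as a sketch only: the actual tag values must be copied verbatim from ~/osd.${osd}.list, and the db LV carries more ceph.* tags than the few shown here:

#
# Example only - take the real tag names/values from ~/osd.${osd}.list
#
$ db_uuid=$(lvs --noheadings -o lv_uuid "${db_lv}" | tr -d '[:space:]')
# re-add the db pointers on the block LV:
$ lvchange --addtag "ceph.db_device=${db_lv}" --addtag "ceph.db_uuid=${db_uuid}" "${vg_lv}"
# re-add the full set of ceph.* tags on the db LV, e.g.:
$ lvchange --addtag "ceph.osd_id=${osd}" --addtag "ceph.osd_fsid=${osd_fsid}" \
  --addtag "ceph.type=db" "${db_lv}"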

The problem, it turns out, is that the osd is being activated as a "raw" device rather than an "lvm" device, and the "raw" db device (which is actually an LVM LV) still has a bluestore label on it after the migrate, so it's still seen as a component of the osd.

E.g. before the migrate, both of these show the osd with the separate db:

$ cephadm ceph-volume lvm list
$ cephadm ceph-volume raw list

After the migrate (without zap), the "lvm list" does NOT show the separate db (because the appropriate LV tags have been removed), but the "raw list" still shows the osd with the separate db.
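
A quick way to see that leftover label (just an illustration, reusing the db_lv variable from my sketch above):

$ cephadm shell --fsid "${fsid}" --name "osd.${osd}" -- \
  ceph-bluestore-tool show-label --dev "${db_lv}"

After the zap, that label is gone and "raw list" no longer picks the LV up.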

And the osd is being activated as a "raw" device, both before and after the migrate. E.g. extract from the journal before the migrate:

Nov 15 22:39:05 k12 bash[3829222]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-25
Nov 15 22:39:05 k12 bash[3829222]: Running command: /usr/bin/ceph-bluestore-tool prime-osd-dir --path /var/lib/ceph/osd/ceph-25 --no-mon-config --dev /dev/mapper/ceph--5ccbb386--142b--4bf7--
Nov 15 22:39:05 k12 bash[3829222]: Running command: /usr/bin/chown -h ceph:ceph /dev/mapper/ceph--5ccbb386--142b--4bf7--a180--04bcf9a1f61b-osd--block--7710024b--ec71--4fd3--b94c--c4c4b9af2d2
Nov 15 22:39:05 k12 bash[3829222]: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-1
Nov 15 22:39:05 k12 bash[3829222]: Running command: /usr/bin/ln -s /dev/mapper/ceph--5ccbb386--142b--4bf7--a180--04bcf9a1f61b-osd--block--7710024b--ec71--4fd3--b94c--c4c4b9af2d21 /var/lib/ce
Nov 15 22:39:05 k12 bash[3829222]: Running command: /usr/bin/chown -h ceph:ceph /dev/mapper/ceph--d4b1e932--4557--4b88--bed2--9305a07e76eb-osd--db--6a507f57--884c--4947--a147--cd50f98f1a23
Nov 15 22:39:05 k12 bash[3829222]: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-2
Nov 15 22:39:05 k12 bash[3829222]: Running command: /usr/bin/ln -s /dev/mapper/ceph--d4b1e932--4557--4b88--bed2--9305a07e76eb-osd--db--6a507f57--884c--4947--a147--cd50f98f1a23 /var/lib/ceph/
Nov 15 22:39:05 k12 bash[3829222]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-25
Nov 15 22:39:05 k12 bash[3829222]: --> ceph-volume raw activate successful for osd ID: 25

After a migrate without a zap - note there are still two mapper/lv devices found, including the now-unwanted db LV:

Nov 16 09:08:31 k12 bash[4012506]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-25
Nov 16 09:08:31 k12 bash[4012506]: Running command: /usr/bin/ceph-bluestore-tool prime-osd-dir --path /var/lib/ceph/osd/ceph-25 --no-mon-config --dev /dev/mapper/ceph--5ccbb386--142b--4bf7--
Nov 16 09:08:31 k12 bash[4012506]: Running command: /usr/bin/chown -h ceph:ceph /dev/mapper/ceph--5ccbb386--142b--4bf7--a180--04bcf9a1f61b-osd--block--7710024b--ec71--4fd3--b94c--c4c4b9af2d2
Nov 16 09:08:31 k12 bash[4012506]: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-1
Nov 16 09:08:31 k12 bash[4012506]: Running command: /usr/bin/ln -s /dev/mapper/ceph--5ccbb386--142b--4bf7--a180--04bcf9a1f61b-osd--block--7710024b--ec71--4fd3--b94c--c4c4b9af2d21 /var/lib/ce
Nov 16 09:08:31 k12 bash[4012506]: Running command: /usr/bin/chown -h ceph:ceph /dev/mapper/ceph--d4b1e932--4557--4b88--bed2--9305a07e76eb-osd--db--6a507f57--884c--4947--a147--cd50f98f1a23
Nov 16 09:08:31 k12 bash[4012506]: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-2
Nov 16 09:08:31 k12 bash[4012506]: Running command: /usr/bin/ln -s /dev/mapper/ceph--d4b1e932--4557--4b88--bed2--9305a07e76eb-osd--db--6a507f57--884c--4947--a147--cd50f98f1a23 /var/lib/ceph/
Nov 16 09:08:31 k12 bash[4012506]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-25
Nov 16 09:08:31 k12 bash[4012506]: --> ceph-volume raw activate successful for osd ID: 25

After a migrate and zap - note there's now only a single mapper/lv device found, i.e. we've successfully stopped using the separate db device:

Nov 16 12:33:39 k12 bash[4091471]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-25
Nov 16 12:33:39 k12 bash[4091471]: Running command: /usr/bin/ceph-bluestore-tool prime-osd-dir --path /var/lib/ceph/osd/ceph-25 --no-mon-config --dev /dev/mapper/ceph--5ccbb386--142b--4bf7--
Nov 16 12:33:39 k12 bash[4091471]: Running command: /usr/bin/chown -h ceph:ceph /dev/mapper/ceph--5ccbb386--142b--4bf7--a180--04bcf9a1f61b-osd--block--7710024b--ec71--4fd3--b94c--c4c4b9af2d2
Nov 16 12:33:39 k12 bash[4091471]: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-1
Nov 16 12:33:39 k12 bash[4091471]: Running command: /usr/bin/ln -s /dev/mapper/ceph--5ccbb386--142b--4bf7--a180--04bcf9a1f61b-osd--block--7710024b--ec71--4fd3--b94c--c4c4b9af2d21 /var/lib/ce
Nov 16 12:33:39 k12 bash[4091471]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-25
Nov 16 12:33:39 k12 bash[4091471]: --> ceph-volume raw activate successful for osd ID: 25

Wrapping up...

I think the "lvm zap" should be included in the "ceph-volume lvm migrate" process, and perhaps "ceph-volume activate" should be changed to NOT detect LVs as raw devices, so they're correctly activated as "lvm" devices.

Another oddity that unfortunately extended the time taken to analyse this issue... why does "ceph-volume raw list ${osd}" NOT show lvm osds, when plain "ceph-volume raw list" shows them?


Cheers,

Chris
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


