Re: ceph-volume lvm tag ceph.data_device


 



On 16-10-2018 13:58, Jan Fajerski wrote:
On Tue, Oct 16, 2018 at 01:10:02PM +0200, Willem Jan Withagen wrote:
On 16/10/2018 12:02, Jan Fajerski wrote:
On Mon, Oct 15, 2018 at 06:56:09AM -0500, Alfredo Deza wrote:
On Mon, Oct 15, 2018 at 6:48 AM Jan Fajerski <jfajerski@xxxxxxxx> wrote:

Hi list,
while playing with ceph-volume I noticed that it adds the tag ceph.data_device
to an lv, containing that lv's own name (at the time prepare is called).
I was wondering what this specific tag is used for. From looking at
ceph-volume's code it seems it is only ever set, never read.
Using vgrename or lvrename one can easily create an inconsistency in this self-reference. Restarting the OSD (or rebooting the node) still works as
expected, but I'm certainly not thinking of all cases here.
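To make that inconsistency concrete, roughly (vg/lv names made up, and the exact tag value depends on the ceph-volume version):

    lvs -o lv_name,vg_name,lv_tags ceph-vg/osd-block-0
        # lv_tags contains something like ceph.data_device=/dev/ceph-vg/osd-block-0
    lvrename ceph-vg osd-block-0 osd-block-0-new
        # the lv now lives at /dev/ceph-vg/osd-block-0-new, but the tag still names the old path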

The tags are used as a key/value store in the device, and we try to
add as much info there as possible. I think you are right that
we only set it (for now), but I can see how this could get us into
trouble if we ever depended on it.

A similar issue happens with the ephemeral names of other non-lv
devices, in which case we do update them.

If this doesn't serve a specific purpose I think we shouldn't set the tag (happy
to push a PR).

I think the right thing to do would be to make sure that we have the
right LV and update the tag if that changes. This would help commands like
`ceph-volume lvm list`, which display that information.
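(If an lv did get renamed by hand, the tag could in principle be corrected with plain lvm commands, along these lines; names made up, only a sketch of what an automated fix-up would have to do:

    lvchange --deltag ceph.data_device=/dev/ceph-vg/osd-block-0 ceph-vg/osd-block-0-new
    lvchange --addtag ceph.data_device=/dev/ceph-vg/osd-block-0-new ceph-vg/osd-block-0-new
)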

Would it make sense to change the implementation to simply return the lv name on the fly instead of duplicating the information in an lvm tag and trying to keep it consistent?
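(For what it's worth, "on the fly" here could simply mean asking lvm at list time, e.g. something like

    lvs --noheadings -o lv_name,vg_name,lv_tags | grep ceph.osd_fsid

and taking the lv/vg names from lvm's own output instead of from a stored ceph.data_device tag; that is only a sketch, not actual ceph-volume code.)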

As a starter:
I would consider it ill-advised to start changing this kind of naming in the underlying storage....
Just because you can is not a reason to do so.
If it's possible, someone will do it. And I'm wondering if we can make it so that this kind of operation can be done without any issue... what's wrong with that?

That is why I have my BOFH hat on...

On a complex system like Ceph, you just don't go around changing things unless you have a very good reason AND you know what you are doing.
It is just plain shooting yourself in the foot.

If you can rename an lv/vg, you can also mess up other things. And I'm more with Alfredo here: for the moment the tag perhaps does not serve a purpose, but its purpose might be precisely to restore the location after you have renamed things.

You can also throw away PGs and/or shards that are reported corrupt?
Probably not a smart thing to do.

This is what I register with ZFS:
osd.10/osd  ceph:cluster_fsid        ef485af8-9c2b-11e8-a98b-0025903744dc
osd.10/osd  ceph:cluster_name        ceph
osd.10/osd  ceph:data_device         osd.10  local
osd.10/osd  ceph:osd_fsid            5f490ea4-96b0-4c9e-ae92-5f8893cf6a60
osd.10/osd  ceph:crush_device_class  None
osd.10/osd  ceph:osd_id              10
osd.10/osd  ceph:type                data
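(That listing is the ZFS user properties on the dataset; something along the lines of

    zfs get all osd.10/osd | grep ceph:

produces it, give or take the exact columns.)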

Now I could start moving the data around, but then I would no longer have some of the device information, I could not easily destroy the object, and I would have to start toying with zpool/zfs/gpart/geom to reconstruct the whole dependency chain. And note that ZFS makes it possible to replace a disk without Ceph ever detecting it: just attach a disk to a mirror, scrub the vdev, and then remove the old disk from the mirror. That gets you a fresh copy of the old disk.

And I do not know enough about LVM, but could this information be used to restore the correct lv/vg naming after a serious loss of information about the LVM layout?

And if you go around "just renaming" you should be knowledgeable enough to understand that attributes need to be changed as well. "It just works" is something I would consider a poor argument in this case.
I'd argue the other way around. Why are we duplicating information (that easily becomes inconsistent) that we then don't use? I'm arguing for not writing this lvm tag, because we can just get that info on the fly for e.g. 'ceph-volume lvm list'. We can identify the lvs that are used by Ceph and just output each lv's name instead of relying on the tag content.

But as soon as you start moving out of the layout that was carefully planned by ceph-volume, all bets are off.

In the ZFS case `data_device` is the linking pin between the partition data and the physical disk, which you would need to update if you relabelled a partition with gpart. If you don't, zapping a disk becomes a dangerous manual process.
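For example (FreeBSD, label and device names made up): relabelling a pool member with something like

    gpart modify -i 1 -l osd.10.new ada3

makes the partition show up under a new /dev/gpt/ name, while whatever ceph:data_device recorded still refers to the old one, so any zap/destroy tooling that trusts that property ends up pointing at a device node that no longer exists.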

And I can imagine something like this on LVM as well.

--WjW



