Re: Why lvm is recommended method for bluestore

On Mon, Jul 23, 2018 at 6:09 AM, Nicolas Huillard <nhuillard@xxxxxxxxxxx> wrote:
> On Sunday, 22 July 2018 at 09:51 -0400, Satish Patel wrote:
>> I read that post, and that's why I opened this thread for a few more
>> questions and clarification.
>>
>> When you said the OSD doesn't come up, what does that actually mean? After
>> a reboot of the node, after a service restart, or after installing a new disk?
>>
>> You said we are using a manual method; what is that?
>>
>> I'm building a new cluster and have zero prior experience, so how can I
>> reproduce this error to see that LVM is really a life-saving tool here? I'm
>> sure there are plenty of people using it, but I didn't find any good
>> documentation except that mailing list thread, which raised more questions
>> in my mind.
>
> When I had to change a few drives manually, copying the old contents
> over, I noticed that the logical volumes are tagged with lots of
> information related to how they should be handled at boot time by the
> OSD startup system.
> These LVM tags are a good, standard way to add that metadata within the
> volumes themselves. Apparently, there is no other way to attach tags
> that distinguish bluestore/filestore, SATA/SAS/NVMe, whole drive or
> partition, etc.
> They are easy to manage and fail-safe in many configurations.

This is spot on. To clarify even further, let me give a brief overview
of how that worked with ceph-disk and GPT GUIDs:

* at creation time, ceph-disk would add a GUID to the partitions so
that they would later be recognized. These GUIDs were unique, so they
would ensure accuracy
* a set of udev rules would be in place to detect when these GUIDs
became available in the system
* at boot time, udev would start detecting devices coming online, and
the rules would call out to ceph-disk (the executable)
* the ceph-disk executable would then call out to the ceph-disk
systemd unit, with a timeout of three hours, for the device to which it was
assigned (e.g. ceph-disk@/dev/sda)
* the previous step would be done *per device*, waiting for all
devices associated with the OSD to become available (hence the 3-hour
timeout)
* the ceph-disk systemd unit would call back again to the ceph-disk
command line tool, signaling that the devices are ready (with --sync)
* the ceph-disk command line tool would call *the ceph-disk command
line tool again* to "activate" the OSD, having (finally) detected the
device type (encrypted, partially prepared, etc.)

The above workflow worked for pre-systemd systems; it could probably
have been streamlined better, but it is what allowed devices to be
"discovered" at boot time. The 3-hour timeout was there because
udev would find these devices becoming active asynchronously, and
ceph-disk was trying to coerce a more synchronous behavior to get all
the devices it needed. On a dense OSD node, this meant that OSDs
would inconsistently fail to come up at all (sometimes all of them would work!).
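
To make the GUID matching above concrete, here is a rough Python sketch
(not ceph-disk's actual code; the partition type GUID shown is the one
commonly cited for Ceph OSD data partitions, and the helper name is made
up for illustration). It lists partitions with lsblk and keeps the ones
whose GPT partition type matches:

import json
import subprocess

# GPT partition type GUID commonly cited for Ceph OSD data partitions
# (ceph-disk defined several more, e.g. for journals).
CEPH_OSD_DATA_GUID = "4fbd7e29-9d25-41b8-afd0-062c0ceff05d"

def ceph_data_partitions():
    """Return partitions whose GPT type GUID marks them as Ceph OSD data."""
    out = subprocess.run(
        ["lsblk", "--json", "-o", "NAME,PARTTYPE"],
        check=True, capture_output=True, text=True,
    ).stdout
    found = []
    for disk in json.loads(out).get("blockdevices", []):
        for part in disk.get("children") or []:
            if (part.get("parttype") or "").lower() == CEPH_OSD_DATA_GUID:
                found.append("/dev/" + part["name"])
    return found

if __name__ == "__main__":
    for dev in ceph_data_partitions():
        print(dev)

Each partition found this way then had to be waited on per device through
udev and the systemd units described above, which is exactly where the
3-hour timeout and the inconsistent startups came from.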

Device discovery is a tremendously complicated and difficult problem
to solve, and we thought that a few simple rules with udev would be
the answer (they weren't). The LVM implementation of ceph-volume
limits itself to just asking LVM about devices and then getting them
"activated" all at once. In some tests on nodes with ~20 OSDs, we came up
10x faster (compared to ceph-disk) and were fully operational,
every time.
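
The LVM tags Nicolas described can be read back in one pass, with no udev
races involved. A similarly rough sketch of that idea (again, not
ceph-volume's actual code; it assumes an lvm2 new enough to produce JSON
reports, and uses ceph.*-style tag names such as ceph.osd_id and ceph.type
purely for illustration):

import json
import subprocess

def ceph_logical_volumes():
    """Return (vg/lv, ceph.* tag dict) for every LV carrying ceph.* tags."""
    out = subprocess.run(
        ["lvs", "--reportformat", "json", "-o", "vg_name,lv_name,lv_tags"],
        check=True, capture_output=True, text=True,
    ).stdout
    report = json.loads(out)["report"][0]["lv"]
    results = []
    for lv in report:
        tags = {}
        # lv_tags is a comma-separated list, e.g. "ceph.osd_id=0,ceph.type=block"
        for tag in lv["lv_tags"].split(","):
            if tag.startswith("ceph.") and "=" in tag:
                key, value = tag.split("=", 1)
                tags[key] = value
        if tags:
            results.append((lv["vg_name"] + "/" + lv["lv_name"], tags))
    return results

if __name__ == "__main__":
    for name, tags in ceph_logical_volumes():
        print(name, tags.get("ceph.osd_id"), tags.get("ceph.type"))

Because everything needed for activation travels with the volume itself,
there is nothing left to discover asynchronously at boot, which is what
made startup both faster and consistent.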

Since this is a question that keeps coming up, and the answers are now
getting a bit scattered, I'll consolidate them all into a section in the
docs. I'll try to address the "layer of complexity", "performance
overhead", and other recurring concerns that keep being raised.

Any other ideas are welcome if some of the previously discussed
points are still not entirely clear.

>
>> Sent from my iPhone
>>
>> > On Jul 22, 2018, at 6:31 AM, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
>> > wrote:
>> >
>> >
>> >
>> > I don't think it will get any more basic than that. Or maybe this:
>> > if the doctor diagnoses you, you can either accept it, get a 2nd
>> > opinion, or study medicine to verify it.
>> >
>> > In short, LVM was introduced to solve some issues related to
>> > starting OSDs (which I did not have, probably because of a 'manual'
>> > configuration). And it opens the ability to support more (future)
>> > devices.
>> >
>> > I gave you two links, did you read the whole thread?
>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg47802.html
>> >
>> >
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: Satish Patel [mailto:satish.txt@xxxxxxxxx]
>> > Sent: Saturday, 21 July 2018 20:59
>> > To: ceph-users
>> > Subject:  Why lvm is recommended method for bluestore
>> >
>> > Folks,
>> >
>> > I think I am going to boil the ocean here. I googled a lot about this
>> > topic, why LVM is the recommended method for bluestore, but didn't find
>> > any good and detailed explanation, not even on the official Ceph
>> > website.
>> >
>> > Can someone explain here in basic language? I am in no way an expert,
>> > so I just want to understand what the advantage is of adding an extra
>> > layer of complexity.
>> >
>> > I found this post, but I got lost reading it, and I want to see what
>> > other folks are suggesting and offering in their own words:
>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg46768.html
>> >
>> > ~S
>> >
>> >
>>
> --
> Nicolas Huillard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



