Re: Why lvm is recommended method for bluestore


 



On Mon, Jul 23, 2018 at 1:56 PM, Satish Patel <satish.txt@xxxxxxxxx> wrote:
> This is a great explanation. Based on your details, it looks like when we
> reboot a machine (OSD node) it takes a long time to initialize all of the
> OSDs, but if we use LVM that time is shortened.

That is one aspect, yes. Most importantly: all OSDs will consistently
come up with ceph-volume. This wasn't the case with ceph-disk, and it
was impossible to replicate or understand why (hence the 3-hour timeout).

>
> There is a good chance that LVM impacts performance because of the
> extra layer. Does anyone have any data that can provide some insight
> into good or bad performance? It would be great if you could share it
> to help us understand the impact.

There isn't a performance impact, and if there is one, it is negligible.

>
>
>
> On Mon, Jul 23, 2018 at 8:37 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>> On Mon, Jul 23, 2018 at 6:09 AM, Nicolas Huillard <nhuillard@xxxxxxxxxxx> wrote:
>>> Le dimanche 22 juillet 2018 à 09:51 -0400, Satish Patel a écrit :
>>>> I read that post, and that's why I opened this thread for a few more
>>>> questions and clarifications.
>>>>
>>>> When you said the OSD doesn't come up, what does that actually mean?
>>>> After a reboot of the node, after a service restart, or after
>>>> installation of a new disk?
>>>>
>>>> You said you are using a manual method; what is that?
>>>>
>>>> I'm building a new cluster and have zero prior experience, so how can I
>>>> reproduce this error to see that LVM is really a life-saving tool here?
>>>> I'm sure there are plenty of people using it, but I didn't find any
>>>> good documentation except that mailing list, which raises more
>>>> questions in my mind.
>>>
>>> When I had to change a few drives manually, copying the old contents
>>> over, I noticed that the logical volumes are tagged with lots of
>>> information related to how they should be handled at boot time by the
>>> OSD startup system.
>>> These LVM tags are a good, standard way to attach that metadata to the
>>> volumes themselves. Apparently, there is no other way to add tags that
>>> distinguish bluestore/filestore, SATA/SAS/NVMe, whole drive or
>>> partition, etc.
>>> They are easy to manage and fail-safe in many configurations.
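Those tags can be read back with standard LVM tooling (e.g. `lvs -o lv_tags`), and parsing them is straightforward. A minimal sketch, assuming the comma-separated `key=value` tag format that LVM reports; the `ceph.*` key names below follow the pattern ceph-volume uses, but treat the exact set of keys as illustrative:

```python
# Sketch: parse an LVM tag string (as reported by `lvs -o lv_tags`)
# into a plain dict. The ceph.* keys in the example are the kind
# ceph-volume sets (osd id, fsid, device type); exact keys may differ.

def parse_lv_tags(tag_string):
    """Turn 'ceph.osd_id=3,ceph.type=block' into a dict."""
    tags = {}
    for pair in tag_string.split(","):
        if "=" in pair:
            key, value = pair.split("=", 1)
            tags[key.strip()] = value.strip()
    return tags

example = "ceph.osd_id=3,ceph.osd_fsid=abc-123,ceph.type=block"
print(parse_lv_tags(example)["ceph.osd_id"])  # -> 3
```

Because the metadata lives inside the volume group itself, it survives device renames (/dev/sda becoming /dev/sdb) that would confuse any path-based scheme.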
>>
>> This is spot on. To clarify even further, let me give a brief overview
>> of how that worked with ceph-disk and GPT GUIDs:
>>
>> * at creation time, ceph-disk would add a GUID to the partitions so
>> that they could later be recognized. These GUIDs were unique, so they
>> ensured accuracy
>> * a set of udev rules was in place to detect when these GUIDs became
>> available on the system
>> * at boot time, udev would start detecting devices coming online, and
>> the rules would call out to ceph-disk (the executable)
>> * the ceph-disk executable would then call out to the ceph-disk
>> systemd unit, with a three-hour timeout, for the device to which it was
>> assigned (e.g. ceph-disk@/dev/sda)
>> * the previous step would be done *per device*, waiting for all
>> devices associated with the OSD to become available (hence the 3-hour
>> timeout)
>> * the ceph-disk systemd unit would call back again to the ceph-disk
>> command line tool, signaling that devices were ready (with --sync)
>> * the ceph-disk command line tool would call *the ceph-disk command
>> line tool again* to "activate" the OSD, having detected (finally) the
>> device type (encrypted, partially prepared, etc...)
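The per-device waiting in the steps above is the pattern that made the three-hour timeout necessary. A minimal sketch of that pattern (not ceph-disk's actual code, just the shape of it):

```python
import time

# Sketch of the "wait for every device of an OSD, or give up" pattern
# described above (not ceph-disk's actual code). udev reports devices
# asynchronously, so each OSD had to poll until all of its devices
# appeared -- or time out after a long wait (three hours in ceph-disk).

def wait_for_devices(needed, is_present, timeout=3 * 3600, poll=1.0):
    """Return True once every device in `needed` is present, else False."""
    deadline = time.monotonic() + timeout
    while True:
        missing = [dev for dev in needed if not is_present(dev)]
        if not missing:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

If any one device of an OSD never shows up, the whole OSD blocks for the full timeout, which is why a dense node could sit for hours with OSDs down.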
>>
>> The above workflow worked for pre-systemd systems; it could probably
>> have been streamlined better, but it was what allowed ceph-disk to
>> "discover" devices at boot time. The 3-hour timeout was there because
>> udev would find these devices becoming active asynchronously, and
>> ceph-disk was trying to coerce more synchronous behavior to gather all
>> the devices it needed. On a dense OSD node, this meant that OSDs would
>> come up inconsistently, or not at all (sometimes all of them would work!).
>>
>> Device discovery is a tremendously complicated and difficult problem
>> to solve, and we thought that a few simple udev rules would be the
>> answer (they weren't). The LVM implementation of ceph-volume limits
>> itself to just asking LVM about devices and then gets them "activated"
>> at once. In some tests on nodes with ~20 OSDs, we came up 10x faster
>> (compared to ceph-disk), and fully operational - every time.
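That model boils down to one query and one activation pass, instead of per-device events and timeouts. A minimal sketch of the idea, where `list_lvs_with_tags` and `activate` are hypothetical stand-ins for an `lvs` query and the OSD activation step, and the `ceph.*` tag keys are illustrative:

```python
# Sketch of ceph-volume's discovery model described above: instead of
# waiting for per-device udev events, ask LVM once for every LV that
# carries ceph.* tags, then activate everything found in one pass.

def discover_osds(list_lvs_with_tags):
    """Group LVs by OSD id using their ceph.* tags -- one query, no waiting."""
    osds = {}
    for lv_name, tags in list_lvs_with_tags():
        osd_id = tags.get("ceph.osd_id")
        if osd_id is not None:
            osds.setdefault(osd_id, []).append((tags.get("ceph.type"), lv_name))
    return osds

def activate_all(osds, activate):
    """Activate every discovered OSD at once, deterministically ordered."""
    for osd_id, devices in sorted(osds.items()):
        activate(osd_id, devices)
```

Because LVM already knows every logical volume on the node, there is nothing to "wait for": either the LV and its tags are there, or the OSD is genuinely absent.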
>>
>> Since this is a question that keeps coming up, and the answers are now
>> getting a bit scattered, I'll consolidate them all into a section in
>> the docs. I'll try to address the "layer of complexity", "performance
>> overhead", and other recurring concerns.
>>
>> Any other ideas are welcome if some of the previously discussed
>> things are still not entirely clear.
>>
>>>
>>>> Sent from my iPhone
>>>>
>>>> > On Jul 22, 2018, at 6:31 AM, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
>>>> > wrote:
>>>> >
>>>> >
>>>> >
>>>> > I don't think it will get any more basic than that. Or maybe this:
>>>> > if a doctor diagnoses you, you can either accept it, get a second
>>>> > opinion, or study medicine to verify it.
>>>> >
>>>> > In short, LVM was introduced to solve some issues related to
>>>> > starting OSDs (which I did not have, probably because of a 'manual'
>>>> > configuration). And it opens the ability to support more (future)
>>>> > device types.
>>>> >
>>>> > I gave you two links; did you read the whole thread?
>>>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg47802.html
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > -----Original Message-----
>>>> > From: Satish Patel [mailto:satish.txt@xxxxxxxxx]
>>>> > Sent: zaterdag 21 juli 2018 20:59
>>>> > To: ceph-users
>>>> > Subject:  Why lvm is recommended method for bluestore
>>>> >
>>>> > Folks,
>>>> >
>>>> > I think I am going to boil the ocean here. I googled a lot about
>>>> > why LVM is the recommended method for bluestore, but didn't find
>>>> > any good and detailed explanation, not even on the official Ceph
>>>> > website.
>>>> >
>>>> > Can someone explain it here in basic language? I am in no way an
>>>> > expert, so I just want to understand the advantage of adding an
>>>> > extra layer of complexity.
>>>> >
>>>> > I found this post, but I got lost reading it and want to see what
>>>> > other folks suggest and offer in their own words:
>>>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg46768.html
>>>> >
>>>> > ~S
>>>> >
>>>>
>>> --
>>> Nicolas Huillard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



