Re: Why lvm is recommended method for bluestore


 



This is a great explanation. Based on your details, it looks like rebooting
a machine (OSD node) takes a long time to initialize all of its OSDs, but
using LVM shortens that time.

There is a good chance that LVM impacts performance because of the
extra layer. Does anyone have any data that could provide some insight
into how good or bad the performance is? It would be great if you could
share it, so it helps us understand the impact.
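
In case it helps, below is a rough sketch of the kind of comparison I have in
mind: run the same short fio job against a raw partition and against an LV on
the same kind of disk, then compare 4k random-write IOPS. The device paths are
placeholders and the job parameters are only illustrative, so please treat it
as a starting point rather than a rigorous benchmark, and point it only at
scratch devices that can be wiped.

#!/usr/bin/env python3
# Rough comparison sketch (NOT a rigorous benchmark): run the same fio job
# against a raw partition and against an LV, and compare 4k random-write IOPS.
# WARNING: fio with --direct=1 against a block device destroys its contents.
import json
import subprocess

# Placeholder paths -- replace with scratch devices you can wipe.
DEVICES = {
    "raw_partition": "/dev/sdx1",
    "lvm_lv": "/dev/vg_test/lv_test",
}

def run_fio(path):
    """Run a short 4k random-write job with direct I/O and return write IOPS."""
    out = subprocess.run(
        ["fio", "--name=lvmtest", "--filename=" + path,
         "--rw=randwrite", "--bs=4k", "--iodepth=32", "--numjobs=1",
         "--direct=1", "--ioengine=libaio",
         "--runtime=60", "--time_based", "--output-format=json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["jobs"][0]["write"]["iops"]

if __name__ == "__main__":
    for label, dev in DEVICES.items():
        print(label, round(run_fio(dev)), "IOPS")

Real numbers from production-like hardware would of course be more convincing
than a quick test like this.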



On Mon, Jul 23, 2018 at 8:37 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
> On Mon, Jul 23, 2018 at 6:09 AM, Nicolas Huillard <nhuillard@xxxxxxxxxxx> wrote:
>> Le dimanche 22 juillet 2018 à 09:51 -0400, Satish Patel a écrit :
>>> I read that post, and that's why I opened this thread for a few more
>>> questions and clarifications.
>>>
>>> When you say the OSD doesn't come up, what does that actually mean? After
>>> a reboot of the node, after a service restart, or after installing a new disk?
>>>
>>> You said you are using a manual method; what is that?
>>>
>>> I'm building a new cluster and have zero prior experience, so how can I
>>> reproduce this error and see that LVM is really a life-saving tool here? I'm
>>> sure there are plenty of people using it, but I didn't find any good
>>> documentation except that mailing list thread, which raised more questions in
>>> my mind.
>>
>> When I had to change a few drives manually, copying the old contents
>> over, I noticed that the logical volumes are tagged with lots of
>> information about how they should be handled at boot time by the
>> OSD startup system.
>> These LVM tags are a good, standard way to embed that metadata within the
>> volumes themselves. Apparently there is no other comparable way to add
>> tags that record bluestore/filestore, SATA/SAS/NVMe, whole drive or
>> partition, etc.
>> They are easy to manage and fail-safe in many configurations.
>
> This is spot on. To clarify even further, let me give a brief overview
> of how that worked with ceph-disk and GPT GUIDs:
>
> * at creation time, ceph-disk would add a GUID to the partitions so
> that they could later be recognized. These GUIDs were unique, which
> ensured accuracy
> * a set of udev rules would be in place to detect when these GUIDs
> became available in the system
> * at boot time, udev would start detecting devices coming online, and
> the rules would call out to ceph-disk (the executable)
> * the ceph-disk executable would then call out to the ceph-disk
> systemd unit, with a timeout of three hours, for the device to which it
> was assigned (e.g. ceph-disk@/dev/sda )
> * the previous step would be done *per device*, waiting for all
> devices associated with the OSD to become available (hence the 3-hour
> timeout)
> * the ceph-disk systemd unit would call back again to the ceph-disk
> command line tool, signaling that devices were ready (with --sync)
> * the ceph-disk command line tool would then call *the ceph-disk command
> line tool again* to "activate" the OSD, having detected (finally) the
> device type (encrypted, partially prepared, etc.)
>
> The above workflow was designed for pre-systemd systems; it probably
> could have been streamlined better, but it was what allowed devices to
> be "discovered" at boot time. The three-hour timeout was there because
> udev would surface these devices asynchronously, while ceph-disk was
> trying to coerce more synchronous behavior to gather all the devices it
> needed. On a dense OSD node, this meant that OSDs would inconsistently
> fail to come up at all (sometimes all of them would work!).
>
> Device discovery is a tremendously complicated and difficult problem
> to solve, and we thought that a few simple udev rules would be
> the answer (they weren't). The LVM implementation of ceph-volume
> limits itself to simply asking LVM about its devices and then gets them
> "activated" all at once. In some tests on nodes with ~20 OSDs, we came up
> 10x faster (compared to ceph-disk) and were fully operational -
> every time.
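
To make the old GUID-based discovery concrete for myself, I put together a
rough sketch of the recognition step: walk the block devices and check each
partition's GPT type GUID against the one ceph-disk stamped on OSD data
partitions. The GUID value below is the well-known "OSD data" type as far as I
can tell, but please double-check it before relying on it; the script itself is
only illustrative and assumes an lsblk with JSON output support.

#!/usr/bin/env python3
# Rough sketch of GUID-based discovery: list block devices with lsblk and flag
# partitions whose partition-type GUID matches the ceph-disk "OSD data" type.
import json
import subprocess

# Partition-type GUID ceph-disk used for OSD data partitions (verify before use).
OSD_DATA_PARTTYPE = "4fbd7e29-9d25-41b8-afd0-062c0ceff05d"

def ceph_disk_partitions():
    """Yield names of partitions that carry the ceph-disk OSD data type GUID."""
    out = subprocess.run(
        ["lsblk", "--json", "-o", "NAME,PARTTYPE"],
        check=True, capture_output=True, text=True,
    ).stdout

    def walk(nodes):
        for node in nodes:
            if (node.get("parttype") or "").lower() == OSD_DATA_PARTTYPE:
                yield node["name"]
            yield from walk(node.get("children", []))

    yield from walk(json.loads(out)["blockdevices"])

if __name__ == "__main__":
    for name in ceph_disk_partitions():
        print("ceph-disk style OSD data partition:", name)

As I read the explanation above, the hard part was never finding such a
partition once; it was udev triggering this kind of check per device,
asynchronously, and ceph-disk then waiting (up to three hours) for all the
related devices to appear.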
>
> Since this is a question that keeps coming up, and the answers are now
> getting a bit scattered, I'll consolidate them into a section in the
> docs. I'll try to address the "layer of complexity", "performance
> overhead", and other recurring concerns that keep being raised.
>
> Any other ideas are welcome if some of the previously discussed
> points are still not entirely clear.
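
Nicolas's point about the tags is easy to see on a live system. Below is a
rough sketch that lists logical volumes and keeps only the "ceph."-prefixed
LVM tags that ceph-volume stores (OSD id, fsid, type, and so on). The exact tag
names can vary between releases, so treat the filter as illustrative; it just
shows that the metadata really does live on the LV itself, which, if I follow
the explanation above, is what ceph-volume asks LVM for at activation time
instead of waiting on udev events.

#!/usr/bin/env python3
# Rough sketch: list LVs and show their "ceph."-prefixed LVM tags, i.e. the
# metadata ceph-volume stores on the volume itself.
import subprocess

def ceph_tagged_lvs():
    """Yield (vg, lv, [ceph tags]) for every LV that carries ceph.* tags."""
    out = subprocess.run(
        ["lvs", "--noheadings", "--separator", ";",
         "-o", "vg_name,lv_name,lv_tags"],
        check=True, capture_output=True, text=True,
    ).stdout
    for line in out.splitlines():
        vg, lv, tags = [field.strip() for field in line.split(";", 2)]
        ceph_tags = [t for t in tags.split(",") if t.startswith("ceph.")]
        if ceph_tags:
            yield vg, lv, ceph_tags

if __name__ == "__main__":
    for vg, lv, tags in ceph_tagged_lvs():
        print(f"{vg}/{lv}")
        for tag in tags:
            print("   ", tag)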
>
>>
>>> Sent from my iPhone
>>>
>>> > On Jul 22, 2018, at 6:31 AM, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
>>> > wrote:
>>> >
>>> >
>>> >
>>> > I don’t think it will get any more basic than that. Or maybe this:
>>> > if the doctor diagnoses you, you can either accept this, get a 2nd
>>> > opinion, or study medicine to verify it.
>>> >
>>> > In short, LVM was introduced to solve some issues related to
>>> > starting OSDs (which I did not have, probably because of a 'manual'
>>> > configuration). And it opens up the ability to support more (future)
>>> > devices.
>>> >
>>> > I gave you two links; did you read the whole thread?
>>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg47802.html
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > -----Original Message-----
>>> > From: Satish Patel [mailto:satish.txt@xxxxxxxxx]
>>> > Sent: zaterdag 21 juli 2018 20:59
>>> > To: ceph-users
>>> > Subject:  Why lvm is recommended method for bluestore
>>> >
>>> > Folks,
>>> >
>>> > I think I am going to boil the ocean here. I googled a lot about
>>> > why LVM is the recommended method for bluestore, but didn't find
>>> > any good, detailed explanation, not even on the official Ceph
>>> > website.
>>> >
>>> > Can someone explain it here in basic language? I am by no means an
>>> > expert, so I just want to understand the advantage of adding an
>>> > extra layer of complexity.
>>> >
>>> > I found this post, but I got lost reading it and want to see what
>>> > other folks suggest, in their own words:
>>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg46768.html
>>> >
>>> > ~S
>> --
>> Nicolas Huillard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





