Re: Why lvm is recommended method for bluestore

On Mon, Jul 23, 2018 at 2:33 PM, Satish Patel <satish.txt@xxxxxxxxx> wrote:
> Alfredo,
>
> Thanks, I think I should go with LVM then :)
>
> I have a question here: I have 4 physical SSDs per server, and for some
> reason I am using ceph-ansible 3.0.8, which doesn't create the LVM
> volumes itself, so I have to create them manually.
>
> I am using bluestore (and want to keep the WAL/DB on the same data disk). How
> do I create the LVM volumes manually on a single physical disk? Do I need to
> create two logical volumes (one for the journal and one for the data)?
>
> I am reading this (at the bottom):
> http://docs.ceph.com/ceph-ansible/master/osds/scenarios.html
>
> lvm_volumes:
>   - data: data-lv1
>     data_vg: vg1
>     crush_device_class: foo

For a raw device (e.g. /dev/sda) you can do:

lvm_volumes:
  - data: /dev/sda

The LV gets created for you in this one case.
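
If you do want (or, with ceph-ansible 3.0.8, need) to create the VG/LV yourself,
it is plain LVM. A minimal sketch, where /dev/sdb and the names vg1/data-lv1 are
only examples:

  pvcreate /dev/sdb
  vgcreate vg1 /dev/sdb
  # one data LV is enough for bluestore when the WAL/DB stay on the same disk;
  # separate block.db/block.wal LVs are only needed if they live on another device
  lvcreate -l 100%FREE -n data-lv1 vg1

and then point lvm_volumes at it, exactly like the docs example you quoted:

  lvm_volumes:
    - data: data-lv1
      data_vg: vg1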

>
>
> In the above example, did they create vg1 (the volume group) and
> data-lv1 (the logical volume) themselves? If I want to add a journal, do I
> need to create one more logical volume? I am confused by that document, so I
> need some clarification.
>
> On Mon, Jul 23, 2018 at 2:06 PM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>> On Mon, Jul 23, 2018 at 1:56 PM, Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>> This is a great explanation. Based on your details, it looks like when you
>>> reboot a machine (OSD node) it will take a longer time to initialize all of
>>> the OSDs, but if we use LVM it shortens that time.
>>
>> That is one aspect, yes. Most importantly: all OSDs will consistently
>> come up with ceph-volume. This wasn't the case with ceph-disk, and it was
>> impossible to replicate or understand why (hence the 3 hour timeout).
>>
>>>
>>> There is a good chance that LVM impacts performance because of the
>>> extra layer. Does anyone have any data that can provide some insight
>>> into how good or bad the performance is? It would be great if you shared it,
>>> as it will help us understand the impact.
>>
>> There isn't a performance impact, and if there is one, it is negligible.
>>
>>>
>>>
>>>
>>> On Mon, Jul 23, 2018 at 8:37 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>>>> On Mon, Jul 23, 2018 at 6:09 AM, Nicolas Huillard <nhuillard@xxxxxxxxxxx> wrote:
>>>>> On Sunday, July 22, 2018 at 09:51 -0400, Satish Patel wrote:
>>>>>> I read that post, and that's why I opened this thread for a few more
>>>>>> questions and clarifications.
>>>>>>
>>>>>> When you said the OSD doesn't come up, what does that actually mean? After a
>>>>>> reboot of the node, after a service restart, or after installation of a new disk?
>>>>>>
>>>>>> You said we are using a manual method; what is that?
>>>>>>
>>>>>> I'm building a new cluster and have zero prior experience, so how can I
>>>>>> reproduce this error to see that LVM is really a life-saving tool here? I'm
>>>>>> sure there are plenty of people using it, but I didn't find any good
>>>>>> documentation except that mailing list thread, which raised more questions
>>>>>> in my mind.
>>>>>
>>>>> When I had to change a few drives manually, copying the old contents
>>>>> over, I noticed that the logical volumes are tagged with lots of
>>>>> information about how they should be handled at boot time by the
>>>>> OSD startup system.
>>>>> These LVM tags are a good, standard way to attach that metadata to the
>>>>> volumes themselves. Apparently there is no other way to attach tags
>>>>> that distinguish bluestore/filestore, SATA/SAS/NVMe, whole drive or
>>>>> partition, etc.
>>>>> They are easy to manage and fail-safe in many configurations.
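>>>>> You can see them straight from the LVM tools, for example (output
>>>>> abridged, and the exact tag set may vary by release):
>>>>>
>>>>>   lvs -o lv_name,vg_name,lv_tags
>>>>>   # ... ceph.osd_id=0,ceph.osd_fsid=...,ceph.type=block,ceph.cluster_name=ceph ...
>>>>>
>>>>> (ceph-volume reads the same tags back, e.g. with "ceph-volume lvm list".)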
>>>>
>>>> This is spot on. To clarify even further, let me give a brief overview
>>>> of how that worked with ceph-disk and GPT GUID:
>>>>
>>>> * at creation time, ceph-disk would add a GUID to the partitions so
>>>> that they could later be recognized. These GUIDs were unique, which
>>>> ensured accuracy
>>>> * a set of udev rules would be in place to detect when these GUIDs
>>>> became available in the system
>>>> * at boot time, udev would start detecting devices coming online, and
>>>> the rules would call out to ceph-disk (the executable)
>>>> * the ceph-disk executable would then call out to the ceph-disk
>>>> systemd unit, with a timeout of three hours, for the device to which it
>>>> was assigned (e.g. ceph-disk@/dev/sda)
>>>> * the previous step would be done *per device*, waiting for all
>>>> devices associated with the OSD to become available (hence the 3 hour
>>>> timeout)
>>>> * the ceph-disk systemd unit would call back again to the ceph-disk
>>>> command line tool, signaling that devices were ready (with --sync)
>>>> * the ceph-disk command line tool would then call *the ceph-disk command
>>>> line tool again* to "activate" the OSD, having finally detected the
>>>> device type (encrypted, partially prepared, etc...)
>>>>
>>>> The above workflow worked for pre-systemd systems; it could probably
>>>> have been streamlined better, but it was what allowed devices to be
>>>> "discovered" at boot time. The 3 hour timeout was there because
>>>> udev would find these devices becoming active asynchronously, and
>>>> ceph-disk was trying to coerce a more synchronous behavior to get all
>>>> the devices it needed. In a dense OSD node, this meant that OSDs would
>>>> inconsistently fail to come up at all (sometimes all of them would work!).
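>>>>
>>>> Put as a rough sketch of that chain for a single device (simplified and from
>>>> memory; the device name is just an example and exact flags varied between
>>>> releases):
>>>>
>>>>   # udev rule matches the ceph GPT GUID on the partition and calls:
>>>>   ceph-disk trigger /dev/sda1
>>>>   # that starts the per-device systemd unit (ceph-disk@..., as above), which
>>>>   # waits up to 3 hours for related devices and then calls back:
>>>>   ceph-disk trigger --sync /dev/sda1
>>>>   # which finally detects the device type and activates the OSD:
>>>>   ceph-disk activate /dev/sda1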
>>>>
>>>> Device discovery is a tremendously complicated and difficult problem
>>>> to solve, and we thought that a few simple rules with udev would be
>>>> the answer (they weren't). The LVM implementation of ceph-volume
>>>> limits itself to just asking LVM about its devices and then getting them
>>>> "activated" all at once. In some tests on nodes with ~20 OSDs, we were 10x
>>>> faster to come up (compared to ceph-disk), and fully operational -
>>>> every time.
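>>>>
>>>> To make that concrete, after a reboot bringing the OSDs up boils down to
>>>> something like:
>>>>
>>>>   # show the OSDs ceph-volume knows about, read back from the LVM tags:
>>>>   ceph-volume lvm list
>>>>   # activate every tagged OSD on the node in one shot:
>>>>   ceph-volume lvm activate --all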
>>>>
>>>> Since this is a question that keeps coming up, and the answers are now
>>>> getting a bit scattered, I'll consolidate them all into a section in the
>>>> docs. I'll try to address the "layer of complexity", "performance
>>>> overhead", and other recurring concerns that keep being raised.
>>>>
>>>> Any other ideas are welcome if some of the previously discussed
>>>> things are still not entirely clear.
>>>>
>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> > On Jul 22, 2018, at 6:31 AM, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
>>>>>> > wrote:
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > I don't think it will get any more basic than that. Or maybe this: if
>>>>>> > the doctor diagnoses you, you can either accept it, get a 2nd opinion,
>>>>>> > or study medicine to verify it.
>>>>>> >
>>>>>> > In short, LVM has been introduced to solve some issues related to
>>>>>> > starting OSDs (which I did not have, probably because of a 'manual'
>>>>>> > configuration). And it opens the ability to support (more future)
>>>>>> > devices.
>>>>>> >
>>>>>> > I gave you two links, did you read the whole thread?
>>>>>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg47802.html
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > -----Original Message-----
>>>>>> > From: Satish Patel [mailto:satish.txt@xxxxxxxxx]
>>>>>> > Sent: Saturday, July 21, 2018 20:59
>>>>>> > To: ceph-users
>>>>>> > Subject:  Why lvm is recommended method for bluestore
>>>>>> >
>>>>>> > Folks,
>>>>>> >
>>>>>> > I think I am going to boil the ocean here. I googled a lot about why
>>>>>> > LVM is the recommended method for bluestore, but didn't find any good,
>>>>>> > detailed explanation, not even on the official Ceph website.
>>>>>> >
>>>>>> > Can someone explain it here in basic language? I am in no way an expert,
>>>>>> > so I just want to understand: what is the advantage of adding an extra
>>>>>> > layer of complexity?
>>>>>> >
>>>>>> > I found this post, but I got lost reading it, and I want to see what
>>>>>> > other folks suggest and offer in their own words:
>>>>>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg46768.html
>>>>>> >
>>>>>> > ~S
>>>>> --
>>>>> Nicolas Huillard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



