Re: Why LVM is the recommended method for bluestore

I did that, but I am using ceph-ansible 3.0.8, which doesn't support
automatic creation of LVM volumes :( I think the 3.1 release adds LVM support.

For various reasons I have to stick with 3.0.8, so I need to create the volumes manually.
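
A minimal sketch of what that manual preparation could look like, assuming a
bluestore OSD with the WAL/DB colocated on the data disk and /dev/sdb as one of
the four SSDs (the device path and the VG/LV names are only illustrative):

    # one PV/VG and a single data LV per SSD; with a colocated WAL/DB no
    # separate journal or DB logical volume should be needed
    pvcreate /dev/sdb
    vgcreate ceph-vg-sdb /dev/sdb
    lvcreate -l 100%FREE -n ceph-lv-sdb ceph-vg-sdb

and then point ceph-ansible at the pre-created volume:

    lvm_volumes:
      - data: ceph-lv-sdb
        data_vg: ceph-vg-sdb

The same pattern would repeat for the remaining SSDs, with one entry per LV
under lvm_volumes.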

On Tue, Jul 24, 2018 at 8:34 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
> On Mon, Jul 23, 2018 at 2:33 PM, Satish Patel <satish.txt@xxxxxxxxx> wrote:
>> Alfredo,
>>
>> Thanks, I think I should go with LVM then :)
>>
>> I have a question: I have 4 physical SSDs per server, and for various
>> reasons I am using ceph-ansible 3.0.8, which doesn't create the LVM
>> volumes itself, so I have to create them manually.
>>
>> I am using bluestore (and want to keep the WAL/DB on the same data disk). How
>> do I create the LVM volumes manually on a single physical disk? Do I need to
>> create two logical volumes (one for the journal/DB and one for the data)?
>>
>> I am reading this (see the bottom of the page):
>> http://docs.ceph.com/ceph-ansible/master/osds/scenarios.html
>>
>> lvm_volumes:
>>   - data: data-lv1
>>     data_vg: vg1
>>     crush_device_class: foo
>
> For a raw device (e.g. /dev/sda) you can do:
>
> lvm_volumes:
>   - data: /dev/sda
>
> The LV gets created for you automatically in this one case.
>
>>
>>
>> In the above example, did they create vg1 (the volume group) and data-lv1
>> (the logical volume) beforehand? If I want to add a journal, do I need to
>> create one more logical volume? I am confused by that document, so I need
>> some clarification.
>>
>> On Mon, Jul 23, 2018 at 2:06 PM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>>> On Mon, Jul 23, 2018 at 1:56 PM, Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>>> This is a great explanation. Based on your details, it looks like when an
>>>> OSD node reboots it can take a long time to initialize all of its OSDs,
>>>> but using LVM shortens that time.
>>>
>>> That is one aspect, yes. Most importantly: all OSDs will consistently
>>> come up with ceph-volume. This wasn't the case with ceph-disk, and it was
>>> impossible to replicate or understand why (hence the 3-hour timeout).
>>>
>>>>
>>>> There is a chance that LVM impacts performance because of the extra
>>>> layer. Does anyone have any data that can provide some insight into the
>>>> performance, good or bad? It would be great if you could share it, to help
>>>> us understand the impact.
>>>
>>> There isn't a performance impact, and if there is one, it is negligible.
>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jul 23, 2018 at 8:37 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>>>>> On Mon, Jul 23, 2018 at 6:09 AM, Nicolas Huillard <nhuillard@xxxxxxxxxxx> wrote:
>>>>>> On Sunday, July 22, 2018 at 09:51 -0400, Satish Patel wrote:
>>>>>>> I read that post, and that's why I opened this thread for a few more
>>>>>>> questions and clarification.
>>>>>>>
>>>>>>> When you say the OSD doesn't come up, what does that actually mean? After
>>>>>>> a reboot of the node, after a service restart, or after installing a new
>>>>>>> disk?
>>>>>>>
>>>>>>> You said you are using a manual method; what is that?
>>>>>>>
>>>>>>> I'm building a new cluster and had zero prior experience, so how can I
>>>>>>> reproduce this error to see that LVM is really a life-saving tool here? I'm
>>>>>>> sure there are plenty of people using it, but I didn't find any good
>>>>>>> documentation except that mailing list thread, which raises more questions
>>>>>>> in my mind.
>>>>>>
>>>>>> When I had to change a few drives manually, copying the old contents
>>>>>> over, I noticed that the logical volumes are tagged with lots of
>>>>>> information related to how they should be handled at boot time by the
>>>>>> OSD startup system.
>>>>>> These LVM tags are a good, standard way to add that metadata within the
>>>>>> volumes themselves. Apparently there is no other way to attach the tags
>>>>>> that describe bluestore/filestore, SATA/SAS/NVMe, whole drive or
>>>>>> partition, etc.
>>>>>> They are easy to manage and fail-safe in many configurations.
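
For reference, those tags can be inspected (and set) with standard LVM tooling.
A rough sketch, reusing the illustrative VG/LV names from the example above;
the exact ceph.* keys that ceph-volume stores may differ between releases:

    # show the tags stored on an OSD's logical volume
    lvs -o lv_name,lv_tags ceph-vg-sdb/ceph-lv-sdb

    # the tags are plain key=value strings, along the lines of
    #   ceph.osd_id=3,ceph.osd_fsid=<uuid>,ceph.type=block,...

    # the same mechanism is available by hand
    lvchange --addtag some.key=value ceph-vg-sdb/ceph-lv-sdb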
>>>>>
>>>>> This is spot on. To clarify even further, let me give a brief overview
>>>>> of how that worked with ceph-disk and GPT GUIDs:
>>>>>
>>>>> * at creation time, ceph-disk would add a GUID to each partition so
>>>>> that it could later be recognized. These GUIDs were unique, so they
>>>>> ensured accuracy
>>>>> * a set of udev rules would be in place to detect when these GUIDs
>>>>> became available in the system
>>>>> * at boot time, udev would start detecting devices coming online, and
>>>>> the rules would call out to ceph-disk (the executable)
>>>>> * the ceph-disk executable would then call out to the ceph-disk
>>>>> systemd unit for the device to which it was assigned (e.g.
>>>>> ceph-disk@/dev/sda), with a timeout of three hours
>>>>> * the previous step would be done *per device*, waiting for all
>>>>> devices associated with the OSD to become available (hence the 3-hour
>>>>> timeout)
>>>>> * the ceph-disk systemd unit would call back again to the ceph-disk
>>>>> command line tool, signaling that the devices were ready (with --sync)
>>>>> * the ceph-disk command line tool would then call *the ceph-disk command
>>>>> line tool again* to "activate" the OSD, having finally detected the
>>>>> device type (encrypted, partially prepared, etc...)
>>>>>
>>>>> The above workflow worked for pre-systemd systems; it could probably
>>>>> have been streamlined, but it was what allowed devices to be "discovered"
>>>>> at boot time. The 3-hour timeout was there because udev finds these
>>>>> devices becoming active asynchronously, and ceph-disk was trying to
>>>>> coerce a more synchronous behavior to gather all the devices it needed.
>>>>> On a dense OSD node, this meant that OSDs would inconsistently fail to
>>>>> come up at all (sometimes all of them would work!).
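
To make the udev side concrete: the rules shipped with ceph-disk (in
95-ceph-osd.rules) matched the well-known Ceph GPT partition type GUIDs and
handed the device off to ceph-disk. A simplified, from-memory sketch of one
such rule; the exact rules varied between releases:

    # match a Ceph OSD data partition by its GPT type GUID and trigger ceph-disk
    ACTION=="add", SUBSYSTEM=="block", ENV{DEVTYPE}=="partition", ENV{ID_PART_ENTRY_TYPE}=="4fbd7e29-9d25-41b8-afd0-062c0ceff05d", RUN+="/usr/sbin/ceph-disk trigger /dev/$name"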
>>>>>
>>>>> Device discovery is a tremendously complicated and difficult problem to
>>>>> solve, and we thought that a few simple udev rules would be the answer
>>>>> (they weren't). The LVM implementation of ceph-volume limits itself to
>>>>> asking LVM about its devices and then activating them all at once. In
>>>>> tests on nodes with ~20 OSDs, they came up 10x faster (compared to
>>>>> ceph-disk), and fully operational - every time.
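
For comparison, the ceph-volume side of this boils down to a couple of
commands (a sketch; the OSD metadata comes from the LVM tags rather than from
udev discovery):

    # show the OSDs and the LVM tag metadata ceph-volume knows about
    ceph-volume lvm list

    # activate every OSD discovered through its LVM tags, e.g. at boot
    ceph-volume lvm activate --all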
>>>>>
>>>>> Since this is a question that keeps coming up, and the answers are now
>>>>> getting a bit scattered, I'll consolidate them into a section in the
>>>>> docs. I'll try to address the "layer of complexity", "performance
>>>>> overhead", and other recurring concerns.
>>>>>
>>>>> Any other ideas are welcome if some of the previously discussed points
>>>>> are still not entirely clear.
>>>>>
>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> > On Jul 22, 2018, at 6:31 AM, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
>>>>>>> > wrote:
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > I don’t think it will get any more basic than that. Or maybe this:
>>>>>>> > if a doctor gives you a diagnosis, you can accept it, get a second
>>>>>>> > opinion, or study medicine to verify it.
>>>>>>> >
>>>>>>> > In short, LVM was introduced to solve some issues related to
>>>>>>> > starting OSDs (which I did not have, probably because of a 'manual'
>>>>>>> > configuration). It also opens the ability to support more (future)
>>>>>>> > devices.
>>>>>>> >
>>>>>>> > I gave you two links; did you read the whole thread?
>>>>>>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg47802.html
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > -----Original Message-----
>>>>>>> > From: Satish Patel [mailto:satish.txt@xxxxxxxxx]
>>>>>>> > Sent: Saturday, July 21, 2018 20:59
>>>>>>> > To: ceph-users
>>>>>>> > Subject: Why LVM is the recommended method for bluestore
>>>>>>> >
>>>>>>> > Folks,
>>>>>>> >
>>>>>>> > I think I am going to boil the ocean here: I googled a lot about why
>>>>>>> > LVM is the recommended method for bluestore, but I didn't find any
>>>>>>> > good, detailed explanation, not even on the official Ceph website.
>>>>>>> >
>>>>>>> > Can someone explain it here in basic language? I am in no way an
>>>>>>> > expert, so I just want to understand the advantage of adding an extra
>>>>>>> > layer of complexity.
>>>>>>> >
>>>>>>> > I found this post, but I got lost reading it, and I want to see what
>>>>>>> > other folks suggest and how they explain it in their own words:
>>>>>>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg46768.html
>>>>>>> >
>>>>>>> > ~S
>>>>>> --
>>>>>> Nicolas Huillard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



