Re: Why LVM is the recommended method for bluestore

Alfredo,

Thanks, I think I should go with LVM then :)

I have a question here. I have 4 physical SSDs per server, and for some
reason I am using ceph-ansible version 3.0.8, which doesn't create the
LVM volumes itself, so I have to create them manually.

I am using bluestore (I want to keep the WAL/DB on the same data disk).
How do I create the LVM volumes manually on a single physical disk? Do
I need to create two logical volumes (one for the journal and one for
data)?
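
Something like this per disk is what I have in mind (just a sketch; the
/dev/sdb device and the vg/lv names below are placeholders for one of
my four SSDs). Is that the right approach?

  # one PV/VG per SSD, and a single LV spanning the whole disk
  pvcreate /dev/sdb
  vgcreate ceph-vg-sdb /dev/sdb
  lvcreate -l 100%FREE -n ceph-data-sdb ceph-vg-sdb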

I am reading this (the example at the bottom of the page):
http://docs.ceph.com/ceph-ansible/master/osds/scenarios.html

lvm_volumes:
  - data: data-lv1
    data_vg: vg1
    crush_device_class: foo


In the above example, did they first create vg1 (a volume group) and
then data-lv1 (a logical volume) inside it? And if I want to add a
journal, do I need to create one more logical volume? That document
confuses me, so I need some clarification.
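
If the answer is one volume group plus one logical volume per SSD, as
in my sketch above, is this roughly what my lvm_volumes entry would
look like for a plain bluestore OSD? (The vg/lv names are carried over
from that sketch, and I left crush_device_class out.)

lvm_volumes:
  - data: ceph-data-sdb
    data_vg: ceph-vg-sdb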

On Mon, Jul 23, 2018 at 2:06 PM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
> On Mon, Jul 23, 2018 at 1:56 PM, Satish Patel <satish.txt@xxxxxxxxx> wrote:
>> This is a great explanation. Based on your details, it looks like
>> when an OSD node reboots it can take a long time to initialize all
>> of the OSDs, but if we use LVM that time is shortened.
>
> That is one aspect, yes. Most importantly: all OSDs will consistently
> come up with ceph-volume. This wasn't the case with ceph-disk, and it
> was impossible to replicate or understand why (hence the 3 hour
> timeout).
>
>>
>> There is a good chance that LVM impacts performance because of the
>> extra layer. Does anyone have any data that can provide some insight
>> into how good or bad the performance is? It would be great if you
>> could share it, so it will help us understand the impact.
>
> There isn't a performance impact, and if there is, it is negligible.
>
>>
>>
>>
>> On Mon, Jul 23, 2018 at 8:37 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>>> On Mon, Jul 23, 2018 at 6:09 AM, Nicolas Huillard <nhuillard@xxxxxxxxxxx> wrote:
>>>> On Sunday, 22 July 2018 at 09:51 -0400, Satish Patel wrote:
>>>>> I read that post, and that's why I opened this thread for a few
>>>>> more questions and some clarification.
>>>>>
>>>>> When you said the OSD doesn't come up, what does that actually
>>>>> mean? After a reboot of the node, after a service restart, or
>>>>> after installation of a new disk?
>>>>>
>>>>> You said you are using a manual method; what is that?
>>>>>
>>>>> I'm building a new cluster and have zero prior experience, so how
>>>>> can I reproduce this error and see that LVM is really a
>>>>> life-saving tool here? I'm sure there are plenty of people using
>>>>> it, but I didn't find any good document except that mailing list
>>>>> thread, which raised more questions in my mind.
>>>>
>>>> When I had to change a few drives manually, copying the old contents
>>>> over, I noticed that the logical volumes are tagged with lots of
>>>> information related to how they should be handled at boot time by the
>>>> OSD startup system.
>>>> These LVM tags are a good, standard way to attach that metadata to
>>>> the volumes themselves. Apparently, there is no other comparable
>>>> way to attach tags that distinguish bluestore/filestore,
>>>> SATA/SAS/NVMe, whole drive or partition, etc.
>>>> They are easy to manage and fail-safe in many configurations.
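
Just to check my understanding: are these the tags I would see by
running a plain LVM command such as the one below on an OSD node, once
ceph-volume has created the volumes? (I haven't tried this yet, so
treat it as an assumption on my part.)

  lvs -o lv_name,vg_name,lv_tags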
>>>
>>> This is spot on. To clarify even further, let me give a brief overview
>>> of how that worked with ceph-disk and GPT GUID:
>>>
>>> * at creation time, ceph-disk would add a GUID to the partitions so
>>> that they would later be recognized. These GUIDs were unique, so
>>> they ensured accuracy
>>> * a set of udev rules would be in place to detect when these GUIDs
>>> became available in the system
>>> * at boot time, udev would start detecting devices coming online, and
>>> the rules would call out to ceph-disk (the executable)
>>> * the ceph-disk executable would then call out to the ceph-disk
>>> systemd unit, with a timeout of three hours, for the device to
>>> which it was assigned (e.g. ceph-disk@/dev/sda)
>>> * the previous step would be done *per device*, waiting for all
>>> devices associated with the OSD to become available (hence the 3 hour
>>> timeout)
>>> * the ceph-disk systemd unit would call back again to the ceph-disk
>>> command line tool signaling devices are ready (with --sync)
>>> * the ceph-disk command line tool would call *the ceph-disk command
>>> line tool again* to "activate" the OSD, having detected (finally) the
>>> device type (encrypted, partially prepared, etc...)
>>>
>>> The above workflow worked for pre-systemd systems; it could
>>> probably have been streamlined better, but it was what allowed
>>> devices to be "discovered" at boot time. The 3 hour timeout was
>>> there because udev would find these devices coming up
>>> asynchronously, and ceph-disk was trying to coerce a more
>>> synchronous behavior to get all the devices it needed. On a dense
>>> OSD node, this meant that OSDs would inconsistently fail to come up
>>> at all (sometimes all of them would work!).
>>>
>>> Device discovery is a tremendously complicated and difficult problem
>>> to solve, and we thought that a few simple rules with UDEV would be
>>> the answer (they weren't). The LVM implementation of ceph-volume
>>> limits itself to just asking LVM about the devices and then getting
>>> them "activated" all at once. In some tests on nodes with ~20 OSDs,
>>> we came up 10x faster (compared to ceph-disk) and were fully
>>> operational - every time.
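
Is that activation the same thing I would get by running the command
below by hand after a reboot? (The command is from the ceph-volume
documentation; I haven't tried it myself yet, so correct me if that's
the wrong one.)

  ceph-volume lvm activate --all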
>>>
>>> Since this is a question that keeps coming up, and the answers are
>>> now getting a bit scattered, I'll consolidate them all into a
>>> section in the docs. I'll try to address the "layer of complexity",
>>> the "performance overhead", and other recurring concerns that keep
>>> being raised.
>>>
>>> Any other ideas are welcome if some of the previously discussed
>>> points are still not entirely clear.
>>>
>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> > On Jul 22, 2018, at 6:31 AM, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
>>>>> > wrote:
>>>>> >
>>>>> >
>>>>> >
>>>>> > I don't think it will get any more basic than that. Or maybe
>>>>> > this: if the doctor diagnoses you, you can either accept it,
>>>>> > get a 2nd opinion, or study medicine to verify it.
>>>>> >
>>>>> > In short, LVM has been introduced to solve some issues related
>>>>> > to starting OSDs (which I did not have, probably because of a
>>>>> > 'manual' configuration). And it opens up the ability to support
>>>>> > more (future) devices.
>>>>> >
>>>>> > I gave you two links; did you read the whole thread?
>>>>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg47802.html
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > -----Original Message-----
>>>>> > From: Satish Patel [mailto:satish.txt@xxxxxxxxx]
>>>>> > Sent: Saturday, 21 July 2018 20:59
>>>>> > To: ceph-users
>>>>> > Subject: Why LVM is the recommended method for bluestore
>>>>> >
>>>>> > Folks,
>>>>> >
>>>>> > I think I am going to boil the ocean here. I googled a lot
>>>>> > about why LVM is the recommended method for bluestore, but I
>>>>> > didn't find any good, detailed explanation, not even on the
>>>>> > official Ceph website.
>>>>> >
>>>>> > Can someone explain it here in basic language? I am in no way
>>>>> > an expert, so I just want to understand what the advantage is
>>>>> > of adding an extra layer of complexity.
>>>>> >
>>>>> > I found this post, but I got lost reading it and want to see
>>>>> > what other folks suggest and offer in their own words:
>>>>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg46768.html
>>>>> >
>>>>> > ~S
>>>>> >
>>>>> >
>>>>>
>>>> --
>>>> Nicolas Huillard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



