On Mon, Jul 23, 2018 at 2:33 PM, Satish Patel <satish.txt@xxxxxxxxx> wrote:
> Alfredo,
>
> Thanks, I think I should go with LVM then :)
>
> I have a question here. I have 4 physical SSDs per server, and for some
> reason I am using ceph-ansible 3.0.8, which doesn't create the LVM
> volumes itself, so I have to create them manually.
>
> I am using bluestore (I want to keep WAL/DB on the same data disk). How
> do I create the LVM volumes manually on a single physical disk? Do I
> need to create two logical volumes (1 for journal & 1 for data)?
>
> I am reading this
> http://docs.ceph.com/ceph-ansible/master/osds/scenarios.html (at the
> bottom):
>
> lvm_volumes:
>   - data: data-lv1
>     data_vg: vg1
>     crush_device_class: foo

For a raw device (e.g. /dev/sda) you can do:

lvm_volumes:
  - data: /dev/sda

The LV gets created for you in this one case.

>
> In the above example, did they create vg1 (the volume group) and
> data-lv1 (the logical volume) themselves? If I want to add a journal, do
> I need to create one more logical volume? I am confused by that
> document, so I need some clarification.
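To spell out the manual route: bluestore has no journal (that was
filestore), so with WAL/DB kept on the same disk a single data LV per SSD
is enough, and you create the VG and LV yourself before pointing
lvm_volumes at them. A rough, untested sketch for one of the four disks,
with made-up VG/LV names (adjust to taste):

  # /dev/sdb is one of the SSDs; repeat per disk with its own VG/LV names
  pvcreate /dev/sdb
  vgcreate ceph-vg-sdb /dev/sdb
  lvcreate -l 100%FREE -n ceph-data-sdb ceph-vg-sdb

and then reference those names in the osds configuration:

  lvm_volumes:
    - data: ceph-data-sdb
      data_vg: ceph-vg-sdb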
>
> On Mon, Jul 23, 2018 at 2:06 PM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>> On Mon, Jul 23, 2018 at 1:56 PM, Satish Patel <satish.txt@xxxxxxxxx> wrote:
>>> This is a great explanation. Based on your details it looks like that
>>> when a machine (OSD node) reboots, it will take a longer time to
>>> initialize all of its OSDs, but if we use LVM it shortens that time.
>>
>> That is one aspect, yes. Most importantly: all OSDs will consistently
>> come up with ceph-volume. This wasn't the case with ceph-disk, and it
>> was impossible to replicate or understand why (hence the 3 hour timeout).
>>
>>> There is a good chance that LVM impacts performance because of the
>>> extra layer. Does anyone have any data that can provide some insight
>>> into good or bad performance? It would be great if you shared it, so it
>>> helps us understand the impact.
>>
>> There isn't a performance impact, and if there is, it is negligible.
>>
>>> On Mon, Jul 23, 2018 at 8:37 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>>>> On Mon, Jul 23, 2018 at 6:09 AM, Nicolas Huillard <nhuillard@xxxxxxxxxxx> wrote:
>>>>> On Sunday, July 22, 2018 at 09:51 -0400, Satish Patel wrote:
>>>>>> I read that post, and that's why I opened this thread for a few more
>>>>>> questions and clarifications.
>>>>>>
>>>>>> When you said an OSD doesn't come up, what does that actually mean?
>>>>>> After a reboot of the node, after a service restart, or after
>>>>>> installation of a new disk?
>>>>>>
>>>>>> You said you are using a manual method; what is that?
>>>>>>
>>>>>> I'm building a new cluster and have zero prior experience, so how can
>>>>>> I reproduce this error to see that LVM is really a life-saving tool
>>>>>> here? I'm sure there are plenty of people using it, but I didn't find
>>>>>> any good document except that mailing list thread, which raised more
>>>>>> questions in my mind.
>>>>>
>>>>> When I had to change a few drives manually, copying the old contents
>>>>> over, I noticed that the logical volumes are tagged with lots of
>>>>> information related to how they should be handled at boot time by the
>>>>> OSD startup system.
>>>>> These LVM tags are a good, standard way to add that metadata within
>>>>> the volumes themselves. Apparently, there is no other way to add tags
>>>>> that distinguish bluestore/filestore, SATA/SAS/NVMe, whole drive or
>>>>> partition, etc.
>>>>> They are easy to manage and fail-safe in many configurations.
>>>>
>>>> This is spot on. To clarify even further, let me give a brief overview
>>>> of how that worked with ceph-disk and GPT GUIDs:
>>>>
>>>> * at creation time, ceph-disk would add a GUID to the partitions so
>>>> that they could later be recognized. These GUIDs were unique, so they
>>>> would ensure accuracy
>>>> * a set of udev rules would be in place to detect when these GUIDs
>>>> became available in the system
>>>> * at boot time, udev would start detecting devices coming online, and
>>>> the rules would call out to ceph-disk (the executable)
>>>> * the ceph-disk executable would then call out to the ceph-disk
>>>> systemd unit, with a timeout of three hours, for the device to which it
>>>> was assigned (e.g. ceph-disk@/dev/sda)
>>>> * the previous step would be done *per device*, waiting for all
>>>> devices associated with the OSD to become available (hence the 3 hour
>>>> timeout)
>>>> * the ceph-disk systemd unit would call back again to the ceph-disk
>>>> command line tool, signaling that the devices are ready (with --sync)
>>>> * the ceph-disk command line tool would call *the ceph-disk command
>>>> line tool again* to "activate" the OSD, having detected (finally) the
>>>> device type (encrypted, partially prepared, etc.)
>>>>
>>>> The above workflow worked for pre-systemd systems; it could probably
>>>> have been streamlined better, but it was what allowed devices to be
>>>> "discovered" at boot time. The 3 hour timeout was there because udev
>>>> would find these devices coming online asynchronously, and ceph-disk
>>>> was trying to coerce a more synchronous behavior to get all the devices
>>>> it needed. On a dense OSD node, this meant that OSDs would not come up
>>>> at all, inconsistently (sometimes all of them would work!).
>>>>
>>>> Device discovery is a tremendously complicated and difficult problem
>>>> to solve, and we thought that a few simple udev rules would be the
>>>> answer (they weren't). The LVM implementation of ceph-volume limits
>>>> itself to asking LVM about the devices and then gets them "activated"
>>>> all at once. In some tests on nodes with ~20 OSDs, we were 10x faster
>>>> to come up (compared to ceph-disk), and fully operational - every time.
>>>>
>>>> Since this is a question that keeps coming up, and answers are now
>>>> getting a bit scattered, I'll consolidate them all into a section in
>>>> the docs. I'll try to address the "layer of complexity", "performance
>>>> overhead", and other recurring concerns that keep being raised.
>>>>
>>>> Any other ideas are welcome if some of the previously discussed
>>>> things are still not entirely clear.
>>>>
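If you want to see the metadata Nicolas described on a node deployed with
ceph-volume, the tags are ordinary LVM tags and can be inspected with the
stock LVM tooling; roughly like this (output abridged, and the exact tag
set varies by release, so take the names as illustrative):

  # list the ceph-volume tags stored on each logical volume
  lvs -o lv_name,vg_name,lv_tags --noheadings
  # expect tags such as ceph.osd_id=0, ceph.osd_fsid=..., ceph.type=block

  # the same information, pretty-printed per OSD
  ceph-volume lvm list

  # bring up every OSD discovered through those tags in one pass
  ceph-volume lvm activate --all

That single LVM query is essentially what replaces the whole udev dance
described above.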
>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> > On Jul 22, 2018, at 6:31 AM, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>>>>>> >
>>>>>> > I don’t think it will get any more basic than that. Or maybe this:
>>>>>> > if the doctor diagnoses you, you can either accept it, get a 2nd
>>>>>> > opinion, or study medicine to verify it.
>>>>>> >
>>>>>> > In short, lvm has been introduced to solve some issues related to
>>>>>> > starting osd's (which I did not have, probably because of a
>>>>>> > 'manual' configuration). And it opens the ability to support more
>>>>>> > (future) devices.
>>>>>> >
>>>>>> > I gave you two links, did you read the whole thread?
>>>>>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg47802.html
>>>>>> >
>>>>>> > -----Original Message-----
>>>>>> > From: Satish Patel [mailto:satish.txt@xxxxxxxxx]
>>>>>> > Sent: Saturday, 21 July 2018 20:59
>>>>>> > To: ceph-users
>>>>>> > Subject: Why lvm is recommended method for bluestore
>>>>>> >
>>>>>> > Folks,
>>>>>> >
>>>>>> > I think I am going to boil the ocean here. I googled a lot about
>>>>>> > why lvm is the recommended method for bluestore, but didn't find
>>>>>> > any good, detailed explanation, not even on the official Ceph
>>>>>> > website.
>>>>>> >
>>>>>> > Can someone explain it here in basic language? I am in no way an
>>>>>> > expert, so I just want to understand the advantage of adding an
>>>>>> > extra layer of complexity.
>>>>>> >
>>>>>> > I found this post, but I got lost reading it, and I want to see
>>>>>> > what other folks suggest and offer in their own words:
>>>>>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg46768.html
>>>>>> >
>>>>>> > ~S
>>>>>>
>>>>> --
>>>>> Nicolas Huillard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com