On Mon, Jul 23, 2018 at 1:56 PM, Satish Patel <satish.txt@xxxxxxxxx> wrote:
> This is a great explanation. Based on your details it looks like when we
> reboot a machine (OSD node) it will take longer to initialize all of the
> OSDs, but if we use LVM it shortens that time.

That is one aspect, yes. Most importantly: all OSDs will consistently come
up with ceph-volume. This wasn't the case with ceph-disk, and it was
impossible to replicate or understand why (hence the 3 hour timeout).

> There is a good chance that LVM impacts performance because of the extra
> layer. Does anyone have any data that can provide some insight into good
> or bad performance? It would be great if you shared it, so it will help
> us understand the impact.

There isn't a performance impact, and if there is, it is negligible.

> On Mon, Jul 23, 2018 at 8:37 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>> On Mon, Jul 23, 2018 at 6:09 AM, Nicolas Huillard <nhuillard@xxxxxxxxxxx> wrote:
>>> On Sunday, 22 July 2018 at 09:51 -0400, Satish Patel wrote:
>>>> I read that post, and that's why I opened this thread for a few more
>>>> questions and clarifications.
>>>>
>>>> When you said an OSD doesn't come up, what does that actually mean?
>>>> After a reboot of the node, after a service restart, or after
>>>> installing a new disk?
>>>>
>>>> You said you are using a manual method. What is that?
>>>>
>>>> I'm building a new cluster and have zero prior experience, so how can
>>>> I reproduce this error to see that LVM is really a life-saving tool
>>>> here? I'm sure there are plenty of people using it, but I didn't find
>>>> any good documentation except that mailing list thread, which raises
>>>> more questions in my mind.
>>>
>>> When I had to change a few drives manually, copying the old contents
>>> over, I noticed that the logical volumes are tagged with lots of
>>> information related to how they should be handled at boot time by the
>>> OSD startup system.
>>> These LVM tags are a good, standard way to add that metadata within
>>> the volumes themselves. Apparently, there is no other way to attach
>>> metadata like this that covers bluestore/filestore, SATA/SAS/NVMe,
>>> whole drive or partition, etc.
>>> They are easy to manage and fail-safe in many configurations.
>>
>> This is spot on. To clarify even further, let me give a brief overview
>> of how that worked with ceph-disk and GPT GUIDs:
>>
>> * at creation time, ceph-disk would add a GUID to the partitions so
>>   that they would later be recognized. These GUIDs were unique, so
>>   they would ensure accuracy
>> * a set of udev rules would be in place to detect when these GUIDs
>>   became available in the system
>> * at boot time, udev would start detecting devices coming online, and
>>   the rules would call out to ceph-disk (the executable)
>> * the ceph-disk executable would then call out to the ceph-disk
>>   systemd unit for the device to which it was assigned (e.g.
>>   ceph-disk@/dev/sda), with a timeout of three hours
>> * the previous step would be done *per device*, waiting for all
>>   devices associated with the OSD to become available (hence the 3
>>   hour timeout)
>> * the ceph-disk systemd unit would call back again to the ceph-disk
>>   command line tool, signaling that devices are ready (with --sync)
>> * the ceph-disk command line tool would call *the ceph-disk command
>>   line tool again* to "activate" the OSD, having detected (finally)
>>   the device type (encrypted, partially prepared, etc...)
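
In effect, each OSD's activation under that model boiled down to a
per-device wait. A rough sketch of the idea, purely for illustration (this
is not ceph-disk's actual code, and the device paths are made up):

    import os
    import time

    def wait_for_osd_devices(device_paths, timeout=3 * 60 * 60):
        """Block until every device node this OSD needs exists, or give up
        after `timeout` seconds (three hours, like the ceph-disk unit)."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            missing = [p for p in device_paths if not os.path.exists(p)]
            if not missing:
                return True       # all data/journal/db devices showed up
            time.sleep(5)         # udev may still be settling; poll again
        return False              # timed out: the OSD never comes up

    # Hypothetical example: an OSD with a data partition plus a journal
    if __name__ == "__main__":
        ready = wait_for_osd_devices(["/dev/sda1", "/dev/nvme0n1p3"])
        print("activate OSD" if ready else "gave up after 3 hours")

Multiply that wait by every device of every OSD, arriving in whatever
order udev fires, and the inconsistent start-up behaviour described next
follows.
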
>> The ceph-disk workflow above worked for pre-systemd systems; it could
>> probably have been streamlined better, but it was what allowed
>> ceph-disk to "discover" devices at boot time. The 3 hour timeout was
>> there because udev would find these devices becoming active
>> asynchronously, and ceph-disk was trying to coerce more synchronous
>> behavior to get all the devices it needed. On a dense OSD node, this
>> meant that OSDs would inconsistently fail to come up at all (sometimes
>> all of them would work!).
>>
>> Device discovery is a tremendously complicated and difficult problem
>> to solve, and we thought that a few simple rules with udev would be
>> the answer (they weren't). The LVM implementation of ceph-volume
>> limits itself to just asking LVM about devices and then gets them
>> "activated" at once. In some tests on nodes with ~20 OSDs, we were 10x
>> faster to come up (compared to ceph-disk), and fully operational -
>> every time.
>>
>> Since this is a question that keeps coming up, and the answers are now
>> getting a bit scattered, I'll compile them all into a section in the
>> docs. I'll try to address the "layer of complexity", "performance
>> overhead", and other recurring issues that keep being raised.
>>
>> Any other ideas are welcome if some of the previously discussed things
>> are still not entirely clear.
>>
>>>> Sent from my iPhone
>>>>
>>>> > On Jul 22, 2018, at 6:31 AM, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
>>>> > wrote:
>>>> >
>>>> > I don't think it will get any more basic than that. Or maybe this:
>>>> > if the doctor diagnoses you, you can either accept it, get a 2nd
>>>> > opinion, or study medicine to verify it.
>>>> >
>>>> > In short, LVM has been introduced to solve some issues related to
>>>> > starting OSDs (which I did not have, probably because of a
>>>> > 'manual' configuration). And it opens up the ability to support
>>>> > (more, future) devices.
>>>> >
>>>> > I gave you two links. Did you read the whole thread?
>>>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg47802.html
>>>> >
>>>> > -----Original Message-----
>>>> > From: Satish Patel [mailto:satish.txt@xxxxxxxxx]
>>>> > Sent: Saturday, 21 July 2018 20:59
>>>> > To: ceph-users
>>>> > Subject: Why lvm is recommended method for bleustore
>>>> >
>>>> > Folks,
>>>> >
>>>> > I think I am going to boil the ocean here. I googled a lot about
>>>> > this topic, why LVM is the recommended method for bluestore, but
>>>> > didn't find any good and detailed explanation, not even on the
>>>> > official Ceph website.
>>>> >
>>>> > Can someone explain here in basic language, because I am in no way
>>>> > an expert, so I just want to understand: what is the advantage of
>>>> > adding an extra layer of complexity?
>>>> > I found this post, but I got lost reading it, and I want to see
>>>> > what other folks are suggesting and offering in their own words:
>>>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg46768.html
>>>> >
>>>> > ~S
>>> --
>>> Nicolas Huillard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
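
For comparison with the per-device wait shown earlier, the LVM-based
discovery that ceph-volume relies on (and the tags Nicolas mentions
upthread) can be approximated with a single, synchronous query to LVM. A
minimal sketch, assuming Python 3.7+ and an lvm2 new enough to support
"--reportformat json"; the ceph.* tag names follow the convention
described upthread but are shown here only as an illustration, not as a
guaranteed schema:

    import json
    import subprocess

    def list_ceph_lvs():
        """Ask LVM (once, synchronously) for all logical volumes and
        return those carrying ceph.* tags, with the tags parsed."""
        out = subprocess.run(
            ["lvs", "-o", "lv_path,lv_tags", "--reportformat", "json"],
            capture_output=True, text=True, check=True,
        ).stdout
        lvs = json.loads(out)["report"][0]["lv"]
        found = []
        for lv in lvs:
            tags = dict(t.split("=", 1)
                        for t in lv["lv_tags"].split(",") if "=" in t)
            if any(k.startswith("ceph.") for k in tags):
                found.append((lv["lv_path"], tags))
        return found

    if __name__ == "__main__":
        for path, tags in list_ceph_lvs():
            # illustrative tags: which OSD this LV belongs to and its role
            print(path, tags.get("ceph.osd_id"), tags.get("ceph.type"))

Because the metadata travels with the volumes themselves, there is
nothing to wait for at boot: whatever LVM reports can be activated
immediately, which is why all OSDs come up consistently.
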