Alfredo,

Thanks, I think I should go with LVM then :)

I have a question here: I have 4 physical SSDs per server, and for some
reason I am using ceph-ansible version 3.0.8, which doesn't create the LVM
volumes itself, so I have to create them manually. I am using bluestore
(and want to keep the WAL/DB on the same data disk). How do I create the
LVM volumes manually on a single physical disk? Do I need to create two
logical volumes (one for the journal and one for data)?

I am reading this (the example at the bottom):
http://docs.ceph.com/ceph-ansible/master/osds/scenarios.html

lvm_volumes:
  - data: data-lv1
    data_vg: vg1
    crush_device_class: foo

In the above example, did they create vg1 (the volume group) and then
data-lv1 (the logical volume)? If I want to add a journal, do I need to
create one more logical volume? I am confused by that document, so I need
some clarification.
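My current understanding is that with bluestore and a collocated WAL/DB you
only need a single data LV per disk (no separate journal LV like with
filestore), so per SSD I was planning to run something like the following,
where /dev/sdb is just a placeholder for one of my 4 devices - please
correct me if this is wrong:

    # one PV/VG/LV per physical SSD; /dev/sdb is only an example device name
    pvcreate /dev/sdb
    vgcreate vg1 /dev/sdb
    # give the whole disk to the data LV; the WAL/DB stay on the same LV
    lvcreate -l 100%FREE -n data-lv1 vg1

and then point lvm_volumes at vg1/data-lv1 as in the docs example above.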
On Mon, Jul 23, 2018 at 2:06 PM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
> On Mon, Jul 23, 2018 at 1:56 PM, Satish Patel <satish.txt@xxxxxxxxx> wrote:
>> This is a great explanation. Based on your details it looks like, when
>> the machine (OSD node) reboots, it will take a longer time to initialize
>> all of the OSDs, but if we use LVM it shortens that time.
>
> That is one aspect, yes. Most importantly: all OSDs will consistently
> come up with ceph-volume. This wasn't the case with ceph-disk, and it
> was impossible to replicate or understand why (hence the 3 hour timeout).
>
>> There is a good chance that LVM impacts performance because of the
>> extra layer. Does anyone have any data that can provide some insight
>> into good or bad performance? It would be great if you shared it, so it
>> will help us understand the impact.
>
> There isn't a performance impact, and if there is, it is negligible.
>
>> On Mon, Jul 23, 2018 at 8:37 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>>> On Mon, Jul 23, 2018 at 6:09 AM, Nicolas Huillard <nhuillard@xxxxxxxxxxx> wrote:
>>>> On Sunday, July 22, 2018 at 09:51 -0400, Satish Patel wrote:
>>>>> I read that post and that's why I opened this thread, for a few more
>>>>> questions and clarification.
>>>>>
>>>>> When you said the OSD doesn't come up, what does that actually mean?
>>>>> After a reboot of the node, or after a service restart, or after
>>>>> installation of a new disk?
>>>>>
>>>>> You said you are using a "manual" method; what is that?
>>>>>
>>>>> I'm building a new cluster and have zero prior experience, so how can
>>>>> I reproduce this error to see that LVM is really a life-saving tool
>>>>> here? I'm sure there are plenty of people using it, but I didn't find
>>>>> any good document except that mailing list thread, which raised more
>>>>> questions in my mind.
>>>>
>>>> When I had to change a few drives manually, copying the old contents
>>>> over, I noticed that the logical volumes are tagged with lots of
>>>> information related to how they should be handled at boot time by the
>>>> OSD startup system.
>>>> These LVM tags are a good standard way to add that metadata within the
>>>> volumes themselves. Apparently, there is no other way to carry the
>>>> metadata that distinguishes bluestore/filestore, SATA/SAS/NVMe, whole
>>>> drive or partition, etc.
>>>> They are easy to manage and fail-safe in many configurations.
>>>
>>> This is spot on. To clarify even further, let me give a brief overview
>>> of how that worked with ceph-disk and GPT GUIDs:
>>>
>>> * at creation time, ceph-disk would add a GUID to the partitions so
>>> that they could later be recognized. These GUIDs were unique, so they
>>> would ensure accuracy
>>> * a set of udev rules would be in place to detect when these GUIDs
>>> became available in the system
>>> * at boot time, udev would start detecting devices coming online, and
>>> the rules would call out to ceph-disk (the executable)
>>> * the ceph-disk executable would then call out to the ceph-disk
>>> systemd unit, with a timeout of three hours, for the device to which it
>>> was assigned (e.g. ceph-disk@/dev/sda)
>>> * the previous step would be done *per device*, waiting for all
>>> devices associated with the OSD to become available (hence the 3 hour
>>> timeout)
>>> * the ceph-disk systemd unit would call back again to the ceph-disk
>>> command line tool, signaling that devices are ready (with --sync)
>>> * the ceph-disk command line tool would call *the ceph-disk command
>>> line tool again* to "activate" the OSD, having detected (finally) the
>>> device type (encrypted, partially prepared, etc.)
>>>
>>> The above workflow worked for pre-systemd systems; it could probably
>>> have been streamlined better, but it was what allowed devices to be
>>> "discovered" at boot time. The 3 hour timeout was there because udev
>>> would find these devices becoming active asynchronously, and ceph-disk
>>> was trying to coerce a more synchronous behavior to get all the devices
>>> it needed. On a dense OSD node, this meant that OSDs would not come up
>>> at all, inconsistently (sometimes all of them would work!).
>>>
>>> Device discovery is a tremendously complicated and difficult problem
>>> to solve, and we thought that a few simple rules with udev would be the
>>> answer (they weren't). The LVM implementation of ceph-volume limits
>>> itself to just asking LVM about devices and then gets them "activated"
>>> at once. In some tests on nodes with ~20 OSDs, we came up 10x faster
>>> (compared to ceph-disk), and fully operational - every time.
>>>
>>> Since this is a question that keeps coming up, and answers are now
>>> getting a bit scattered, I'll consolidate them all into a section in
>>> the docs. I'll try to address the "layer of complexity", "performance
>>> overhead", and other recurring concerns that keep being raised.
>>>
>>> Any other ideas are welcome if some of the previously discussed
>>> things are still not entirely clear.
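(Side note, in case it helps anyone else following this thread: as far as I
can tell from the ceph-volume docs, the discovery described above can be
inspected on a running node with something like the commands below; exact
output will vary, this is only meant to show where the metadata lives.)

    # show the OSDs ceph-volume knows about and the LVM tags it stored
    # on their logical volumes
    ceph-volume lvm list
    # the same tags are visible directly from LVM
    lvs -o lv_name,vg_name,lv_tags
    # boot-time activation essentially finds the tagged LVs and starts
    # the matching OSD services
    ceph-volume lvm activate --all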
>>>>> Sent from my iPhone
>>>>>
>>>>> > On Jul 22, 2018, at 6:31 AM, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>>>>> >
>>>>> > I don't think it will get any more basic than that. Or maybe this:
>>>>> > if the doctor diagnoses you, you can either accept it, get a 2nd
>>>>> > opinion, or study medicine to verify it.
>>>>> >
>>>>> > In short, LVM has been introduced to solve some issues related to
>>>>> > starting OSDs (which I did not have, probably because of a 'manual'
>>>>> > configuration). And it opens up the ability to support more
>>>>> > (future) devices.
>>>>> >
>>>>> > I gave you two links, did you read the whole thread?
>>>>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg47802.html
>>>>> >
>>>>> > -----Original Message-----
>>>>> > From: Satish Patel [mailto:satish.txt@xxxxxxxxx]
>>>>> > Sent: Saturday, July 21, 2018 20:59
>>>>> > To: ceph-users
>>>>> > Subject: Why lvm is recommended method for bleustore
>>>>> >
>>>>> > Folks,
>>>>> >
>>>>> > I think I am going to boil the ocean here. I googled a lot about why
>>>>> > LVM is the recommended method for bluestore, but didn't find any
>>>>> > good and detailed explanation, not even on the official Ceph
>>>>> > website.
>>>>> >
>>>>> > Can someone explain it here in basic language? I am in no way an
>>>>> > expert, so I just want to understand what the advantage is of adding
>>>>> > an extra layer of complexity.
>>>>> >
>>>>> > I found this post, but I got lost reading it and want to see what
>>>>> > other folks suggest and offer in their own words:
>>>>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg46768.html
>>>>> >
>>>>> > ~S
>>>> --
>>>> Nicolas Huillard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com