On Mon, Jul 23, 2018 at 6:09 AM, Nicolas Huillard <nhuillard@xxxxxxxxxxx> wrote:
> Le dimanche 22 juillet 2018 à 09:51 -0400, Satish Patel a écrit :
>> I read that post and that's why I open this thread for few more
>> questions and clearence,
>>
>> When you said OSD doesn't come up what actually that means? After
>> reboot of node or after service restart or installation of new disk?
>>
>> You said we are using manual method what is that?
>>
>> I'm building new cluster and had zero prior experience so how can I
>> produce this error to see lvm is really life saving tool here? I'm
>> sure there are plenty of people using but I didn't find and good
>> document except that mailing list which raising more questions in my
>> mind.
>
> When I had to change a few drives manually, copying the old contents
> over, I noticed that the logical volumes are tagged with lots of
> information related to how they should be handled at boot time by the
> OSD startup system.
> These LVM tags are a good standard way to add that meta-data within the
> volumes themselves. Apparently, there is no other way to add these tags
> that allow for bluestore/filestore, SATA/SAS/NVMe, whole drive or
> partition, etc.
> They are easy to manage and fail-safe in many configurations.

This is spot on. To clarify even further, let me give a brief overview of
how this worked with ceph-disk and GPT GUIDs:

* At creation time, ceph-disk would add a GUID to each partition so that
  it could be recognized later. These GUIDs were unique, so they ensured
  the right partitions were identified.
* A set of udev rules was in place to detect when these GUIDs became
  available in the system.
* At boot time, udev would start detecting devices coming online, and the
  rules would call out to ceph-disk (the executable).
* The ceph-disk executable would then call out to the ceph-disk systemd
  unit, with a timeout of three hours, for the device it was assigned to
  (e.g. ceph-disk@/dev/sda).
* The previous step was done *per device*, waiting for all devices
  associated with the OSD to become available (hence the three-hour
  timeout).
* The ceph-disk systemd unit would call back to the ceph-disk command
  line tool to signal that the devices were ready (with --sync).
* The ceph-disk command line tool would then call *the ceph-disk command
  line tool again* to "activate" the OSD, having (finally) detected the
  device type (encrypted, partially prepared, etc.).

The above workflow worked for pre-systemd systems; it could probably have
been streamlined, but it was what allowed devices to be "discovered" at
boot time. The three-hour timeout existed because udev reports devices
asynchronously as they come online, and ceph-disk was trying to coerce a
more synchronous behavior so it could gather all the devices it needed.
On a dense OSD node, this meant that OSDs would inconsistently fail to
come up at all (sometimes all of them would work!).

Device discovery is a tremendously complicated and difficult problem to
solve, and we thought a few simple udev rules would be the answer (they
weren't).

The LVM implementation of ceph-volume limits itself to asking LVM about
its devices and then activating them all at once (see the sketch below).
In tests on nodes with ~20 OSDs, startup was about 10x faster than with
ceph-disk, and fully operational every time.

Since this is a question that keeps coming up, and the answers are now
getting a bit scattered, I'll consolidate them into a section in the
docs. I'll try to address the "layer of complexity", "performance
overhead", and other recurring objections that keep being raised.
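To make the contrast concrete, here is a minimal sketch of what tag-based
discovery boils down to. This is not ceph-volume's actual code, just an
illustration: one synchronous query to LVM, parse the tags, and you know
which logical volumes belong to which OSD and what role each one plays.
The ceph.* tag keys follow the convention ceph-volume uses, but treat the
exact names as an assumption and check lvs -o lv_tags on a real node:

#!/usr/bin/env python3
# Minimal sketch, NOT ceph-volume's actual implementation: discover OSD
# logical volumes by reading LVM tags with a single synchronous call.
# The ceph.* tag keys below follow the convention ceph-volume uses, but
# the exact names are an assumption -- verify with: lvs -o lv_tags
import json
import subprocess


def list_osd_volumes():
    # One call to LVM returns every logical volume and its tags; no udev
    # events, no per-device timeouts, no callback chain.
    out = subprocess.run(
        ["lvs", "--reportformat", "json", "-o", "lv_name,vg_name,lv_tags"],
        check=True, capture_output=True, text=True,
    ).stdout
    lvs = json.loads(out)["report"][0]["lv"]

    osds = []
    for lv in lvs:
        # lv_tags is a comma-separated string of "key=value" entries.
        tags = dict(
            t.split("=", 1) for t in lv["lv_tags"].split(",") if "=" in t
        )
        if "ceph.osd_id" in tags:
            osds.append({
                "osd_id": tags["ceph.osd_id"],
                "osd_fsid": tags.get("ceph.osd_fsid"),
                "type": tags.get("ceph.type"),  # e.g. block, db, wal
                "device": "/dev/{}/{}".format(lv["vg_name"], lv["lv_name"]),
            })
    return osds


if __name__ == "__main__":
    for osd in list_osd_volumes():
        print(osd)

The real tool handles a lot more (encryption, filestore vs. bluestore,
the block/db/wal relationships, enabling the systemd units), but the
point stands: all of that metadata lives in tags on the volumes
themselves, so activation is a plain query instead of a race against
udev events.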
Any other ideas are welcomed if some of the previously discussed things
are still not entirely clear.

>
>> Sent from my iPhone
>>
>> > On Jul 22, 2018, at 6:31 AM, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
>> > wrote:
>> >
>> >
>> >
>> > I don’t think it will get any more basic than that. Or maybe this?
>> > If the doctor diagnoses you, you can either accept this, get 2nd
>> > opinion, or study medicine to verify it.
>> >
>> > In short lvm has been introduced to solve some issues of related to
>> > starting osd's (which I did not have, probably because of a 'manual'
>> > configuration). And it opens the ability to support (more future)
>> > devices.
>> >
>> > I gave you two links, did you read the whole thread?
>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg47802.html
>> >
>> >
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: Satish Patel [mailto:satish.txt@xxxxxxxxx]
>> > Sent: zaterdag 21 juli 2018 20:59
>> > To: ceph-users
>> > Subject: Why lvm is recommended method for bleustore
>> >
>> > Folks,
>> >
>> > I think i am going to boil ocean here, I google a lot about this
>> > topic why lvm is recommended method for bluestore, but didn't find
>> > any good and detail explanation, not even in Ceph official website.
>> >
>> > Can someone explain here in basic language because i am no way
>> > expert so just want to understand what is the advantage of adding
>> > extra layer of complexity?
>> >
>> > I found this post but its not i got lost reading it and want to see
>> > what other folks suggesting and offering in their language
>> > https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg46768.html
>> >
>> > ~S
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> >
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> --
> Nicolas Huillard
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com