On Thu, 5 Apr 2012, Bernard Grymonpon wrote:
>
> On 05 Apr 2012, at 17:17, Sage Weil wrote:
>
> > On Thu, 5 Apr 2012, Bernard Grymonpon wrote:
> >> On 05 Apr 2012, at 14:34, Wido den Hollander wrote:
> >>
> >>> On 04/05/2012 10:38 AM, Bernard Grymonpon wrote:
> >>>> I assume most OSD nodes will normally run a single OSD, so this would not apply to most nodes.
> >>>>
> >>>> Only in specific cases (where multiple OSDs run on a single node) would this come up, and these specific cases might even require the journals to be split over multiple devices (multiple ssd-disks ...)
> >>>
> >>> I think that's a wrong assumption. On most systems I think multiple OSDs will exist; it's debatable whether one would run OSDs from different clusters very often.
> >>
> >> If it is the recommended setup to have multiple OSDs per node (like, one OSD
> >> per physical drive), then we need to take that into account - but don't
> >> assume that one node only has one SSD disk for journals, which would be
> >> shared between all OSDs...
> >>
> >>>
> >>> I'm currently using: osd data = /var/lib/ceph/$name
> >>>
> >>> To get back to what sage mentioned, why add the "-data" suffix to a directory name? Isn't it obvious that a directory will contain data?
> >>
> >> Each osd has data and a journal... there should be some way to identify
> >> both...
> >
> > Yes. The plan is for the chef/juju/whatever bits to do that part. For
> > example, the scripts triggered by udev/chef/juju would look at the GPT
> > labels to identify OSD disks and mount them in place. They would similarly
> > identify journals by matching the osd uuids and start up the daemon with
> > the correct journal.
> >
> > The current plan is that if /var/lib/ceph/osd-data/$id/journal doesn't
> > exist (e.g., because we put it on another device), it will look/wait until
> > a journal appears. If it is present, ceph-osd can start using that.
>
> I would suggest you fail the startup of the daemon, as it doesn't have
> all the needed parts - I personally don't like these "autodiscover"
> thingies; you never know what they are waiting/searching for...

Agreed. The udev rule would not try to start ceph-osd if the journal isn't
present; ceph-osd won't be started unless the journal is there.

> >
> >>> /var/lib/ceph/$type/$id
> >
> > I like this. We were originally thinking
> >
> >  /var/lib/ceph/osd-data/
> >  /var/lib/ceph/osd-journal/
> >  /var/lib/ceph/mon-data/
> >
> > but managing bind mounts or symlinks for journals seems error prone. TV's
> > now thinking we should just start ceph-osd with
> >
> >  ceph-osd --osd-journal /somewhere/else -i $id
>
> ... I like this more, and I would even suggest allowing the daemon to be
> started just like
>
>  ceph-osd --osd-journal /somewhere --osd-data /somewhereelse --conf
>  /etc/ceph/clustername.conf
>
> (the config file is for the monitors)
>
> Configuration and determining which one(s) to start is up to our
> deployment tools (chef in our case).

Yeah. Explicitly specifying osd_data isn't strictly necessary if it matches
the default, but the deployment tool could pass it anyway.

> Say that we duplicate a node for some testing/failover/... I would not
> want the daemon to automatically start just because the data is there...

I'm not sure if this is something we've looked at yet... TV?

sage

>
> Rgds,
> Bernard
> Openminds BVBA
>
>
> >
> > from upstart/whatever if we have a matching journal elsewhere.
> >
> > sage
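(Purely illustrative - a minimal sketch of the invocation style discussed above, written so the daemon fails fast instead of waiting for a journal to appear. The paths, the GPT partition label and the id are hypothetical; only the -i, -c/--conf, --osd-data and --osd-journal options come from the thread itself.)

    # hypothetical wrapper a deployment tool (chef/juju/udev hook) might install
    id=0
    cluster=ceph
    data=/var/lib/ceph/osd-data/$id
    journal=/dev/disk/by-partlabel/ceph-journal-$id   # located via its GPT label

    # fail the startup if either piece is missing - no autodiscovery, no waiting
    if [ ! -d "$data" ] || [ ! -e "$journal" ]; then
        echo "osd.$id: data or journal missing, refusing to start" >&2
        exit 1
    fi

    exec ceph-osd -i "$id" -c "/etc/ceph/$cluster.conf" \
        --osd-data "$data" --osd-journal "$journal"

Whether anything beyond -i is really needed on the command line is exactly the open question above; the only point here is that the journal check happens before the daemon is started.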
> >>>
> >>> Wido
> >>>
> >>>>
> >>>> In my case this doesn't really matter; it is up to the provisioning software to make the needed symlinks/mounts.
> >>>>
> >>>> Rgds,
> >>>> Bernard
> >>>>
> >>>> On 05 Apr 2012, at 09:37, Andrey Korolyov wrote:
> >>>>
> >>>>> In ceph's case, such layout breakage may be necessary in almost all
> >>>>> installations (except testing), compared to almost all general-purpose
> >>>>> server software, which needs division like that only in very specific
> >>>>> setups.
> >>>>>
> >>>>> On Thu, Apr 5, 2012 at 11:28 AM, Bernard Grymonpon <bernard@xxxxxxxxxxxx> wrote:
> >>>>>> I feel it's up to the sysadmin to mount / symlink the correct storage devices on the correct paths - ceph should not be concerned that some volumes might need to sit together.
> >>>>>>
> >>>>>> Rgds,
> >>>>>> Bernard
> >>>>>>
> >>>>>> On 05 Apr 2012, at 09:12, Andrey Korolyov wrote:
> >>>>>>
> >>>>>>> Right, but we probably need journal separation at the directory level
> >>>>>>> by default, because there are very few cases where the speed of the
> >>>>>>> main storage is sufficient for the journal, or where the resulting
> >>>>>>> speed decrease is not significant; so the journal could by default go
> >>>>>>> into /var/lib/ceph/osd/journals/$i/journal, where osd/journals is
> >>>>>>> mounted on the fast disk.
> >>>>>>>
> >>>>>>> On Thu, Apr 5, 2012 at 10:57 AM, Bernard Grymonpon <bernard@xxxxxxxxxxxx> wrote:
> >>>>>>>>
> >>>>>>>> On 05 Apr 2012, at 08:32, Sage Weil wrote:
> >>>>>>>>
> >>>>>>>>> We want to standardize the locations for ceph data directories, configs,
> >>>>>>>>> etc. We'd also like to allow a single host to run OSDs that participate
> >>>>>>>>> in multiple ceph clusters. We'd like easy-to-deal-with names (i.e., avoid
> >>>>>>>>> UUIDs if we can).
> >>>>>>>>>
> >>>>>>>>> The metavariables are:
> >>>>>>>>>  cluster = ceph (by default)
> >>>>>>>>>  type = osd, mon, mds
> >>>>>>>>>  id = 1, foo,
> >>>>>>>>>  name = $type.$id = osd.0, mds.a, etc.
> >>>>>>>>>
> >>>>>>>>> The $cluster variable will come from the command line (--cluster foo) or,
> >>>>>>>>> in the case of a udev hotplug tool or something, from matching the uuid on
> >>>>>>>>> the device with the 'fsid = <uuid>' line in the available config files
> >>>>>>>>> found in /etc/ceph.
> >>>>>>>>>
> >>>>>>>>> The locations could be:
> >>>>>>>>>
> >>>>>>>>> ceph config file:
> >>>>>>>>>  /etc/ceph/$cluster.conf (default is thus ceph.conf)
> >>>>>>>>>
> >>>>>>>>> keyring:
> >>>>>>>>>  /etc/ceph/$cluster.keyring (fallback to /etc/ceph/keyring)
> >>>>>>>>>
> >>>>>>>>> osd_data, mon_data:
> >>>>>>>>>  /var/lib/ceph/$cluster.$name
> >>>>>>>>>  /var/lib/ceph/$cluster/$name
> >>>>>>>>>  /var/lib/ceph/data/$cluster.$name
> >>>>>>>>>  /var/lib/ceph/$type-data/$cluster-$id
> >>>>>>>>>
> >>>>>>>>> TV and I talked about this today, and one thing we want is for items of a
> >>>>>>>>> given type to live together in a separate directory so that we don't have
> >>>>>>>>> to do any filtering to, say, get all osd data directories. This suggests
> >>>>>>>>> the last option (/var/lib/ceph/osd-data/ceph-1,
> >>>>>>>>> /var/lib/ceph/mon-data/ceph-foo, etc.), but it's kind of fugly.
> >>>>>>>>>
> >>>>>>>>> Another option would be to make it
> >>>>>>>>>
> >>>>>>>>>  /var/lib/ceph/$type-data/$id
> >>>>>>>>>
> >>>>>>>>> (with no $cluster) and make users override the default with something that
> >>>>>>>>> includes $cluster (or $fsid, or whatever) in their $cluster.conf if/when
> >>>>>>>>> they want multicluster nodes that don't interfere.
> >>>>>>>>> Then we'd get
> >>>>>>>>> /var/lib/ceph/osd-data/1 for non-crazy people, which is pretty easy.
> >>>>>>>>
> >>>>>>>> As an osd consists of data and the journal, these should stay together,
> >>>>>>>> with all info for that one osd in one place:
> >>>>>>>>
> >>>>>>>> I would suggest
> >>>>>>>>
> >>>>>>>>  /var/lib/ceph/osd/$id/data
> >>>>>>>> and
> >>>>>>>>  /var/lib/ceph/osd/$id/journal
> >>>>>>>>
> >>>>>>>> ($id could be replaced by $uuid or $name, of which I would prefer $uuid)
> >>>>>>>>
> >>>>>>>> Rgds,
> >>>>>>>> Bernard
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Any other suggestions? Thoughts?
> >>>>>>>>> sage
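(To make Bernard's per-OSD layout concrete - an illustrative sketch only, with a made-up id and a hypothetical SSD partition label for the journal; as noted above, it would be up to the provisioning software to create these symlinks/mounts.)

    # everything for one osd under a single directory: data and journal together
    id=1
    mkdir -p /var/lib/ceph/osd/$id/data
    # journal on a separate fast device, linked into place by the provisioning tool
    ln -s /dev/disk/by-partlabel/ceph-journal-$id /var/lib/ceph/osd/$id/journal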
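(And a sketch of the multicluster override Sage describes: a hypothetical second cluster named "backup" on the same node keeps its own config file and moves its directories out of the default location. The option names follow the "osd data = ..." form already used in this thread; the paths and the cluster name are made up.)

    # write /etc/ceph/backup.conf for the hypothetical second cluster; the $id
    # metavariable is left for ceph to expand, so the strings are single-quoted
    printf '%s\n' \
        '[osd]' \
        '    osd data = /var/lib/ceph/osd-data/backup-$id' \
        '    osd journal = /var/lib/ceph/osd-journal/backup-$id' \
        > /etc/ceph/backup.conf

The default cluster keeps using /etc/ceph/ceph.conf untouched, so only nodes that actually run a second cluster pay the extra naming cost.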