On 05 Apr 2012, at 17:17, Sage Weil wrote:

> On Thu, 5 Apr 2012, Bernard Grymonpon wrote:
>> On 05 Apr 2012, at 14:34, Wido den Hollander wrote:
>>
>>> On 04/05/2012 10:38 AM, Bernard Grymonpon wrote:
>>>> I assume most OSD nodes will normally run a single OSD, so this would
>>>> not apply to most nodes.
>>>>
>>>> Only in specific cases (where multiple OSDs run on a single node) would
>>>> this come up, and these specific cases might even require the journals
>>>> to be split over multiple devices (multiple ssd-disks ...)
>>>
>>> I think that's a wrong assumption. On most systems I think multiple OSDs
>>> will exist; it's debatable whether one would run OSDs from different
>>> clusters very often.
>>
>> If the recommended setup is to have multiple OSDs per node (like one OSD
>> per physical drive), then we need to take that into account - but don't
>> assume that one node only has one SSD disk for journals, which would be
>> shared between all OSDs...
>>
>>> I'm currently using: osd data = /var/lib/ceph/$name
>>>
>>> To get back to what sage mentioned, why add the "-data" suffix to a
>>> directory name? Isn't it obvious that a directory will contain data?
>>
>> Each osd has data and a journal... there should be some way to identify
>> both...
>
> Yes.  The plan is for the chef/juju/whatever bits to do that part.  For
> example, the scripts triggered by udev/chef/juju would look at the GPT
> labels to identify OSD disks and mount them in place.  It will similarly
> identify journals by matching the osd uuids and start up the daemon with
> the correct journal.
>
> The current plan is that if /var/lib/ceph/osd-data/$id/journal doesn't
> exist (e.g., because we put it on another device), it will look/wait until
> a journal appears.  If it is present, ceph-osd can start using that.

I would suggest you fail the startup of the daemon, as it doesn't have all
the needed parts - I personally don't like these "autodiscover" thingies;
you never know what they are waiting or searching for...

>>> /var/lib/ceph/$type/$id
>
> I like this.  We were originally thinking
>
>  /var/lib/ceph/osd-data/
>  /var/lib/ceph/osd-journal/
>  /var/lib/ceph/mon-data/
>
> but managing bind mounts or symlinks for journals seems error prone.  TV's
> now thinking we should just start ceph-osd with
>
>  ceph-osd --osd-journal /somewhere/else -i $id ...

I like this more, and I would even suggest allowing the daemon to be started
just like

 ceph-osd --osd-journal /somewhere --osd-data /somewhereelse --conf /etc/ceph/clustername.conf

(the config file is for the monitors)

Configuration, and determining which one(s) to start, is up to our
deployment tools (chef in our case). Say we duplicate a node for some
testing/failover/... - I would not want the daemon to start automatically
just because the data is there... (a sketch of such an explicit, fail-fast
startup appears after the quoted thread below)

Rgds,
Bernard
Openminds BVBA

>
> from upstart/whatever if we have a matching journal elsewhere.
>
> sage
>
>>> Wido
>>>
>>>> In my case, this doesn't really matter; it is up to the provisioning
>>>> software to make the needed symlinks/mounts.
>>>>
>>>> Rgds,
>>>> Bernard
>>>>
>>>> On 05 Apr 2012, at 09:37, Andrey Korolyov wrote:
>>>>
>>>>> In the ceph case, such layout breakage may be necessary in almost all
>>>>> installations (except testing), compared to almost all general-purpose
>>>>> server software, which needs division like that only in very specific
>>>>> setups.
>>>>>
>>>>> On Thu, Apr 5, 2012 at 11:28 AM, Bernard Grymonpon <bernard@xxxxxxxxxxxx> wrote:
>>>>>> I feel it's up to the sysadmin to mount/symlink the correct storage
>>>>>> devices on the correct paths - ceph should not be concerned that some
>>>>>> volumes might need to sit together.
>>>>>>
>>>>>> Rgds,
>>>>>> Bernard
>>>>>>
>>>>>> On 05 Apr 2012, at 09:12, Andrey Korolyov wrote:
>>>>>>
>>>>>>> Right, but we probably need journal separation at the directory level
>>>>>>> by default, because there is only a small number of cases where the
>>>>>>> speed of the main storage is sufficient for the journal, or where the
>>>>>>> resulting slowdown is not significant; so by default the journal may
>>>>>>> go into /var/lib/ceph/osd/journals/$i/journal, where osd/journals is
>>>>>>> mounted on the fast disk.
>>>>>>>
>>>>>>> On Thu, Apr 5, 2012 at 10:57 AM, Bernard Grymonpon <bernard@xxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> On 05 Apr 2012, at 08:32, Sage Weil wrote:
>>>>>>>>
>>>>>>>>> We want to standardize the locations for ceph data directories,
>>>>>>>>> configs, etc.  We'd also like to allow a single host to run OSDs
>>>>>>>>> that participate in multiple ceph clusters.  We'd like
>>>>>>>>> easy-to-deal-with names (i.e., avoid UUIDs if we can).
>>>>>>>>>
>>>>>>>>> The metavariables are:
>>>>>>>>>  cluster = ceph (by default)
>>>>>>>>>  type = osd, mon, mds
>>>>>>>>>  id = 1, foo,
>>>>>>>>>  name = $type.$id = osd.0, mds.a, etc.
>>>>>>>>>
>>>>>>>>> The $cluster variable will come from the command line (--cluster foo)
>>>>>>>>> or, in the case of a udev hotplug tool or something, from matching
>>>>>>>>> the uuid on the device with the 'fsid = <uuid>' line in the available
>>>>>>>>> config files found in /etc/ceph.
>>>>>>>>>
>>>>>>>>> The locations could be:
>>>>>>>>>
>>>>>>>>> ceph config file:
>>>>>>>>>  /etc/ceph/$cluster.conf (default is thus ceph.conf)
>>>>>>>>>
>>>>>>>>> keyring:
>>>>>>>>>  /etc/ceph/$cluster.keyring (fallback to /etc/ceph/keyring)
>>>>>>>>>
>>>>>>>>> osd_data, mon_data:
>>>>>>>>>  /var/lib/ceph/$cluster.$name
>>>>>>>>>  /var/lib/ceph/$cluster/$name
>>>>>>>>>  /var/lib/ceph/data/$cluster.$name
>>>>>>>>>  /var/lib/ceph/$type-data/$cluster-$id
>>>>>>>>>
>>>>>>>>> TV and I talked about this today, and one thing we want is for items
>>>>>>>>> of a given type to live together in a separate directory so that we
>>>>>>>>> don't have to do any filtering to, say, get all osd data directories.
>>>>>>>>> This suggests the last option (/var/lib/ceph/osd-data/ceph-1,
>>>>>>>>> /var/lib/ceph/mon-data/ceph-foo, etc.), but it's kind of fugly.
>>>>>>>>>
>>>>>>>>> Another option would be to make it
>>>>>>>>>
>>>>>>>>>  /var/lib/ceph/$type-data/$id
>>>>>>>>>
>>>>>>>>> (with no $cluster) and make users override the default with something
>>>>>>>>> that includes $cluster (or $fsid, or whatever) in their $cluster.conf
>>>>>>>>> if/when they want multicluster nodes that don't interfere.  Then we'd
>>>>>>>>> get /var/lib/ceph/osd-data/1 for non-crazy people, which is pretty
>>>>>>>>> easy.
>>>>>>>>
>>>>>>>> As an osd consists of data and a journal, these should stay together,
>>>>>>>> with all info for that one osd in one place:
>>>>>>>>
>>>>>>>> I would suggest
>>>>>>>>
>>>>>>>>  /var/lib/ceph/osd/$id/data
>>>>>>>> and
>>>>>>>>  /var/lib/ceph/osd/$id/journal
>>>>>>>>
>>>>>>>> ($id could be replaced by $uuid or $name, of which I would prefer $uuid)
>>>>>>>>
>>>>>>>> Rgds,
>>>>>>>> Bernard
>>>>>>>>
>>>>>>>>> Any other suggestions?  Thoughts?
>>>>>>>>> sage
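
For illustration, here is a minimal sketch (in shell) of how the metavariables
from Sage's proposal above could expand on a default, single-cluster node. The
concrete values (cluster "ceph", osd id 1) and the choice of the
/var/lib/ceph/$type-data/$id layout are assumptions picked for the example,
not a settled decision:

 # Sketch only: expansion of the proposed metavariables for the default cluster.
 cluster=ceph                           # overridden by --cluster foo
 type=osd                               # osd, mon or mds
 id=1                                   # example id, assumed
 name="$type.$id"                       # -> osd.1
 conf="/etc/ceph/$cluster.conf"         # -> /etc/ceph/ceph.conf
 keyring="/etc/ceph/$cluster.keyring"   # fallback: /etc/ceph/keyring
 data="/var/lib/ceph/$type-data/$id"    # -> /var/lib/ceph/osd-data/1 (no $cluster)
 echo "$name: conf=$conf data=$data"

With the multicluster override Sage mentions, the data path would instead
expand to something like /var/lib/ceph/osd-data/$cluster-$id, i.e.
/var/lib/ceph/osd-data/ceph-1.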
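
And a minimal sketch of the explicit, fail-fast startup Bernard argues for:
the deployment tool (chef, in his case) decides which daemons to run and
passes every path on the command line, and the wrapper refuses to start
rather than waiting for a journal to appear. The wrapper itself, its name and
arguments, and the journal location are hypothetical; only the ceph-osd
options (-i, --osd-data, --osd-journal, --conf) are taken from the thread:

 #!/bin/sh
 # Hypothetical wrapper, e.g. installed by chef as /usr/local/sbin/start-osd <id> [cluster]
 id="$1"
 cluster="${2:-ceph}"
 data="/var/lib/ceph/osd-data/$id"        # assumed layout, see the sketch above
 journal="/var/lib/ceph/osd-journal/$id"  # could just as well be an SSD partition
 conf="/etc/ceph/$cluster.conf"

 # Fail fast instead of waiting for a journal to show up.
 if [ ! -e "$journal" ]; then
     echo "osd.$id: journal $journal is missing, refusing to start" >&2
     exit 1
 fi

 exec ceph-osd -i "$id" --osd-data "$data" --osd-journal "$journal" --conf "$conf"

Whether the daemon should ever start without the deployment tool's explicit
say-so is exactly the point Bernard raises about duplicated nodes; a wrapper
like this only runs when chef (or an admin) invokes it.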