Re: Braindump: path names, partition labels, FHS, auto-discovery

On Tue, 27 Mar 2012, David McBride wrote:
> On Tue, 2012-03-20 at 00:25 -0700, Sage Weil wrote:
> 
> > Currently the ceph-osd is told which id to be on startup; the only real 
> > shift here would be to let you specify some uuids instead and have it pull 
> > its rank (id) out of the .../whoami file.
> 
> I'm increasingly coming to believe that an OSD's rank should not be
> exposed to the admin/user.  While it's clearly important as an internal
> implementation detail, I can't currently see a reason why the admin
> needs to know an OSD's rank, or why it can't (in principle) be
> dynamically managed on the administrator's behalf.
> 
> A non-trivial part of the complexity of (re-)configuring a running
> cluster as nodes are added and removed is the correct numbering of OSDs.
> 
> At the moment I'm still experimenting -- I don't know what's supposed to
> happen when low-numbered OSDs are removed; do all the existing ones
> renumber?  Or do you get fragmentation in the OSD number space?
> 
> If it's the former, then the rank of an OSD is metadata that can change
> during its lifetime -- meaning that it's probably not a good idea to use
> it in path-names, for example.
> 
> I suspect that using UUIDs and/or human-readable labels to refer to OSDs
> is probably going to be superior to using the OSDs' rank.

The ranks don't change, so at least that part is not a problem.  If you 
remove old osds, there's a gap in the id space, but that's harmless.

I don't think they can be hidden entirely because they are tied to the 
CRUSH map, which can be (and often must be) manipulated directly.
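For what it's worth, a toy sketch of the id behaviour (plain Python, not 
Ceph code; the lowest-free-id allocation rule is my reading of what 'ceph 
osd create' does when handing out ids, so treat it as an assumption):

```python
# Toy model of OSD id behaviour: removing an osd leaves a gap in the
# id space, and the ids of the surviving osds never change.

def lowest_free_id(used):
    """Return the smallest non-negative integer not in `used` --
    mimicking (I believe) how a new osd id gets allocated."""
    i = 0
    while i in used:
        i += 1
    return i

osds = {0, 1, 2, 3}        # four osds: osd.0 .. osd.3
osds.discard(1)            # remove osd.1 -> gap at id 1
assert osds == {0, 2, 3}   # survivors keep their ids; no renumbering

new_id = lowest_free_id(osds)  # a later osd creation can fill the gap
assert new_id == 1
```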

> > We've set up a special key that has permission to create new osds only, 
> > but again it's pretty bad security.  Chef's model just doesn't work well 
> > here.
> 
> If I understand the model correctly:
> 
>  - Each individual daemon generates its own keys.  This is a secret key 
>    in the symmetric-cryptography sense.
>  - Each daemon needs to have its keys registered with the MON cluster.
>    (The MON operates as a centrally-trusted key distribution centre.)
>  - To register keys with the MON cluster, a cluster administrative 
>    privilege is required.
>  - The registration process also updates the effective access-control 
>    privilege set associated with that key.

Alternatively, as TV noted, a 'provisioning' key with the ability only to 
add new keys with specific privs can be used.
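To make the two schemes concrete, here is a minimal sketch (plain Python, 
not the cephx API; the names Mon, register_key, and the cap strings are 
all invented for illustration) of a full admin key versus a restricted 
provisioning key that may only create new entities with a fixed priv set:

```python
# Illustrative model only -- not Ceph code.  A full admin key may
# register any entity with any caps; a provisioning key may only add
# *new* osd entities with one fixed, restricted privilege set.

class Mon:
    """Stands in for the monitor cluster's key database."""
    def __init__(self):
        self.keys = {}   # entity name -> (secret, caps)

    def register_key(self, caller_caps, entity, secret, caps):
        if "admin" in caller_caps:
            pass                       # full admin: may set any caps
        elif "provision-osd" in caller_caps:
            if entity in self.keys or caps != {"osd"}:
                raise PermissionError("provisioning key: not allowed")
        else:
            raise PermissionError("no privilege to register keys")
        self.keys[entity] = (secret, caps)

mon = Mon()
mon.register_key({"admin"}, "osd.0", "s3cret", {"osd", "mon"})
mon.register_key({"provision-osd"}, "osd.1", "s3cret2", {"osd"})
try:
    # the restricted key cannot grant broader caps...
    mon.register_key({"provision-osd"}, "osd.2", "x", {"osd", "admin"})
except PermissionError:
    pass
assert "osd.2" not in mon.keys
```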

> Two thoughts:
> 
>  1. I suspect that some small changes to the bootstrap procedure could 
>     result in a more streamlined process:
> 
>      - OSDs stop being responsible for generating their own keys.  
>        Their keys are instead generated by a MON node and are stored in 
>        the MON cluster.  As a result, the problem to be solved changes: 
> 
>          * before, an OSD needed to have the necessary privileges to
>            update the cluster-wide configuration; 
>          * now, the MON node only needs to have the necessary privileges
>            to install an OSD's secret key on that OSD's host.
> 
>      - It should then be straightforward to set up a privileged, 
>        automated process -- probably on a MON node -- to stage a copy
>        of an OSD's secret key to the appropriate location on that OSD's 
>        host.
> 
>        (This assumes that an OSD's host can be automatically determined 
>        and authenticated using some existing means (SSH keys, Kerberos, 
>        etc.) -- which I'd expect to be the case for any non-trivial 
>        installation.)
> 
>      - This automated process could be triggered by the OSD   
>        installation process -- either by an out-of-band script, or 
>        conceivably by a key-solicitation message sent in-band by the 
>        OSD itself.

I don't have much of an opinion on the strategy in general (TV probably 
does), but this is already possible currently.  If you pass --mkkey along 
with --mkfs to ceph-osd it will generate a key; if you don't, you can 
copy it into place yourself.
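The proposed inversion above could be sketched like so (illustrative 
Python only, not Ceph code; mon_provision_osd and the dicts standing in 
for the key database and for ssh/scp delivery are all invented names):

```python
# Sketch of the proposed bootstrap inversion: the MON generates and
# records the OSD's secret, then stages a copy to the OSD's host, so
# the OSD never needs privileges to write into the cluster-wide key
# database.  A dict stands in for out-of-band delivery (ssh/scp/etc.).
import secrets

def mon_provision_osd(mon_keys, host_keyrings, osd_name, host):
    key = secrets.token_hex(16)     # MON generates the secret key
    mon_keys[osd_name] = key        # ...and registers it centrally
    host_keyrings.setdefault(host, {})[osd_name] = key  # stage to host
    return key

mon_keys, host_keyrings = {}, {}
mon_provision_osd(mon_keys, host_keyrings, "osd.5", "node-a")

# the OSD on node-a reads its key locally, and it matches the copy
# the MON cluster already trusts
assert host_keyrings["node-a"]["osd.5"] == mon_keys["osd.5"]
```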

>  2. This sounds very similar to the model for Kerberos 5, as used in 
>     the MIT Kerberos implementation and Microsoft's Active Directory.  
>     It might be an interesting (future!) project to see how difficult 
>     it would be to modify the Ceph daemons and protocols to use 
>     Kerberos-based authentication as an alternative to cephx, possibly 
>     via the GSS-API.
> 
> Aha, I note that this is probably not a surprise -- the 0.18 
> release-notes, for example, note the similarity between cephx and
> Kerberos.  I presume the goal in rolling your own was to avoid adding
> barriers to deployment?

That, and we were concerned about scalability issues with Kerberos 
itself... it didn't map cleanly onto the distributed nature of Ceph.

sage

