Re: [ceph-users] Re: Call for Interest: Managed SMB Protocol Support

John Mulligan <phlogistonjohn@xxxxxxxxxxxxx> · Mon, 25 Mar 2024 11:01:42 -0400

On Friday, March 22, 2024 2:56:22 PM EDT Alexander E. Patrakov wrote:
> Hi John,
> 
> > A few major features we have planned include:
> > * Standalone servers (internally defined users/groups)
> 
> No concerns here
> 
> > * Active Directory Domain Member Servers
> 
> In the second case, what is the plan regarding UID mapping? Is NFS
> coexistence planned, or a concurrent mount of the same directory using
> CephFS directly?

In the immediate future the plan is to have a very simple, fairly 
"opinionated" idmapping scheme based on the autorid backend.
Sharing the same directories over both NFS and SMB at the same time, also 
known as "multi-protocol", is not planned for now. However, we're all aware 
that there's often demand for this feature, and of the complexity it brings. 
I expect we'll work on that at some point, but not initially. 
Similarly, sharing the same directories over an SMB share and directly on a 
cephfs mount won't be blocked, but we won't recommend it.

> 
> In fact, I am quite skeptical, because, at least in my experience,
> every customer's SAMBA configuration as a domain member is a unique
> snowflake, and cephadm would need an ability to specify arbitrary UID
> mapping configuration to match what the customer uses elsewhere - and
> the match must be precise.
> 

I agree - our initial use case is something along these lines:
Users of a Ceph cluster that have Windows systems, Mac systems, or appliances 
that are joined to an existing AD but are not currently interoperating with 
the Ceph cluster.

I expect to add some idmapping configuration and agility down the line, 
especially supporting some form of rfc2307 idmapping (where unix IDs are 
stored in AD).

But those who already have idmapping schemes and samba accessing ceph will 
probably need to just continue using the existing setups as we don't have an 
immediate plan for migrating those users.

> Here is what I have seen or was told about:
> 
> 1. We don't care about interoperability with NFS or CephFS, so we just
> let SAMBA invent whatever UIDs and GIDs it needs using the "tdb2"
> idmap backend. It's completely OK that workstations get different UIDs
> and GIDs, as only SIDs traverse the wire.

This is pretty close to our initial plan, but I'm not clear why you'd think 
that "workstations get different UIDs and GIDs". For all systems accessing the 
(same) ceph cluster the id mapping should be consistent.
You did make me consider multi-cluster use cases with something like cephfs 
volume mirroring - that's something that I hadn't thought of before *but* 
using an algorithmic mapping backend like autorid (and testing) I think we're 
mostly OK there.
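
To illustrate why an algorithmic backend helps here, below is a minimal Python 
sketch of the mapping formulas as described in the Samba idmap_autorid man 
page. The function names and the range values are assumptions chosen for 
illustration, not anything taken from the smb mgr module. Note that the range 
number itself is assigned per domain by winbind and stored in its database, so 
two clusters would only agree on IDs if their domains end up with the same 
range numbers, which is presumably where the testing comes in.

```python
# Sketch of the idmap_autorid formulas (per the Samba idmap_autorid man page).
# All names and values here are illustrative assumptions.

RANGE_LOW = 1_000_000   # assumed low end of "idmap config * : range"
RANGE_SIZE = 100_000    # assumed "idmap config * : rangesize"


def autorid_to_unix_id(rid, range_number,
                       range_low=RANGE_LOW, range_size=RANGE_SIZE):
    """Map a RID to a Unix ID.

    range_number is the slot winbind assigned to the pair
    (domain SID, rid // range_size); it is looked up, not computed.
    """
    reduced_rid = rid % range_size
    return reduced_rid + range_low + range_number * range_size


def unix_id_to_rid(unix_id, domain_range_index,
                   range_low=RANGE_LOW, range_size=RANGE_SIZE):
    """Reverse mapping: return (range_number, rid) for a Unix ID.

    The range_number is what winbind would use to look up the domain SID
    and domain_range_index in its database.
    """
    offset = unix_id - range_low
    range_number = offset // range_size
    rid = offset % range_size + domain_range_index * range_size
    return range_number, rid


if __name__ == "__main__":
    # Same RID, same range number: same Unix ID on every server.
    print(autorid_to_unix_id(1105, range_number=0))
    # Same RID, different range number: a different Unix ID.
    print(autorid_to_unix_id(1105, range_number=1))
```

With identical range and rangesize settings, the only input that can differ 
between servers is the stored range number.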

> 2. [not seen in the wild, the customer did not actually implement it,
> it's a product of internal miscommunication, and I am not sure if it
> is valid at all] We don't care about interoperability with CephFS,
> and, while we have NFS, security guys would not allow running NFS
> non-kerberized. Therefore, no UIDs or GIDs traverse the wire, only
> SIDs and names. Therefore, all we need is to allow both SAMBA and NFS
> to use shared UID mapping allocated on as-needed basis using the
> "tdb2" idmap module, and it doesn't matter that these UIDs and GIDs
> are inconsistent with what clients choose.

Unfortunately, I don't really understand this item. Fortunately, you say it 
was only considered, not implemented. :-)

> 3. We don't care about ACLs at all, and don't care about CephFS
> interoperability. We set ownership of all new files to root:root 0666
> using whatever options are available [well, I would rather use a
> dedicated nobody-style uid/gid here]. All we care about is that only
> authorized workstations or authorized users can connect to each NFS or
> SMB share, and we absolutely don't want them to be able to set custom
> ownership or ACLs.

Sometimes known as the "drop-box" use case, I think (not to be confused with 
the cloud app of a similar name).
We could probably implement something like that as an option but I had not 
considered it before.

> 4. We care about NFS and CephFS file ownership being consistent with
> what Windows clients see. We store all UIDs and GIDs in Active
> Directory using the rfc2307 schema, and it's mandatory that all
> servers (especially SAMBA - thanks to the "ad" idmap backend) respect
> that and don't try to invent anything [well, they do - BUILTIN/Users
> gets its GID through tdb2]. Oh, and by the way, we have this strangely
> low-numbered group that everybody gets wrong unless they set "idmap
> config CORP : range = 500-999999".

This is oh so similar to a project I worked on prior to working with Ceph.
I think we'll need to do this one eventually but maybe not this year.
One nice side-effect of running in containers is that the low-id number is less 
of an issue because the ids only matter within the container context (and only 
then if using the kernel file system access methods). We have much more 
flexibility with IDs in a container.

> 5. We use a few static ranges for algorithmic ID translation using the
> idmap rid backend. Everything works.

See above.

> 6. We use SSSD, which provides consistent IDs everywhere, and for a
> few devices which can't use it, we configured compatible idmap rid
> ranges for use with winbindd. The only problem is that we like
> user-private groups, and only SSSD has support for them (although we
> admit it's our fault that we enabled this non-default option).
> 7. We store ID mappings in non-AD LDAP and use winbindd with the
> "ldap" idmap backend.
> 

For now, we're only planning to do idmapping with winbind and AD. We'd 
probably only consider non-AD LDAP and/or SSSD if there was strong and loud 
demand for it.

> I am sure other weird but valid setups exist - please extend the list
> if you can.
> 
> Which of the above scenarios would be supportable without resorting to
> the old way of installing SAMBA manually alongside the cluster?

I hope I covered the above with some inline replies. This was great food for 
thought and at just the right level of technical detail. So thank you very 
much for replying; this is exactly the kind of discussion I want to have now, 
while the design is still young and flexible.

One other cool thing I plan on doing is supporting multiple samba containers 
running on the same cluster (even the same node if I can wrangle the network 
properly). So one could in fact have completely different domain joins and/or 
configurations. While I wouldn't suggest anyone run a whole lot of different 
configurations on the same cluster - this idea already allows for some level of 
agility between schemes. Later on we might be able to use that as a building 
block for migration tools, either from an existing samba setup or between 
configurations.

Also, I plan on adding `global_custom_options` and `share_custom_options` for 
special overrides for development, qa, and experimentation but those are 
strongly within the "you break it, you bought it" realm. But these could be 
used for experimenting with idmapping schemes without having them all baked 
into the smb mgr module code.
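
As a rough illustration of how such overrides might behave, here is a 
hypothetical Python sketch that layers custom global options on top of 
generated ones and renders a `[global]` section. The function names, the 
"generated" defaults, and the CORP domain values are assumptions for 
illustration only; the smb.conf parameter names (`idmap config ... : backend`, 
`schema_mode`, `range`) are standard Samba options, but nothing here reflects 
how the smb mgr module would actually be implemented.

```python
# Hypothetical sketch: none of these helpers come from the smb mgr module.


def merge_options(generated, custom):
    """Return a copy of the generated options with custom overrides on top."""
    merged = dict(generated)
    merged.update(custom)
    return merged


def render_global_section(options):
    """Render an options dict as an smb.conf-style [global] section."""
    lines = ["[global]"]
    for key, value in options.items():
        lines.append(f"    {key} = {value}")
    return "\n".join(lines)


# Assumed defaults for the "opinionated" autorid scheme.
generated = {
    "idmap config * : backend": "autorid",
    "idmap config * : range": "1000000-1999999",
}

# Assumed experiment: rfc2307 IDs from AD for a domain called CORP,
# with a default range that does not overlap the CORP range.
global_custom_options = {
    "idmap config * : backend": "tdb",
    "idmap config CORP : backend": "ad",
    "idmap config CORP : schema_mode": "rfc2307",
    "idmap config CORP : range": "500-999999",
}

if __name__ == "__main__":
    merged = merge_options(generated, global_custom_options)
    print(render_global_section(merged))
```

The custom values win on any conflicting key, which is what makes this kind of 
override firmly "you break it, you bought it".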

Thanks for the discussion! 
--John M.
