Re: Call for Interest: Managed SMB Protocol Support

This is great; we currently use the SMB protocol heavily to export a
kernel-mounted CephFS.
However, we have run into a problem: when many SMB clients enumerate or
list the same directory, the SMB server comes under high load and the
smbd processes end up in D (uninterruptible sleep) state.
This has been going on for some time and we have not yet found a
suitable solution.

On Tue, Mar 26, 2024 at 03:43 John Mulligan <phlogistonjohn@xxxxxxxxxxxxx> wrote:
>
> On Monday, March 25, 2024 3:22:26 PM EDT Alexander E. Patrakov wrote:
> > On Mon, Mar 25, 2024 at 11:01 PM John Mulligan
> >
> > <phlogistonjohn@xxxxxxxxxxxxx> wrote:
> > > On Friday, March 22, 2024 2:56:22 PM EDT Alexander E. Patrakov wrote:
> > > > Hi John,
> > > >
> > > > > A few major features we have planned include:
> > > > > * Standalone servers (internally defined users/groups)
> > > >
> > > > No concerns here
> > > >
> > > > > * Active Directory Domain Member Servers
> > > >
> > > > In the second case, what is the plan regarding UID mapping? Is NFS
> > > > coexistence planned, or a concurrent mount of the same directory using
> > > > CephFS directly?
> > >
> > > In the immediate future the plan is to have a very simple, fairly
> > > "opinionated" idmapping scheme based on the autorid backend.
> >
> > OK, the docs for clustered SAMBA do mention the autorid backend in
> > examples. It's a shame that the manual page does not explicitly list
> > it as compatible with clustered setups.
> >
> > However, please consider that the majority of Linux distributions
> > (tested: CentOS, Fedora, Alt Linux, Ubuntu, OpenSUSE) use "realmd" to
> > join AD domains by default (where "default" means a pointy-clicky way
> > in a workstation setup), which uses SSSD, and therefore, by this
> > opinionated choice of the autorid backend, you create mappings that
> > disagree with the supposed majority and the default. This will create
> > problems in the future when you do consider NFS coexistence.
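> >
> > (For anyone who wants to check what a given machine actually did: after
> > a default join, `realm list` reports the membership software. The domain
> > below is a placeholder.)
> >
> >   realm join corp.example.com   # CLI equivalent of the GUI join
> >   realm list                    # look for "client-software: sssd"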
> >
>
> Thanks, I'll keep that in mind.
>
> > Well, it's a different topic, but most organizations that I have seen
> > seem to ignore this default. Maybe those that don't have any problems
> > don't have any reason to talk to me? I think more research is needed
> > here on whether Red Hat's and GNOME's push of SSSD is something
> > not-ready or indeed the de-facto standard setup.
> >
>
> I think it's a bit of a mix, but am not sure either.
>
>
> > Even if you don't want to use SSSD, providing an option to provision a
> > few domains with idmap rid backend with statically configured ranges
> > (as an override to autorid) would be a good step forward, as this can
> > be made compatible with the default RedHat setup.
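> >
> > Something along these lines, for example (idmap_rid(8) options; the
> > domain name and ranges are placeholders and would have to be chosen to
> > match what SSSD was configured with):
> >
> >   idmap config * : backend = tdb
> >   idmap config * : range = 1000000-1999999
> >   idmap config CORP : backend = rid
> >   idmap config CORP : range = 200000-999999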
>
> That's reasonable. Thanks for the suggestion.
>
>
> >
> > > Sharing the same directories over both NFS and SMB at the same time, also
> > > known as "multi-protocol", is not planned for now, however we're all aware
> > > that there's often a demand for this feature and we're aware of the
> > > complexity it brings. I expect we'll work on that at some point but not
> > > initially. Similarly, sharing the same directories over a SMB share and
> > > directly on a cephfs mount won't be blocked but we won't recommend it.
> >
> > OK. Feature request: in the case where there are several CephFS
> > filesystems, support configuring which one to serve.
> >
>
> Putting it on the list.
>
> > > > In fact, I am quite skeptical, because, at least in my experience,
> > > > every customer's SAMBA configuration as a domain member is a unique
> > > > snowflake, and cephadm would need an ability to specify arbitrary UID
> > > > mapping configuration to match what the customer uses elsewhere - and
> > > > the match must be precise.
> > >
> > > I agree - our initial use case is something along these lines:
> > > Users of a Ceph Cluster that have Windows systems, Mac systems, or
> > > appliances that are joined to an existing AD
> > > but are not currently interoperating with the Ceph cluster.
> > >
> > > I expect to add some idmapping configuration and agility down the line,
> > > especially supporting some form of rfc2307 idmapping (where unix IDs are
> > > stored in AD).
> >
> > Yes, for whatever reason, people do this, even though it is cumbersome
> > to manage.
> >
> > > But those who already have idmapping schemes and samba accessing ceph will
> > > probably need to just continue using the existing setups as we don't have
> > > an immediate plan for migrating those users.
> > >
> > > > Here is what I have seen or was told about:
> > > >
> > > > 1. We don't care about interoperability with NFS or CephFS, so we just
> > > > let SAMBA invent whatever UIDs and GIDs it needs using the "tdb2"
> > > > idmap backend. It's completely OK that workstations get different UIDs
> > > > and GIDs, as only SIDs traverse the wire.
> > >
> > > This is pretty close to our initial plan but I'm not clear why you'd think
> > > that "workstations get different UIDs and GIDs". For all systems accessing
> > > the (same) ceph cluster the id mapping should be consistent.
> > > You did make me consider multi-cluster use cases with something like
> > > cephfs
> > > volume mirroring - that's something that I hadn't thought of before *but*
> > > using an algorithmic mapping backend like autorid (and testing) I think
> > > we're mostly OK there.
> >
> > The tdb2 backend (used in my example) is not algorithmic, it is
> > allocating. That is, it sequentially allocates IDs on the
> > first-seen-first-allocated basis. Yet this is what this customer uses,
> > presumably because it is the only backend that explicitly specifies
> > clustering operation in its manual page.
> >
> > And the "autorid" backend is also not fully algorithmic, it allocates
> > ranges to domains on the same sequential basis (see
> > https://github.com/samba-team/samba/blob/6fb98f70c6274e172787c8d5f73aa93920171e7c/source3/winbindd/idmap_autorid_tdb.c#L82),
> > and therefore can create
> > mismatching mappings if two workstations or servers have seen the users
> > DOMA\usera and DOMB\userb in a different order. It is even mentioned in the
> > manual page. SSSD largely avoids this problem by hashing the domain portion
> > of the SID instead of
> > allocating the subranges on a sequential basis.
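> >
> > An easy way to catch that kind of mismatch during testing is to compare
> > the resolved IDs on every node or cluster (wbinfo ships with samba; the
> > account name below is a placeholder):
> >
> >   wbinfo -n 'DOMA\usera'      # name -> SID, identical everywhere by definition
> >   wbinfo -S <SID from above>  # SID -> UID, must also match on every node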
> >
>
> Agreed. Thanks for the reminder. This will certainly need to go on the test
> plan.
>
> > > > 2. [not seen in the wild, the customer did not actually implement it,
> > > > it's a product of internal miscommunication, and I am not sure if it
> > > > is valid at all] We don't care about interoperability with CephFS,
> > > > and, while we have NFS, security guys would not allow running NFS
> > > > non-kerberized. Therefore, no UIDs or GIDs traverse the wire, only
> > > > SIDs and names. Therefore, all we need is to allow both SAMBA and NFS
> > > > to use shared UID mapping allocated on as-needed basis using the
> > > > "tdb2" idmap module, and it doesn't matter that these UIDs and GIDs
> > > > are inconsistent with what clients choose.
> > >
> > > Unfortunately, I don't really understand this item. Fortunately, you say
> > > it was only considered, not implemented. :-)
> > >
> > > > 3. We don't care about ACLs at all, and don't care about CephFS
> > > > interoperability. We set ownership of all new files to root:root 0666
> > > > using whatever options are available [well, I would rather use a
> > > > dedicated nobody-style uid/gid here]. All we care about is that only
> > > > authorized workstations or authorized users can connect to each NFS or
> > > > SMB share, and we absolutely don't want them to be able to set custom
> > > > ownership or ACLs.
> > >
> > > Sometimes known as the "drop-box" use case I think (not to be confused
> > > with the cloud app of a similar name).
> > > We could probably implement something like that as an option but I had not
> > > considered it before.
> > >
> > > > 4. We care about NFS and CephFS file ownership being consistent with
> > > > what Windows clients see. We store all UIDs and GIDs in Active
> > > > Directory using the rfc2307 schema, and it's mandatory that all
> > > > servers (especially SAMBA - thanks to the "ad" idmap backend) respect
> > > > that and don't try to invent anything [well, they do - BUILTIN/Users
> > > > gets its GID through tdb2]. Oh, and by the way, we have this strangely
> > > > low-numbered group that everybody gets wrong unless they set "idmap
> > > > config CORP : range = 500-999999".
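> > > >
> > > > For reference, that kind of setup usually ends up looking something
> > > > like this in smb.conf (idmap_ad(8) options; "CORP" and the
> > > > default-domain range are placeholders, the low CORP range being the
> > > > one quoted above):
> > > >
> > > >   idmap config * : backend = tdb
> > > >   idmap config * : range = 1000000-1999999
> > > >   idmap config CORP : backend = ad
> > > >   idmap config CORP : schema_mode = rfc2307
> > > >   idmap config CORP : range = 500-999999
> > > >   idmap config CORP : unix_nss_info = yes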
> > >
> > > This is oh so similar to a project I worked on prior to working with Ceph.
> > > I think we'll need to do this one eventually but maybe not this year.
> > > One nice side-effect of running in containers is that the low ID range is
> > > less of an issue, because the IDs only matter within the container context
> > > (and even then only if using the kernel file system access methods). We
> > > have much more flexibility with IDs in a container.
> >
> > So - are you going to use the kernel-based mount or the ceph vfs
> > module? My tests indicate that, in situations where there are
> > frequently accessed files, allowing the kernel to cache them in RAM
> > (which the vfs module does not do) can create a big boost in
> > performance. Also, SUSE considers the ceph vfs module a
> > non-recommended solution apparently for the same performance-related
> > reason, see
> > https://documentation.suse.com/ses/7/html/ses-all/cha-ses-cifs.html
>
>
> The prototype module only uses the vfs module due to the extreme simplicity of
> setting it up in containers. Otherwise, we're trying to keep our options open
> and are investigating multiple approaches currently.
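>
> For readers comparing the two approaches, the share side looks roughly
> like this (option names from vfs_ceph(8); share names, paths and the
> cephx user are placeholders):
>
>   # vfs_ceph: smbd talks to CephFS through libcephfs, no mount required
>   [projects-vfs]
>       path = /
>       vfs objects = ceph
>       ceph: config_file = /etc/ceph/ceph.conf
>       ceph: user_id = samba
>       kernel share modes = no
>
>   # kernel mount: export an already-mounted CephFS path, page cache applies
>   [projects-kmount]
>       path = /mnt/cephfs/projects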
>
> > > > 5. We use a few static ranges for algorithmic ID translation using the
> > > > idmap rid backend. Everything works.
> > >
> > > See above.
> > >
> > > > 6. We use SSSD, which provides consistent IDs everywhere, and for a
> > > > few devices which can't use it, we configured compatible idmap rid
> > > > ranges for use with winbindd. The only problem is that we like
> > > > user-private groups, and only SSSD has support for them (although we
> > > > admit it's our fault that we enabled this non-default option).
> > > > 7. We store ID mappings in non-AD LDAP and use winbindd with the
> > > > "ldap" idmap backend.
> > >
> > > For now, we're only planning to do idmapping with winbind and AD. We'd
> > > probably only consider non-AD LDAP and/or SSSD if there was strong and loud
> > > demand for it.
> >
> > See above.
> >
> > However, as I said, providing a way to use the "rid" backend with
> > statically defined domains and ranges in addition to the default
> > "autorid" backend would be, for me, a good-enough substitute for SSSD.
> >
>
> Sounds reasonable. I've done it that way in a prior role too, so it's somewhat
> familiar. Thanks!
>
> > > > I am sure other weird but valid setups exist - please extend the list
> > > > if you can.
> > > >
> > > > Which of the above scenarios would be supportable without resorting to
> > > > the old way of installing SAMBA manually alongside the cluster?
> > >
> > > I hope I covered the above with some inline replies. This was great food
> > > for thought and at just the right level of technical detail. So thank you
> > > very much for replying, this is exactly the kind of discussion I want to
> > > have now where the design is still young and flexible.
> > >
> > > One other cool thing I plan on doing is supporting multiple samba
> > > containers running on the same cluster (even the same node if I can
> > > wrangle the network properly). So one could in fact have completely
> > > different domain joins and/or configurations. While I wouldn't suggest
> > > anyone run a whole lot of different configurations on the same cluster -
> > > this idea already allows for some level of agility between schemes. Later
> > > on we might be able to use that as a building block for migration tools,
> > > either from an existing samba setup or between configurations.
> >
> > Multiple SAMBA containers are also good for high availability (with
> > ctdb) or scale-out (with round-robin DNS).
> >
> > > Also, I plan on adding `global_custom_options` and `share_custom_options`
> > > for special overrides for development, qa, and experimentation but those
> > > are strongly within the "you break it, you bought it" realm. But these
> > > could be used for experimenting with idmapping schemes without having
> > > them all baked into the smb mgr module code.
> >
> > Great, thanks!
>
> Once again, thanks for the feedback. This discussion is very welcome!
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx