This is great. We currently rely heavily on the SMB protocol to export a kernel-mounted CephFS, but I have run into a problem: when many SMB clients enumerate or list the same directory, the SMB server comes under high load and the smb processes go into the D (uninterruptible sleep) state. This has been going on for some time, and we have not found a suitable solution yet.

John Mulligan <phlogistonjohn@xxxxxxxxxxxxx> wrote on Tue, Mar 26, 2024 at 03:43:
>
> On Monday, March 25, 2024 3:22:26 PM EDT Alexander E. Patrakov wrote:
> > On Mon, Mar 25, 2024 at 11:01 PM John Mulligan
> > <phlogistonjohn@xxxxxxxxxxxxx> wrote:
> > > On Friday, March 22, 2024 2:56:22 PM EDT Alexander E. Patrakov wrote:
> > > > Hi John,
> > > >
> > > > > A few major features we have planned include:
> > > > > * Standalone servers (internally defined users/groups)
> > > >
> > > > No concerns here
> > > >
> > > > > * Active Directory Domain Member Servers
> > > >
> > > > In the second case, what is the plan regarding UID mapping? Is NFS
> > > > coexistence planned, or a concurrent mount of the same directory using
> > > > CephFS directly?
> > >
> > > In the immediate future the plan is to have a very simple, fairly
> > > "opinionated" idmapping scheme based on the autorid backend.
> >
> > OK, the docs for clustered SAMBA do mention the autorid backend in
> > examples. It's a shame that the manual page does not explicitly list
> > it as compatible with clustered setups.
> >
> > However, please consider that the majority of Linux distributions
> > (tested: CentOS, Fedora, Alt Linux, Ubuntu, OpenSUSE) use "realmd" to
> > join AD domains by default (where "default" means a pointy-clicky way
> > in a workstation setup), which uses SSSD, and therefore, by this
> > opinionated choice of the autorid backend, you create mappings that
> > disagree with the supposed majority and the default. This will create
> > problems in the future when you do consider NFS coexistence.
>
> Thanks, I'll keep that in mind.
>
> > Well, it's a different topic that most organizations that I have seen
> > seem to ignore this default. Maybe those that don't have any problems
> > don't have any reason to talk to me? I think that more research is
> > needed here on whether RedHat's and GNOME's push of SSSD is something
> > not-ready or indeed the de-facto standard setup.
>
> I think it's a bit of a mix, but am not sure either.
>
> > Even if you don't want to use SSSD, providing an option to provision a
> > few domains with idmap rid backend with statically configured ranges
> > (as an override to autorid) would be a good step forward, as this can
> > be made compatible with the default RedHat setup.
>
> That's reasonable. Thanks for the suggestion.
>
> > > Sharing the same directories over both NFS and SMB at the same time, also
> > > known as "multi-protocol", is not planned for now, however we're all aware
> > > that there's often a demand for this feature and we're aware of the
> > > complexity it brings. I expect we'll work on that at some point but not
> > > initially. Similarly, sharing the same directories over a SMB share and
> > > directly on a cephfs mount won't be blocked but we won't recommend it.
> >
> > OK. Feature request: in case there are several CephFS
> > filesystems, support configuration of which one to serve.
>
> Putting it on the list.
>
> > > > In fact, I am quite skeptical, because, at least in my experience,
> > > > every customer's SAMBA configuration as a domain member is a unique
> > > > snowflake, and cephadm would need an ability to specify arbitrary UID
> > > > mapping configuration to match what the customer uses elsewhere - and
> > > > the match must be precise.
> > >
> > > I agree - our initial use case is something along these lines:
> > > Users of a Ceph Cluster that have Windows systems, Mac systems, or
> > > appliances that are joined to an existing AD
> > > but are not currently interoperating with the Ceph cluster.
> > >
> > > I expect to add some idmapping configuration and agility down the line,
> > > especially supporting some form of rfc2307 idmapping (where unix IDs are
> > > stored in AD).
> >
> > Yes, for whatever reason, people do this, even though it is cumbersome
> > to manage.
> >
> > > But those who already have idmapping schemes and samba accessing ceph will
> > > probably need to just continue using the existing setups as we don't have
> > > an immediate plan for migrating those users.
> > >
> > > > Here is what I have seen or was told about:
> > > >
> > > > 1. We don't care about interoperability with NFS or CephFS, so we just
> > > > let SAMBA invent whatever UIDs and GIDs it needs using the "tdb2"
> > > > idmap backend. It's completely OK that workstations get different UIDs
> > > > and GIDs, as only SIDs traverse the wire.
> > >
> > > This is pretty close to our initial plan but I'm not clear why you'd think
> > > that "workstations get different UIDs and GIDs". For all systems accessing
> > > the (same) ceph cluster the id mapping should be consistent.
> > > You did make me consider multi-cluster use cases with something like
> > > cephfs volume mirroring - that's something that I hadn't thought of before *but*
> > > using an algorithmic mapping backend like autorid (and testing) I think
> > > we're mostly OK there.
> >
> > The tdb2 backend (used in my example) is not algorithmic, it is
> > allocating. That is, it sequentially allocates IDs on the
> > first-seen-first-allocated basis. Yet this is what this customer uses,
> > presumably because it is the only backend that explicitly specifies
> > clustering operation in its manual page.
> >
> > And the "autorid" backend is also not fully algorithmic, it allocates
> > ranges to domains on the same sequential basis (see
> > https://github.com/samba-team/samba/blob/6fb98f70c6274e172787c8d5f73aa93920171e7c/source3/winbindd/idmap_autorid_tdb.c#L82),
> > and therefore can create
> > mismatching mappings if two workstations or servers have seen the users
> > DOMA\usera and DOMB\userb in a different order. It is even mentioned in the
> > manual page. SSSD largely avoids this problem by hashing the domain portion
> > of the SID instead of allocating the subranges on a sequential basis.
>
> Agreed. Thanks for the reminder. This will certainly need to go on the test
> plan.
>
> > > > 2. [not seen in the wild, the customer did not actually implement it,
> > > > it's a product of internal miscommunication, and I am not sure if it
> > > > is valid at all] We don't care about interoperability with CephFS,
> > > > and, while we have NFS, security guys would not allow running NFS
> > > > non-kerberized. Therefore, no UIDs or GIDs traverse the wire, only
> > > > SIDs and names. Therefore, all we need is to allow both SAMBA and NFS
> > > > to use shared UID mapping allocated on as-needed basis using the
> > > > "tdb2" idmap module, and it doesn't matter that these UIDs and GIDs
> > > > are inconsistent with what clients choose.
> > >
> > > Unfortunately, I don't really understand this item. Fortunately, you say
> > > it was only considered not implemented. :-)
> > >
> > > > 3. We don't care about ACLs at all, and don't care about CephFS
> > > > interoperability. We set ownership of all new files to root:root 0666
> > > > using whatever options are available [well, I would rather use a
> > > > dedicated nobody-style uid/gid here]. All we care about is that only
> > > > authorized workstations or authorized users can connect to each NFS or
> > > > SMB share, and we absolutely don't want them to be able to set custom
> > > > ownership or ACLs.
> > >
> > > Sometimes known as the "drop-box" use case I think (not to be confused
> > > with the cloud app of a similar name).
> > > We could probably implement something like that as an option but I had not
> > > considered it before.
> > >
> > > > 4. We care about NFS and CephFS file ownership being consistent with
> > > > what Windows clients see. We store all UIDs and GIDs in Active
> > > > Directory using the rfc2307 schema, and it's mandatory that all
> > > > servers (especially SAMBA - thanks to the "ad" idmap backend) respect
> > > > that and don't try to invent anything [well, they do - BUILTIN/Users
> > > > gets its GID through tdb2]. Oh, and by the way, we have this strangely
> > > > low-numbered group that everybody gets wrong unless they set "idmap
> > > > config CORP : range = 500-999999".
> > >
> > > This is oh so similar to a project I worked on prior to working with Ceph.
> > > I think we'll need to do this one eventually but maybe not this year.
> > > One nice side-effect of running in containers is that the low-id number is
> > > less of an issue because the ids only matter within the container context
> > > (and only then if using the kernel file system access methods). We have
> > > much more flexibility with IDs in a container.
> >
> > So - are you going to use the kernel-based mount or the ceph vfs
> > module? My tests indicate that, in situations where there are
> > frequently accessed files, allowing the kernel to cache them in RAM
> > (which the vfs module does not do) can create a big boost in
> > performance. Also, SUSE considers the ceph vfs module a
> > non-recommended solution apparently for the same performance-related
> > reason, see
> > https://documentation.suse.com/ses/7/html/ses-all/cha-ses-cifs.html
>
> The prototype module only uses the vfs module due to the extreme simplicity of
> setting it up in containers. Otherwise, we're trying to keep our options open
> and are investigating multiple approaches currently.
>
> > > > 5. We use a few static ranges for algorithmic ID translation using the
> > > > idmap rid backend. Everything works.
> > >
> > > See above.
> > >
> > > > 6. We use SSSD, which provides consistent IDs everywhere, and for a
> > > > few devices which can't use it, we configured compatible idmap rid
> > > > ranges for use with winbindd. The only problem is that we like
> > > > user-private groups, and only SSSD has support for them (although we
> > > > admit it's our fault that we enabled this non-default option).
> > > > 7. We store ID mappings in non-AD LDAP and use winbindd with the
> > > > "ldap" idmap backend.
> > >
> > > For now, we're only planning to do idmapping with winbind and AD. We'd
> > > probably only consider non-AD ldap and/or SSSD if there was strong and loud
> > > demand for it.
> >
> > See above.
> >
> > However, as I said, providing a way to use the "rid" backend with
> > statically defined domains and ranges in addition to the default
> > "autorid" backend would be, for me, a good-enough substitute for SSSD.
>
> Sounds reasonable. I've done it that way in a prior role too, so it's somewhat
> familiar. Thanks!
>
> > > > I am sure other weird but valid setups exist - please extend the list
> > > > if you can.
> > > >
> > > > Which of the above scenarios would be supportable without resorting to
> > > > the old way of installing SAMBA manually alongside the cluster?
> > >
> > > I hope I covered the above with some inline replies. This was great food
> > > for thought and at just the right level of technical detail. So thank you
> > > very much for replying, this is exactly the kind of discussion I want to
> > > have now where the design is still young and flexible.
> > >
> > > One other cool thing I plan on doing is supporting multiple samba
> > > containers running on the same cluster (even the same node if I can
> > > wrangle the network properly). So one could in fact have completely
> > > different domain joins and/or configurations.
> > > While I wouldn't suggest
> > > anyone run a whole lot of different configurations on the same cluster -
> > > this idea already allows for some level of agility between schemes. Later
> > > on we might be able to use that as a building block for migration tools,
> > > either from an existing samba setup or between configurations.
> >
> > Multiple SAMBA containers are also good for high availability (with
> > ctdb) or scale-out (with round-robin DNS).
> >
> > > Also, I plan on adding `global_custom_options` and `share_custom_options`
> > > for special overrides for development, qa, and experimentation but those
> > > are strongly within the "you break it, you bought it" realm. But these
> > > could be used for experimenting with idmapping schemes without having
> > > them all baked into the smb mgr module code.
> >
> > Great, thanks!
>
> Once again, thanks for the feedback. This discussion is very welcome!
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
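
For readers following the autorid point raised in the thread, here is a minimal Python sketch of why first-seen-first-allocated range assignment can give two servers different IDs for the same user, while a hash of the domain SID does not depend on ordering. Everything in it is an assumption made for illustration: the class and function names, the base and range-size values, the example SIDs, and the use of SHA-256 as a stand-in hash. It is not how Samba's autorid or SSSD are actually implemented, and a real hash-based scheme also has to handle two domains landing in the same slot.

```python
import hashlib

# Illustrative values only; not the defaults of any real idmap backend.
BASE = 1_000_000
RANGE_SIZE = 100_000
SLOTS = 2_000


class SequentialAllocator:
    """Hands out ranges to domains in the order they are first seen."""

    def __init__(self):
        self._slot_by_domain = {}

    def uid(self, domain_sid, rid):
        # The first domain seen gets slot 0, the next slot 1, and so on.
        if domain_sid not in self._slot_by_domain:
            self._slot_by_domain[domain_sid] = len(self._slot_by_domain)
        return BASE + self._slot_by_domain[domain_sid] * RANGE_SIZE + rid


def hashed_uid(domain_sid, rid):
    """Picks the range from a hash of the domain SID (order-independent)."""
    digest = hashlib.sha256(domain_sid.encode("ascii")).digest()
    slot = int.from_bytes(digest[:4], "big") % SLOTS
    return BASE + slot * RANGE_SIZE + rid


# Hypothetical domain SIDs for DOMA and DOMB.
DOMA = "S-1-5-21-1111111111-2222222222-3333333333"
DOMB = "S-1-5-21-4444444444-5555555555-6666666666"

server_a = SequentialAllocator()
server_b = SequentialAllocator()

# server_a sees DOMA\usera first, server_b sees DOMB\userb first.
server_a.uid(DOMA, 1104)
server_a.uid(DOMB, 1105)
server_b.uid(DOMB, 1105)
server_b.uid(DOMA, 1104)

# Same user, different UID depending on which domain each server saw first.
print(server_a.uid(DOMA, 1104), server_b.uid(DOMA, 1104))  # 1001104 1101104

# The hash-based mapping gives the same answer on every server.
print(hashed_uid(DOMA, 1104) == hashed_uid(DOMA, 1104))  # True
```

In a clustered setup the allocation database is shared between nodes, so the mismatch shown here only arises between independent installations, which is the multi-cluster and NFS-coexistence case discussed above.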