On Tue, Jun 13, 2023 at 4:54 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> On Mon, Jun 12, 2023 at 6:43 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> >
> >
> > On 6/9/23 18:12, Aleksandr Mikhalitsyn wrote:
> > > On Fri, Jun 9, 2023 at 12:00 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > >> On Fri, Jun 09, 2023 at 10:59:19AM +0200, Aleksandr Mikhalitsyn wrote:
> > >>> On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > >>>>
> > >>>> On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
> > >>>>> Dear friends,
> > >>>>>
> > >>>>> This patchset was originally developed by Christian Brauner but I'll continue
> > >>>>> to push it forward. Christian allowed me to do that :)
> > >>>>>
> > >>>>> This feature is already actively used/tested with LXD/LXC project.
> > >>>>>
> > >>>>> Git tree (based on https://github.com/ceph/ceph-client.git master):
> > >>> Hi Xiubo!
> > >>>
> > >>>> Could you rebase these patches to 'testing' branch ?
> > >>> Will do in -v6.
> > >>>
> > >>>> And you still have missed several places, for example the following cases:
> > >>>>
> > >>>>
> > >>>> 1 269 fs/ceph/addr.c <<ceph_netfs_issue_op_inline>>
> > >>>> req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR,
> > >>>> mode);
> > >>> +
> > >>>
> > >>>> 2 389 fs/ceph/dir.c <<ceph_readdir>>
> > >>>> req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
> > >>> +
> > >>>
> > >>>> 3 789 fs/ceph/dir.c <<ceph_lookup>>
> > >>>> req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
> > >>> We don't have an idmapping passed to lookup from the VFS layer. As I
> > >>> mentioned before, it's just impossible now.
> > >> ->lookup() doesn't deal with idmappings and really can't otherwise you
> > >> risk ending up with inode aliasing which is really not something you
> > >> want. IOW, you can't fill in inode->i_{g,u}id based on a mount's
> > >> idmapping as inode->i_{g,u}id absolutely needs to be a filesystem wide
> > >> value.
> > >> So better not even risk exposing the idmapping in there at all.
> > > Thanks for adding, Christian!
> > >
> > > I agree, every time when we use an idmapping we need to be careful with
> > > what we map. AFAIU, inode->i_{g,u}id should be based on the filesystem
> > > idmapping (not mount), but in this case, Xiubo wants
> > > current_fs{u,g}id to be mapped according to an idmapping.
> > > Anyway, it's impossible at now and IMHO, until we don't have any
> > > practical use case where UID/GID-based path restriction is used in
> > > combination with idmapped mounts it's not worth to make such big
> > > changes in the VFS layer.
> > >
> > > May be I'm not right, but it seems like UID/GID-based path restriction
> > > is not a widespread feature and I can hardly imagine it to be used
> > > with the container workloads (for instance), because it will require
> > > to always keep in sync MDS permissions configuration with the
> > > possible UID/GID ranges on the client. It looks like a nightmare for sysadmin.
> > > It is useful when cephfs is used as an external storage on the host, but if you
> > > share cephfs with a few containers with different user namespaces idmapping...
> >
> > Hmm, while this will break the MDS permission check in cephfs then in
> > lookup case. If we really couldn't support it we should make it to
> > escape the check anyway or some OPs may fail and won't work as expected.

Dear Gregory,

Thanks for the fast reply!

>
> I don't pretend to know the details of the VFS (or even our linux
> client implementation), but I'm confused that this is apparently so
> hard. It looks to me like we currently always fill in the "caller_uid"
> with "from_kuid(&init_user_ns, req->r_cred->fsuid))". Is this actually
> valid to begin with? If it is, why can't the uid mapping be applied on
> that?

Applying an idmapping is not hard: it's as simple as replacing

    from_kuid(&init_user_ns, req->r_cred->fsuid)

with

    from_vfsuid(req->r_mnt_idmap, &init_user_ns, VFSUIDT_INIT(req->r_cred->fsuid))

but the problem is that we don't have req->r_mnt_idmap for all the requests.
For instance, we don't have idmap arguments (that come from the VFS layer)
for the ->lookup operation and many others. There are some reasons for
that (Christian has covered some of them).
So, it's not about my laziness to implement that. It's a real pain ;-)

>
> As both the client and the server share authority over the inode's
> state (including things like mode bits and owners), and need to do
> permission checking, being able to tell the server the relevant actor
> is inherently necessary. We also let admins restrict keys to
> particular UID/GID combinations as they wish, and it's not the most
> popular feature but it does get deployed. I would really expect a user
> of UID mapping to be one of the *most* likely to employ such a
> facility...maybe not with containers, but certainly end-user homedirs
> and shared spaces.
>
> Disabling the MDS auth checks is really not an option. I guess we
> could require any user employing idmapping to not be uid-restricted,
> and set the anonymous UID (does that work, Xiubo, or was it the broken
> one? In which case we'd have to default to root?). But that seems a
> bit janky to me.

That's an interesting point about the anonymous UID, but at the same time,
we use the caller's fs UID/GID values as the owner's UID/GID for
newly created inodes. It means that we can't use the anonymous UID
everywhere in this case, otherwise all new files/directories will be
owned by an anonymous user.

> -Greg

Kind regards,
Alex

>
> > @Greg
> >
> > For the lookup requests the idmapping couldn't get the mapped UID/GID
> > just like all the other requests, which is needed by the MDS permission
> > check. Is that okay to make it disable the check for this case ?
> > I am afraid this will break the MDS permissions logic.
> >
> > Any idea ?
> >
> > Thanks
> >
> > - Xiubo
> >
> >
> > > Kind regards,
> > > Alex
> > >
> >
>