On Tue, Jun 13, 2023 at 02:46:02PM +0200, Aleksandr Mikhalitsyn wrote: > On Tue, Jun 13, 2023 at 3:43 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote: > > > > > > On 6/9/23 18:12, Aleksandr Mikhalitsyn wrote: > > > On Fri, Jun 9, 2023 at 12:00 PM Christian Brauner <brauner@xxxxxxxxxx> wrote: > > >> On Fri, Jun 09, 2023 at 10:59:19AM +0200, Aleksandr Mikhalitsyn wrote: > > >>> On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote: > > >>>> > > >>>> On 6/8/23 23:42, Alexander Mikhalitsyn wrote: > > >>>>> Dear friends, > > >>>>> > > >>>>> This patchset was originally developed by Christian Brauner but I'll continue > > >>>>> to push it forward. Christian allowed me to do that :) > > >>>>> > > >>>>> This feature is already actively used/tested with LXD/LXC project. > > >>>>> > > >>>>> Git tree (based on https://github.com/ceph/ceph-client.git master): > > >>> Hi Xiubo! > > >>> > > >>>> Could you rebase these patches to 'testing' branch ? > > >>> Will do in -v6. > > >>> > > >>>> And you still have missed several places, for example the following cases: > > >>>> > > >>>> > > >>>> 1 269 fs/ceph/addr.c <<ceph_netfs_issue_op_inline>> > > >>>> req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR, > > >>>> mode); > > >>> + > > >>> > > >>>> 2 389 fs/ceph/dir.c <<ceph_readdir>> > > >>>> req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS); > > >>> + > > >>> > > >>>> 3 789 fs/ceph/dir.c <<ceph_lookup>> > > >>>> req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS); > > >>> We don't have an idmapping passed to lookup from the VFS layer. As I > > >>> mentioned before, it's just impossible now. > > >> ->lookup() doesn't deal with idmappings and really can't otherwise you > > >> risk ending up with inode aliasing which is really not something you > > >> want. IOW, you can't fill in inode->i_{g,u}id based on a mount's > > >> idmapping as inode->i_{g,u}id absolutely needs to be a filesystem wide > > >> value. So better not even risk exposing the idmapping in there at all. > > > Thanks for adding, Christian! > > > > > > I agree, every time when we use an idmapping we need to be careful with > > > what we map. AFAIU, inode->i_{g,u}id should be based on the filesystem > > > idmapping (not mount), > > > but in this case, Xiubo want's current_fs{u,g}id to be mapped > > > according to an idmapping. > > > Anyway, it's impossible at now and IMHO, until we don't have any > > > practical use case where > > > UID/GID-based path restriction is used in combination with idmapped > > > mounts it's not worth to > > > make such big changes in the VFS layer. > > > > > > May be I'm not right, but it seems like UID/GID-based path restriction > > > is not a widespread > > > feature and I can hardly imagine it to be used with the container > > > workloads (for instance), > > > because it will require to always keep in sync MDS permissions > > > configuration with the > > > possible UID/GID ranges on the client. It looks like a nightmare for sysadmin. > > > It is useful when cephfs is used as an external storage on the host, but if you > > > share cephfs with a few containers with different user namespaces idmapping... > > > > Hmm, while this will break the MDS permission check in cephfs then in > > lookup case. If we really couldn't support it we should make it to > > escape the check anyway or some OPs may fail and won't work as expected. > > Hi Xiubo! > > Disabling UID/GID checks on the MDS side looks reasonable. IMHO the > most important checks are: > - open > - mknod/mkdir/symlink/rename > and for these checks we already have an idmapping. > > Also, I want to add that it's a little bit unusual when permission > checks are done against the caller UID/GID. The server side permission checking based on the sender's fs{g,u}id is rather esoteric imho. So I would just disable it for idmapped mounts. > Usually, if we have opened a file descriptor and, for instance, passed > this file descriptor through a unix socket then > file descriptor holder will be able to use it in accordance with the > flags (O_RDONLY, O_RDWR, ...). > We also have ->f_cred on the struct file that contains credentials of > the file opener and permission checks are usually done > based on this. But in cephfs we are always using syscall caller's > credentials. It makes cephfs file descriptor "not transferable" > in terms of permission checks. Yeah, that's another good point.