On Thu, Jun 15, 2023 at 7:08 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
>
> On 6/14/23 20:34, Aleksandr Mikhalitsyn wrote:
> > On Wed, Jun 14, 2023 at 3:53 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > >
> > > On 6/13/23 22:53, Gregory Farnum wrote:
> > > > On Mon, Jun 12, 2023 at 6:43 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > > >>
> > > >> On 6/9/23 18:12, Aleksandr Mikhalitsyn wrote:
> > > >>> On Fri, Jun 9, 2023 at 12:00 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > > >>>> On Fri, Jun 09, 2023 at 10:59:19AM +0200, Aleksandr Mikhalitsyn wrote:
> > > >>>>> On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > > >>>>>> On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
> > > >>>>>>> Dear friends,
> > > >>>>>>>
> > > >>>>>>> This patchset was originally developed by Christian Brauner, but I'll
> > > >>>>>>> continue to push it forward. Christian allowed me to do that :)
> > > >>>>>>>
> > > >>>>>>> This feature is already actively used/tested with the LXD/LXC project.
> > > >>>>>>>
> > > >>>>>>> Git tree (based on https://github.com/ceph/ceph-client.git master):
> > > >>>>>
> > > >>>>> Hi Xiubo!
> > > >>>>>
> > > >>>>>> Could you rebase these patches to the 'testing' branch?
> > > >>>>>
> > > >>>>> Will do in -v6.
> > > >>>>>
> > > >>>>>> And you still have missed several places, for example the following cases:
> > > >>>>>>
> > > >>>>>> 1 269 fs/ceph/addr.c <<ceph_netfs_issue_op_inline>>
> > > >>>>>>           req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR, mode);
> > > >>>>>
> > > >>>>> +
> > > >>>>>
> > > >>>>>> 2 389 fs/ceph/dir.c <<ceph_readdir>>
> > > >>>>>>           req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
> > > >>>>>
> > > >>>>> +
> > > >>>>>
> > > >>>>>> 3 789 fs/ceph/dir.c <<ceph_lookup>>
> > > >>>>>>           req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
> > > >>>>>
> > > >>>>> We don't have an idmapping passed to lookup from the VFS layer. As I
> > > >>>>> mentioned before, it's just impossible now.
> > > >>>> ->lookup() doesn't deal with idmappings and really can't; otherwise you
> > > >>>> risk ending up with inode aliasing, which is really not something you
> > > >>>> want. IOW, you can't fill in inode->i_{g,u}id based on a mount's
> > > >>>> idmapping, as inode->i_{g,u}id absolutely needs to be a filesystem-wide
> > > >>>> value. So better not even risk exposing the idmapping in there at all.
> > > >>>
> > > >>> Thanks for adding, Christian!
> > > >>>
> > > >>> I agree, every time we use an idmapping we need to be careful with
> > > >>> what we map. AFAIU, inode->i_{g,u}id should be based on the filesystem
> > > >>> idmapping (not the mount one), but in this case Xiubo wants
> > > >>> current_fs{u,g}id to be mapped according to an idmapping.
> > > >>> Anyway, it's impossible right now, and IMHO, until we have a practical
> > > >>> use case where UID/GID-based path restriction is used in combination
> > > >>> with idmapped mounts, it's not worth making such big changes in the
> > > >>> VFS layer.
> > > >>>
> > > >>> Maybe I'm not right, but it seems that UID/GID-based path restriction
> > > >>> is not a widespread feature, and I can hardly imagine it being used
> > > >>> with container workloads (for instance), because it would require
> > > >>> always keeping the MDS permissions configuration in sync with the
> > > >>> possible UID/GID ranges on the client. It looks like a nightmare for a
> > > >>> sysadmin. It is useful when cephfs is used as external storage on the
> > > >>> host, but not if you share cephfs with a few containers with different
> > > >>> user namespace idmappings...
> > > >>
> > > >> Hmm, this will break the MDS permission check in cephfs in the lookup
> > > >> case. If we really can't support it, we should make it escape the check
> > > >> anyway, or some OPs may fail and won't work as expected.
> > > > I don't pretend to know the details of the VFS (or even our Linux
> > > > client implementation), but I'm confused that this is apparently so
> > > > hard. It looks to me like we currently always fill in the "caller_uid"
> > > > with "from_kuid(&init_user_ns, req->r_cred->fsuid))". Is this actually
> > > > valid to begin with? If it is, why can't the uid mapping be applied to
> > > > that?
> > > >
> > > > As both the client and the server share authority over the inode's
> > > > state (including things like mode bits and owners), and need to do
> > > > permission checking, being able to tell the server the relevant actor
> > > > is inherently necessary. We also let admins restrict keys to
> > > > particular UID/GID combinations as they wish, and while it's not the
> > > > most popular feature, it does get deployed. I would really expect a
> > > > user of UID mapping to be one of the *most* likely to employ such a
> > > > facility... maybe not with containers, but certainly end-user homedirs
> > > > and shared spaces.
> > > >
> > > > Disabling the MDS auth checks is really not an option. I guess we
> > > > could require any user employing idmapping to not be uid-restricted,
> > > > and set the anonymous UID (does that work, Xiubo, or was it the broken
> > > > one? In which case we'd have to default to root?). But that seems a
> > > > bit janky to me.
> > >
> > > Yeah, this also seems risky.
> > >
> > > Instead of disabling the MDS auth checks, there is another option: we
> > > can prevent the kclient from being mounted or the idmapping from being
> > > applied. But this still has issues, such as: what if admins set the MDS
> > > auth caps after the idmap has been applied to the kclients?
> >
> > Hi Xiubo,
> >
> > I thought about this too and came to the same conclusion: UID/GID-based
> > restriction can be applied dynamically, so detecting it at mount time
> > doesn't help much.
> For this, please raise one PR to ceph first to support this, and in the
> PR we can discuss the MDS auth caps further. And after the PR gets
> merged, then in this patch series you need to check the corresponding
> option or flag to determine whether the idmap mounting could succeed.

I'm sorry, but I don't understand what we want to support here. Do we
want to add some new ceph request that allows checking whether
UID/GID-based permissions are applied for a particular ceph client user?

Thanks,
Alex

> Thanks
>
> - Xiubo
>
> > > IMO there are 2 options: the best way is to fix this in the VFS if
> > > possible. Otherwise, add one option to disable the corresponding MDS
> > > auth caps in ceph if users want to support the idmap feature.
> >
> > Dear colleagues,
> > Dear Xiubo,
> >
> > Let me try to summarize the previous discussions about cephfs idmapped
> > mount support.
> >
> > The discussion about the need for caller UID/GID mapping started with
> > the first version of this patchset in this [1] thread. Let me quote
> > Christian here:
> > > Since the idmapping is a property of the mount and not a property of the
> > > caller, the caller's fs{g,u}id aren't mapped. What is mapped are the
> > > inode's i{g,u}id when accessed from a particular mount.
> > >
> > > The fs{g,u}id are only ever mapped when a new filesystem object is
> > > created. So if I have an idmapped mount that makes it so that files
> > > owned by 1000 on-disk appear to be owned by uid 0, then a user with uid 0
> > > creating a new file will create files with uid 1000 on-disk when going
> > > through that mount. For cephfs that'd be the uid we would be sending
> > > with creation requests as I've currently written it.
> >
> > This is a key part of this discussion. Idmapped mounts are not a way to
> > proxy the caller's UID/GID; idmapped mounts are designed to perform
> > UID/GID mapping of the inode owner's UID/GID.
> > Yes, these concepts look really, really close, and at first glance they
> > look equivalent. But they are not.
> >
> > From my understanding, if someone wants to verify the caller's UID/GID,
> > then they should take the unmapped UID/GID and verify that. It doesn't
> > matter whether the caller does something through an idmapped mount or
> > not; from_kuid(&init_user_ns, req->r_cred->fsuid) is literally "the UID
> > of the caller in the root user namespace". But a cephfs mount can be
> > used from any user namespace (yes, cephfs can't be mounted in user
> > namespaces, but it can be inherited during CLONE_NEWNS, or used as a
> > detached mount with open_tree/move_mount).
> > What I want to say with this example is that even now, without idmapped
> > mounts, we have a rather similar problem: UID/GID-based restriction is
> > based on the host's (!) root-user-namespace UID/GIDs even if the caller
> > sits inside a user namespace. And we don't care, right? So why is it a
> > problem with idmapped mounts? If someone wants to control the caller's
> > UID/GID on the MDS side, they just need to take the host UID/GIDs and
> > use them in the permission rules. That's it.
> >
> > The next point is that technically idmapped mounts don't break
> > anything: if someone starts using idmapped mounts with UID/GID-based
> > restrictions, they will get -EACCES. Why is this a problem? A user will
> > check the configuration, read the clarification in the cephfs
> > documentation about idmapped mounts, and find a warning that these are
> > not fully compatible things right now.
> >
> > IMHO, there is only one real problem (which makes UID/GID-based
> > restrictions not fully compatible with idmapped mounts): we have to map
> > the caller's UID/GID according to a mount idmapping when we create a
> > new inode (mknod, mkdir, symlink, open(O_CREAT)).
> > But that's only because the caller's UID/GIDs are used as the owner's
> > UID/GID for a newly created inode. Ideally, we need two fields in a
> > ceph request: one for the caller's UID/GID and another one for the
> > inode owner's UID/GID. But this requires a cephfs protocol modification
> > (yes, it's a bit painful. But global VFS changes are painful too!). As
> > Christian pointed out, this is the reason why he went this way in the
> > first version of the patchset.
> >
> > Maybe I'm not right, but both options to properly fix this (VFS API
> > changes or a cephfs protocol modification) are too expensive as long as
> > we don't have real requestors with a good use case for idmapped mounts
> > plus UID/GID-based permissions. We already have a real and good use
> > case for idmapped mounts on cephfs with LXD/LXC.
> > IMHO, it's better to move this thing forward step by step, because VFS
> > API / cephfs protocol changes will take a really big amount of time,
> > and it's not obvious that they are worth it; moreover, it's not even
> > clear that a VFS API change is the right way to deal with this problem.
> > It seems to me that a cephfs protocol change is the more proper way
> > here. At the same time, I fully understand that you are not happy about
> > this option.
> >
> > Just to conclude: we don't have any kind of cephfs degradation here.
> > All users without idmapping will be unaffected, and all users who start
> > using mount idmappings with cephfs will be aware of this limitation.
> >
> > [1] https://lore.kernel.org/all/20220105141023.vrrbfhti5apdvkz7@wittgenstein/
> >
> > Kind regards,
> > Alex
> >
> > > Thanks
> > >
> > > - Xiubo
> > >
> > > > -Greg
> > > >
> > > >> @Greg
> > > >>
> > > >> For the lookup requests the idmapping couldn't get the mapped
> > > >> UID/GID, just like all the other requests, which is needed by the
> > > >> MDS permission check. Is it okay to disable the check for this case?
> > > >> I am afraid this will break the MDS permissions logic.
> > > >>
> > > >> Any idea?
> > > >>
> > > >> Thanks
> > > >>
> > > >> - Xiubo
> > > >>
> > > >>> Kind regards,
> > > >>> Alex
> > > >