On Thu, Jun 15, 2023 at 7:08 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
>
> On 6/14/23 20:34, Aleksandr Mikhalitsyn wrote:
> > On Wed, Jun 14, 2023 at 3:53 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > >
> > > On 6/13/23 22:53, Gregory Farnum wrote:
> > > > On Mon, Jun 12, 2023 at 6:43 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > > >>
> > > >> On 6/9/23 18:12, Aleksandr Mikhalitsyn wrote:
> > > >>> On Fri, Jun 9, 2023 at 12:00 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > > >>>> On Fri, Jun 09, 2023 at 10:59:19AM +0200, Aleksandr Mikhalitsyn wrote:
> > > >>>>> On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
> > > >>>>>> On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
> > > >>>>>>> Dear friends,
> > > >>>>>>>
> > > >>>>>>> This patchset was originally developed by Christian Brauner, but I'll
> > > >>>>>>> continue to push it forward. Christian allowed me to do that :)
> > > >>>>>>>
> > > >>>>>>> This feature is already actively used/tested with the LXD/LXC project.
> > > >>>>>>>
> > > >>>>>>> Git tree (based on https://github.com/ceph/ceph-client.git master):
> > > >>>>>
> > > >>>>> Hi Xiubo!
> > > >>>>>
> > > >>>>>> Could you rebase these patches to the 'testing' branch?
> > > >>>>>
> > > >>>>> Will do in -v6.
> > > >>>>>
> > > >>>>>> And you still have missed several places, for example the following cases:
> > > >>>>>>
> > > >>>>>> 1 269 fs/ceph/addr.c <<ceph_netfs_issue_op_inline>>
> > > >>>>>>           req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR, mode);
> > > >>>>>
> > > >>>>> +
> > > >>>>>
> > > >>>>>> 2 389 fs/ceph/dir.c <<ceph_readdir>>
> > > >>>>>>           req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
> > > >>>>>
> > > >>>>> +
> > > >>>>>
> > > >>>>>> 3 789 fs/ceph/dir.c <<ceph_lookup>>
> > > >>>>>>           req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
> > > >>>>>
> > > >>>>> We don't have an idmapping passed to lookup from the VFS layer. As I
> > > >>>>> mentioned before, it's just impossible now.
> > > >>>> ->lookup() doesn't deal with idmappings and really can't; otherwise you
> > > >>>> risk ending up with inode aliasing, which is really not something you
> > > >>>> want. IOW, you can't fill in inode->i_{g,u}id based on a mount's
> > > >>>> idmapping, as inode->i_{g,u}id absolutely needs to be a filesystem-wide
> > > >>>> value. So better not even risk exposing the idmapping in there at all.
> > > >>>
> > > >>> Thanks for adding, Christian!
> > > >>>
> > > >>> I agree, every time we use an idmapping we need to be careful with
> > > >>> what we map. AFAIU, inode->i_{g,u}id should be based on the filesystem
> > > >>> idmapping (not the mount one), but in this case Xiubo wants
> > > >>> current_fs{u,g}id to be mapped according to an idmapping.
> > > >>> Anyway, it's impossible right now, and IMHO, until we have a practical
> > > >>> use case where UID/GID-based path restriction is used in combination
> > > >>> with idmapped mounts, it's not worth making such big changes in the
> > > >>> VFS layer.
> > > >>>
> > > >>> Maybe I'm not right, but it seems that UID/GID-based path restriction
> > > >>> is not a widespread feature, and I can hardly imagine it being used
> > > >>> with container workloads (for instance), because it would require
> > > >>> always keeping the MDS permissions configuration in sync with the
> > > >>> possible UID/GID ranges on the client. It looks like a nightmare for a
> > > >>> sysadmin. It is useful when cephfs is used as external storage on the
> > > >>> host, but not if you share cephfs with a few containers with different
> > > >>> user namespace idmappings...
> > > >>
> > > >> Hmm, this will break the MDS permission check in cephfs in the lookup
> > > >> case. If we really can't support it, we should make it escape the check
> > > >> anyway, or some OPs may fail and won't work as expected.
> > > > I don't pretend to know the details of the VFS (or even our Linux
> > > > client implementation), but I'm confused that this is apparently so
> > > > hard. It looks to me like we currently always fill in the "caller_uid"
> > > > with "from_kuid(&init_user_ns, req->r_cred->fsuid))". Is this actually
> > > > valid to begin with? If it is, why can't the uid mapping be applied to
> > > > that?
> > > >
> > > > As both the client and the server share authority over the inode's
> > > > state (including things like mode bits and owners), and need to do
> > > > permission checking, being able to tell the server the relevant actor
> > > > is inherently necessary. We also let admins restrict keys to
> > > > particular UID/GID combinations as they wish, and while it's not the
> > > > most popular feature, it does get deployed. I would really expect a
> > > > user of UID mapping to be one of the *most* likely to employ such a
> > > > facility... maybe not with containers, but certainly end-user homedirs
> > > > and shared spaces.
> > > >
> > > > Disabling the MDS auth checks is really not an option. I guess we
> > > > could require any user employing idmapping to not be uid-restricted,
> > > > and set the anonymous UID (does that work, Xiubo, or was it the broken
> > > > one? In which case we'd have to default to root?). But that seems a
> > > > bit janky to me.
> > >
> > > Yeah, this also seems risky.
> > >
> > > Instead of disabling the MDS auth checks, there is another option: we
> > > can prevent the kclient from being mounted or the idmapping from being
> > > applied. But this still has issues, such as: what if admins set the MDS
> > > auth caps after the idmap has been applied to the kclients?
> >
> > Hi Xiubo,
> >
> > I thought about this too and came to the same conclusion: UID/GID-based
> > restriction can be applied dynamically, so detecting it at mount time
> > doesn't help much.
> For this, please raise one PR to ceph first to support this, and in the
> PR we can discuss the MDS auth caps further. And after the PR gets
> merged, then in this patch series you need to check the corresponding
> option or flag to determine whether the idmap mounting could succeed.

I'm sorry, but I don't understand what we want to support here. Do we
want to add some new ceph request that allows checking whether
UID/GID-based permissions are applied for a particular ceph client user?

Thanks,
Alex

> Thanks
>
> - Xiubo
>
> > > IMO there are 2 options: the best way is to fix this in the VFS if
> > > possible. Otherwise, add one option to disable the corresponding MDS
> > > auth caps in ceph if users want to support the idmap feature.
> >
> > Dear colleagues,
> > Dear Xiubo,
> >
> > Let me try to summarize the previous discussions about cephfs idmapped
> > mount support.
> >
> > The discussion about the need for caller UID/GID mapping started with
> > the first version of this patchset in this [1] thread. Let me quote
> > Christian here:
> > > Since the idmapping is a property of the mount and not a property of the
> > > caller, the caller's fs{g,u}id aren't mapped. What is mapped are the
> > > inode's i{g,u}id when accessed from a particular mount.
> > >
> > > The fs{g,u}id are only ever mapped when a new filesystem object is
> > > created. So if I have an idmapped mount that makes it so that files
> > > owned by 1000 on-disk appear to be owned by uid 0, then a user with uid 0
> > > creating a new file will create files with uid 1000 on-disk when going
> > > through that mount. For cephfs that'd be the uid we would be sending
> > > with creation requests as I've currently written it.
> >
> > This is a key part of this discussion. Idmapped mounts are not a way to
> > proxy the caller's UID/GID; idmapped mounts are designed to perform
> > UID/GID mapping of the inode owner's UID/GID.
> > Yes, these concepts look really, really close, and at first glance they
> > look equivalent. But they are not.
> >
> > From my understanding, if someone wants to verify the caller's UID/GID,
> > then they should take the unmapped UID/GID and verify that. It doesn't
> > matter whether the caller does something through an idmapped mount or
> > not; from_kuid(&init_user_ns, req->r_cred->fsuid) is literally "the UID
> > of the caller in the root user namespace". But a cephfs mount can be
> > used from any user namespace (yes, cephfs can't be mounted in user
> > namespaces, but it can be inherited during CLONE_NEWNS, or used as a
> > detached mount with open_tree/move_mount).
> > What I want to say with this example is that even now, without idmapped
> > mounts, we have a rather similar problem: UID/GID-based restriction is
> > based on the host's (!) root-user-namespace UID/GIDs even if the caller
> > sits inside a user namespace. And we don't care, right? So why is it a
> > problem with idmapped mounts? If someone wants to control the caller's
> > UID/GID on the MDS side, they just need to take the host UID/GIDs and
> > use them in the permission rules. That's it.
> >
> > The next point is that technically idmapped mounts don't break
> > anything: if someone starts using idmapped mounts with UID/GID-based
> > restrictions, they will get -EACCES. Why is this a problem? A user will
> > check the configuration, read the clarification in the cephfs
> > documentation about idmapped mounts, and find a warning that these are
> > not fully compatible things right now.
> >
> > IMHO, there is only one real problem (which makes UID/GID-based
> > restrictions not fully compatible with idmapped mounts): we have to map
> > the caller's UID/GID according to a mount idmapping when we create a
> > new inode (mknod, mkdir, symlink, open(O_CREAT)).
> > But that's only because the caller's UID/GIDs are used as the owner's
> > UID/GID for a newly created inode. Ideally, we need two fields in a
> > ceph request: one for the caller's UID/GID and another one for the
> > inode owner's UID/GID. But this requires a cephfs protocol modification
> > (yes, it's a bit painful. But global VFS changes are painful too!). As
> > Christian pointed out, this is the reason why he went this way in the
> > first version of the patchset.
> >
> > Maybe I'm not right, but both options to properly fix this (VFS API
> > changes or a cephfs protocol modification) are too expensive as long as
> > we don't have real requestors with a good use case for idmapped mounts
> > plus UID/GID-based permissions. We already have a real and good use
> > case for idmapped mounts on cephfs with LXD/LXC.
> > IMHO, it's better to move this thing forward step by step, because VFS
> > API / cephfs protocol changes will take a really big amount of time,
> > and it's not obvious that they are worth it; moreover, it's not even
> > clear that a VFS API change is the right way to deal with this problem.
> > It seems to me that a cephfs protocol change is the more proper way
> > here. At the same time, I fully understand that you are not happy about
> > this option.
> >
> > Just to conclude: we don't have any kind of cephfs degradation here.
> > All users without idmapping will be unaffected, and all users who start
> > using mount idmappings with cephfs will be aware of this limitation.
> >
> > [1] https://lore.kernel.org/all/20220105141023.vrrbfhti5apdvkz7@wittgenstein/
> >
> > Kind regards,
> > Alex
> >
> > > Thanks
> > >
> > > - Xiubo
> > >
> > > > -Greg
> > > >
> > > >> @Greg
> > > >>
> > > >> For the lookup requests the idmapping couldn't get the mapped
> > > >> UID/GID, just like all the other requests, which is needed by the
> > > >> MDS permission check. Is it okay to disable the check for this case?
> > > >> I am afraid this will break the MDS permissions logic.
> > > >>
> > > >> Any idea?
> > > >>
> > > >> Thanks
> > > >>
> > > >> - Xiubo
> > > >>
> > > >>> Kind regards,
> > > >>> Alex
> > > >