On Fri, Oct 28, 2022 at 01:10:41PM +0200, Christian Brauner wrote: > Last cycle we've already made the interaction with idmapped mounts more > robust and type safe by introducing the vfs{g,u}id_t type. This cycle we > concluded the conversion and removed the legacy helpers. > > Currently we still pass around the plain namespace that was attached to > a mount. This is in general pretty convenient but it makes it easy to > conflate filesystem and mount namespaces and what different roles they > have to play. Especially for filesystem developers without much > experience in this area this is an easy source for bugs. > > Instead of passing the plain namespace we introduce a dedicated type > struct mnt_idmap and replace the pointer with a pointer to a struct > mnt_idmap. There are no semantic or size changes for the mount struct > caused by this. > > We then start converting all places aware of idmapped mounts to rely on > struct mnt_idmap. Once the conversion is done all helpers down to the > really low-level make_vfs{g,u}id() and from_vfs{g,u}id() will take a > struct mnt_idmap argument instead of two namespace arguments. This way > it becomes impossible to conflate the two removing and thus eliminating > the possibility of any bugs. Fwiw, I fixed some issues in that area a > while ago in ntfs3 and ksmbd in the past. Afterwards only low-level code > can ultimately use the associated namespace for any permission checks. > Even most of the vfs can be completely obivious about this ultimately > and filesystems will never interact with it in any form in the future. > > A struct mnt_idmap currently encompasses a simple refcount a pointer to > the relevant namespace the mount is idmapped to. If a mount isn't > idmapped then it will point to a static nop_mnt_idmap and if it doesn't > that it is idmapped. As usual there are no allocations or anything > happening for non-idmapped mounts. Everthing is carefully written to be > a nop for non-idmapped mounts as has always been the case. > > If an idmapped mount is created a struct mnt_idmap is allocated and a > reference taken on the relevant namespace. Each mount that gets idmapped > or inherits the idmap simply bumps the reference count on struct > mnt_idmap. Just a reminder that we only allow a mount to change it's > idmapping a single time and only if it hasn't already been attached to > the filesystems and has no active writers. > > The actual changes are fairly straightforward but this will have huge > benefits for maintenance and security in the long run even if it causes > some churn. I'm aware that there's some cost for all of you. And I'll > commit to doing this work and make this as painless as I can. > > Note that this also makes it possible to extend struct mount_idmap in > the future. For example, it would be possible to place the namespace > pointer in an anonymous union together with an idmapping struct. This > would allow us to expose an api to userspace that would let it specify > idmappings directly instead of having to go through the detour of > setting up namespaces at all. > > This adds the infrastructure. > > Signed-off-by: Christian Brauner (Microsoft) <brauner@xxxxxxxxxx> I like this for avoiding confusion about which namespace is which, and for the clarity provided by nop_mnt_idmapping. Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@xxxxxxxxxx>