Re: [PATCH v3] xattr: use rbtree for simple_xattrs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 11, 2022 at 12:53:36PM +0100, Christian Brauner wrote:
> A while ago Vasily reported that it is possible to set a large number of
> xattrs on inodes of filesystems that make use of the simple xattr
> infrastructure. This includes all kernfs-based filesystems that support
> xattrs (e.g., cgroupfs) and tmpfs. Both cgroupfs and tmpfs can be
> mounted by unprivileged users in unprivileged containers and root in an
> unprivileged container can set an unrestricted number of security.*
> xattrs and privileged users can also set unlimited trusted.* xattrs. As
> there are apparently users that have a fairly large number of xattrs we
> should scale a bit better. Other xattrs such as user.* are restricted
> for kernfs-based instances to a fairly limited number.
> 
> Using a simple linked list protected by a spinlock used for set, get,
> and list operations doesn't scale well if users use a lot of xattrs even
> if it's not a crazy number. There's no need to bring in the big guns
> like rhashtables or rw semaphores for this. An rbtree with a rwlock, or
> limited rcu semanics and seqlock is enough.
> 
> It scales within the constraints we are working in. By far the most
> common operation is getting an xattr. Setting xattrs should be a
> moderately rare operation. And listxattr() often only happens when
> copying xattrs between files or with the filey. Holding a lock across
> listxattr() is unproblematic because it doesn't list the values of
> xattrs. It can only be used to list the names of all xattrs set on a
> file. And the number of xattr names that can be listed with listxattr()
> is limited to XATTR_LIST_MAX aka 65536 bytes. If a larger buffer is
> passed then vfs_listxattr() caps it to XATTR_LIST_MAX and if more xattr
> names are found it will return -EFBIG. In short, the maximum amount of
> memory that can be retrieved via listxattr() is limited.
> 
> Of course, the API is broken as documented on xattr(7) already. In the
> future we might want to address this but for now this is the world we
> live in and have lived for a long time. But it does indeed mean that
> once an application goes over XATTR_LIST_MAX limit of xattrs set on an
> inode it isn't possible to copy the file and include its xattrs in the
> copy unless the caller knows all xattrs or limits the copy of the xattrs
> to important ones it knows by name (At least for tmpfs, and kernfs-based
> filesystems. Other filesystems might provide ways of achieving this.).
> 
> Bonus of this port to rbtree+rwlock is that we shrink the memory
> consumption for users of the simple xattr infrastructure.
> 
> Also add proper kernel documentation to all the functions.
> A big thanks to Paul for his comments.
> 
> Cc: Vasily Averin <vvs@xxxxxxxxxx>
> Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> Signed-off-by: Christian Brauner (Microsoft) <brauner@xxxxxxxxxx>
> ---
> 
> Notes:
>     In the v1 and v2 of this patch we used an rbtree which was protected by an
>     rcu+seqlock combination. During the discussion it became clear that there's
>     some valid concern how safe it is to combine rcu with rbtrees. While most of
>     the issues are highly unlikely to matter in practice the code here can be
>     reached by unprivileged users rather directly so we should not be adventurous.
>     Instead of the rcu+seqlock combination simply use an rwlock. This will scale
>     sufficiently as well and had been one of the implementations I considered and
>     wrote a little while ago. Thanks to Paul for some deeper insights into issues
>     associated with rcu and rbtrees!
>     
>     In addition to this patch I would like to propose that we restrict the number
>     of xattrs for the simple xattr infrastructure via XATTR_MAX_LIST bytes. In
>     other words, we restrict the number of xattrs for simple xattr filesystems to
>     the number of xattrs names that can be retrieved via listxattr(). That should
>     be about 2000 to 3000 xattrs per inode which is more than enough. We should
>     risk this and see if we get any regression reports from userswith this
>     approach.
>     
>     This should be as simple as adding a max_list member to struct simple_xattrs
>     and initialize it with XATTR_MAX_LIST. Set operations would then check against
>     this field whether the new xattr they are trying to set will fit and return
>     -EFBIG otherwise. I think that might be a good approach to get rid of the in
>     principle unbounded number of xattrs that can be set via the simple xattr
>     infrastructure. I think this is a regression risk worth taking.
>     
>     /* v2 */
>     Christian Brauner <brauner@xxxxxxxxxx>:
>     - Fix kernel doc.
>     - Remove accidental leftover union from previous iteration.
>     
>     /* v3 */
>     Port the whole thing to use a simple rwlock instead of rcu+seqlock.
>     
>     "Paul E. McKenney" <paulmck@xxxxxxxxxx>:
>     - Fix simple_xattr_add() by searching the correct slot in the rbtree first.
>     
>     Roman Gushchin <roman.gushchin@xxxxxxxxx>:
>     - Avoid calling strcmp() multiple times.
> 
>  fs/xattr.c            | 317 +++++++++++++++++++++++++++++++++---------
>  include/linux/xattr.h |  38 ++---
>  mm/shmem.c            |   2 +-
>  3 files changed, 260 insertions(+), 97 deletions(-)

Acked-by: Roman Gushchin <roman.gushchin@xxxxxxxxx>

Thanks!



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux