Re: [v12 0/5] ext4: add project quota support

Alban Crequy <alban.crequy@xxxxxxxxx> · Sun, 12 Apr 2015 17:36:53 +0200

On 9 April 2015 at 17:14, Li Xi <pkuelelixi@xxxxxxxxx> wrote:
> The following patches propose an implementation of project quota
> support for ext4. A project is an aggregate of unrelated inodes
> which might scatter in different directories. Inodes that belong
> to the same project possess an identical identification i.e.
> 'project ID', just like every inode has its user/group
> identification. The following patches add project quota as
> supplement to the former uer/group quota types.
> (...)

Thanks for this work, I would like to use this for containers. I am
adding containers@xxxxxxxxxxxxxxxxxxxxxxxxxx in Cc.

To make sure I understand correctly, I will describe the configuration
I have in mind and hopefully someone can tell me if it makes sense.

Containers created by rkt (https://github.com/coreos/rkt) use an
overlay filesystem as root and the lowerdir/upperdir directories are
based on an ext4 filesystem outside of the container's reach. The
lowerdir is the base image, and several container instances can
potentially use the same lowerdir. Each container has its upperdir
containing their changes.

With your patch set, I could assign a different projid to the upperdir
of each container with a specific quota. Then it will limit how much
the container will be able to write. I don't know if the overlay's
workdir would need to have projid too.

When a quota warning is sent on netlink, it is received only in the
initial user namespace and the processes in a different user namespace
will not be able to receive the netlink warnings. The user will only
receive a warning through the control terminal.

Since rkt does not use user namespaces yet, a rkt container could
unfortunately receive quota warnings through netlink concerning the
host or other containers. Or is it restricted to init_net?

quotactl() can be used in a rkt container if the proccesses in the
container can guess somehow which block device is used by the
filesystem hosting the overlay's upperdir and if they can mknod it
somewhere. Usually, containers don't restrict mknod but just restrict
read-write access through the device cgroup. The read-write access is
irrelevant for quotactl(): quotactl() just check that the device node
exists and that it is not on a nodev mount. The nodev check does not
restrict containers here because they usually have a /dev mounted as
tmpfs without the nodev option.

Containers that don't use user namespaces (so no projid mapping) would
be able to query quotas for projid assigned to other containers
(unfortunately). They would be able to change the quota of other
containers if they are privileged enough to be given CAP_SYS_RESOURCE.

Containers using user namespaces would not be able to change any quota
config because they don't have CAP_SYS_RESOURCE in the init user
namespace. If they are configured with a proper projid mapping, they
would only be able to query the projid they are assigned (they could
guess which projid to query by looking at /proc/self/projid_map).

Do you know if someone is working on the documentation? It would be
nice if filesystems/quota.txt could say who can receive the quota
warnings on netlink (which namespace) and if it could give some
information about projid. But maybe this belong to the proc(5) and
user_namespaces(7) manpages as well.

Is there any suggestions how to allocate projid in userspace?
Something like /etc/subprojid similar to /etc/subuid?

Thanks!
Alban
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html