Re: [PATCH 0/6] RFC: introduce extended inode owner identifier v4

Dave Chinner <david@xxxxxxxxxxxxx> · Sat, 20 Feb 2010 10:31:55 +1100

On Fri, Feb 19, 2010 at 01:16:47PM +0300, Dmitry Monakhov wrote:
> Dave Chinner <david@xxxxxxxxxxxxx> writes:
> 
> > On Thu, Feb 18, 2010 at 07:45:24PM +0300, Dmitry Monakhov wrote:
> >> This is new generation of attempt to add extended inode identifier.
> >> In previous posts it was called tree_id, subtree_id, project_id.
> >> But after none of this was not good enough. I've refused project_id
> >> because it is well know XFS feature.
> >
> > Admins, users and developers of management tools are all going to
> > hate us if we introduce subtly different "project/directory quota
> > like" accounting to different filesystems with different
> > administration mechanisms.
> Seems what you right here.
> >
> > The fact that project quotas are already implemented in XFS is not a
> > valid reason for creating a new, slightly less functional,
> > incompatible implementation of the same feature in other
> > filesystems.
> >
> >> And my implementation is
> >> slightly different from it especially from user-space point of view.
> >
> > This is exactly my point - if a user has an ext4 filesystem and an
> > xfs filesystem then your proposal will result in them needing two
> > different mechanisms to manage the project/directory quotas on their
> > filesystems.  This result is not desirable from a system design
> > perspective.  Management of such a feature needs to be consistent
> > across all filesystem types - just like it is for user and group
> > quotas - and we already have a widely used and well tested
> > management interface that can be used to implement exactly what you
> > need.
> Not exactly. XFS  allow only subtree-like structure

Not true at all.  XFS allows an arbitrary distribution of files in a
given project - they are not restricted to subtrees. This isn't
widely used because it requires manually setting the project ID
after the file is created. For example, you can create a backup
tarball of a project hierarchy in an external, non-controlled
directory, then change the project ID of the tarball to the correct
project ID so that the backup is also accounted to the correct project.

For example, I'll create a new project (testproj) and a subtree
(/mnt/xfs/foo) associated with the project, create a 25MB file
inside the subtree, show it being accounted, then copy it outside
the subtree, show that the copy isn't accounted, then change the
project ID of the outside copy to testproj and show that it is
accounted to testproj even though it is outside the subtree:

# mkfs.xfs -f /dev/ubd/1
[.....]
# mount -o prjquota /dev/ubd/1 /mnt/xfs
# mkdir /mnt/xfs/foo
#
#
# echo testproj:42 >> /etc/projid
# echo 42:/mnt/xfs/foo >> /etc/projects
# xfs_quota -x -c 'project -s testproj' /mnt/xfs
Setting up project testproj (path /mnt/xfs/foo)...
Processed 1 /etc/projects paths for project testproj
#
#
#
# xfs_quota -x -c 'limit -p bhard=1g testproj' /mnt/xfs
# xfs_quota -x -c print /mnt/xfs
Filesystem          Pathname
/mnt/xfs            /dev/ubd/1 (pquota)
/mnt/xfs/foo        /dev/ubd/1 (project 42, testproj)
# xfs_quota -x -c report /mnt/xfs
Project quota on /mnt/xfs (/dev/ubd/1)
                               Blocks
Project ID       Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
testproj            0          0    1048576     00 [--------]

#
#
#
# dd if=/dev/zero of=foo/testfile bs=1024k count=25
25+0 records in
25+0 records out
26214400 bytes (26 MB) copied, 0.116102 s, 226 MB/s
# sudo xfs_quota -x -c report /mnt/xfs
Project quota on /mnt/xfs (/dev/ubd/1)
                               Blocks
Project ID       Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
testproj        25600          0    1048576     00 [--------]

#
#
#
# cp foo/testfile .
# sync
# xfs_quota -x -c report /mnt/xfs
Project quota on /mnt/xfs (/dev/ubd/1)
                               Blocks
Project ID       Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
testproj        25600          0    1048576     00 [--------]

#
#
#
# xfs_io -f -c "chproj 42" testfile
# xfs_quota -x -c report /mnt/xfs
Project quota on /mnt/xfs (/dev/ubd/1)
                               Blocks
Project ID       Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
testproj        51200          0    1048576     00 [--------]

#
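
The Used and Hard columns in those reports are in 1 KiB blocks (the
xfs_quota report default), so the numbers can be sanity-checked with
plain shell arithmetic. This is only a sketch; to_kib_blocks is a
helper name I made up for illustration, not part of xfs_quota:

```shell
#!/bin/sh
# Convert a byte count to the 1 KiB blocks that 'xfs_quota -c report'
# prints by default. to_kib_blocks is a hypothetical helper.
to_kib_blocks() {
    echo $(( $1 / 1024 ))
}

to_kib_blocks 26214400                    # one 25MB testfile
to_kib_blocks $(( 2 * 26214400 ))         # both copies after chproj
to_kib_blocks $(( 1024 * 1024 * 1024 ))   # the bhard=1g limit
```

That prints 25600, 51200 and 1048576, which match the Used and Hard
values in the reports above.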

> (link, rename are restricted).

The EXDEV on rename behaviour is purely an implementation detail -
it makes quota accounting in XFS simple. i.e. rename returns EXDEV
so that a mv(1) will fall back to create/copy/unlink and that
automatically gets the quota accounting correct. That is, it didn't
require a complex extension of dquot handling in the rename
transaction to implement.  This one could be fixed, and a couple of
people have actually asked recently if it could be done because
moving a few TB of data between projects is time-consuming.
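
To make the fallback concrete, here is roughly the create/copy/unlink
sequence that mv(1) performs by itself when rename(2) returns EXDEV.
It is only a sketch under simplifying assumptions: copy_then_unlink is
a made-up helper, it handles a single regular file, and a real mv does
considerably more (directories, metadata, error recovery). The point
is that the destination is a newly created inode, so its blocks are
charged on creation and no dquot juggling is needed in the rename
transaction.

```shell
#!/bin/sh
# Sketch of the fallback mv(1) uses after rename(2) fails with EXDEV.
# copy_then_unlink is a hypothetical helper; single regular files only.
copy_then_unlink() {
    src=$1
    dst=$2
    # Create a new file and copy the data; the blocks are charged to
    # whatever quota applies to the newly created destination inode.
    cp -p -- "$src" "$dst" || return 1
    # Remove the source only after the copy succeeded; this releases
    # the space from the quota the source was accounted to.
    rm -f -- "$src"
}
```

For a few TB of data the copy step is exactly the cost people are
complaining about, which is why doing it inside rename is attractive.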

However, hard links are a different matter. If you can clearly
determine how to hard link a file into multiple different projects
(dquots), then track and account for all the space used in a sane
manner, work out how to account for new or removed files in such a
hardlinked directory, etc, then you can allow hard links between
different subtrees.

For example, if you add a new file into such a hard linked
directory, who does it get accounted to? What happens if you then
move a multiple-hard linked file to a different subtree? If the
inode is accounted to all projects, then each of these filesystem
transactions requires updating an arbitrary (unbounded) number of
dquots - this alone makes journal reservations for transactions a
nightmare to calculate and greatly increases the complexity of such
transactions.

Disallowing hard links between directories in different projects
makes these cans of worms go away - it is a very practical design
choice to make. However, it in no way results in XFS project quotas
being restricted to subtrees - it is a *change of project quota*
that triggers these behaviours.

> Personally I think what right restriction, but someone may
> want to have not subtree-like hierarchy. So this patch doesn't introduce
> any link/rename rules.

The link/rename behaviour of XFS does not prevent this type of usage
at all.

> If user want to restrict his tree it will use
> bindmount. IMHO it is more intuitive than XFS does.

XFS is not trying to implement bind-mount-like restrictions. The
behaviour was carefully designed to allow project quotas to be
sanely implemented.

> But again you definitely right about feature_names/interfaces ambiguity 
> If we can create common interface it would be great. See later in 
> the mail.
> >
> >> In order to avoid ambiguity i've stopped at the "metagroup" term.
> >> I hope it is final name for the feature.
> >
> > I think "metagroup" is too abstract and will likely be confused with
> > group quotas by those that don't understand what it is. i.e it does
> > not convey any information about the bounds of the quota container
> > (unlike user, group, directory or project).
> Ok. Since we want common interface we should use well known "project_id"
> term.
> 
> I think we can try to unify it in following way:
> *User interface*
> As soon as i understand XFS manage projid via xfs_ioctl_setattr, 
> struct fsxattr. IMHO it is not good idea to make this interface common
> for all filesystems. Let's use standard i_op->setxattr/getxattr for
> this purpose. Let's name this xattr as "system.project_id".

That's fine by me. I'd much prefer that we used the xattr interface
for inode attributes instead of poking bits through fcntl or ioctls...

> And xfs may easily catch corresponding setxattr/getxattr and translate
> it to its ioctl interface, so both interfaces will be equal.
> At least xattr interface already supported by various utils (tar,
> rsync, etc).

Well, the point of the way XFS implements project quotas is that
utilities such as cp, mv, tar, rsync, etc do not need to know
anything about them - just like user/group quotas.

If we go down the xattr route, then these utilities can't be allowed
to copy these xattrs to new files; the filesystem has to create them
atomically with the new inodes so that they are accounted correctly.
If they are created non-atomically and the system crashes between
creating the file and applying the quota xattr, then you have an
inconsistency that only a quotacheck will pick up....

> *Link/Rename behavior*
>  Let's introduce two modes:
>  1) SHARED project hierarchy: without restrictions for link/renames

See above - I don't think "without restrictions" can be easily
implemented because of the complexity hard links introduce.

>  2) ISOLATED project hierarchy: Well known XFS (subtrees like)
>     link/rename rules
>  And support this two mode like this:
>  generic_fs)
>        SHARED: by default 
>        ISOLATED: via bindmount
>  XFS)

This is a change of behaviour from the existing XFS project quota
configurations as they do not require bind mounts at all.

I'm interested to know how you see this working when you have
multiple subtrees with the same project ID? Renaming and linking
between those subtrees is currently possible with XFS project IDs,
but adding bind mounts would cause EXDEV to be returned for these
operations. i.e. It seems to me that these subtrees are "shared" by
your definition, but the addition of bind mounts makes them
"isolated".

Or what if you want part of a subtree to be moved to a different
project ID because it needs to be accounted separately?  e.g. a group
gets moved in the organisation hierarchy, so the bean counters want to
change the project ID on all their files so their space usage can be
billed to the new department. If bind mounts are involved, this
quickly becomes complex and unmaintainable. It's not something that
users can easily manage, especially compared to the current 'xfs_io
-c "chproj -R <projid>" /path/to/subtree' method of doing this.

----

IMO focusing on link/rename restrictions as the deciding factor in
defining the user interface is wrong. I started out by saying that
having different user interfaces for different filesystems is not
desirable. You've ended up trying to encode the differences you
assume exist into a new user interface instead.

I'll rephrase the question - what part of the existing XFS project
quota administration interface (i.e. /etc/projects, /etc/projid,  a
quota command to set up the initial tree, etc) is not sufficient for
your purposes of defining and managing subtrees?  If it is not
sufficient, what simple extensions can we add that will make it
sufficient? Once we've got the high level management interface
defined, everything else is just details. ;)
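
To show how little there is to that interface, here is a rough sketch
of resolving a project name to its ID and directory trees from files
in the /etc/projid ("name:id") and /etc/projects ("id:path") formats
used in the transcript earlier. The function names and the file
arguments are my own assumptions for illustration; this is not how
xfs_quota itself is implemented, and paths containing a colon are not
handled.

```shell
#!/bin/sh
# Look up projects in files using the "name:id" (projid) and "id:path"
# (projects) formats. Both functions are hypothetical helpers.
project_id() {      # usage: project_id NAME PROJID_FILE
    # Print the ID of the first entry whose name matches.
    awk -F: -v name="$1" '$1 == name { print $2; exit }' "$2"
}

project_paths() {   # usage: project_paths ID PROJECTS_FILE
    # Print every directory tree registered for the ID, one per line;
    # a project may have more than one subtree.
    awk -F: -v id="$1" '$1 == id { print $2 }' "$2"
}
```

With the entries from the example, 'project_id testproj /etc/projid'
would give the ID, and 'project_paths' with that ID and /etc/projects
would give /mnt/xfs/foo.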

>        ISOLATED: by default, because this is expected semantics (no
>                  changes required)
>        SHARED: xfs may add "shared_project" mount feature to disable
>                isolation semantics. At least this gives user more
>                flexibility than before.
>  We have to document such difference. In order to avoid misbehavior.

> *VFS interface to project_id*
>  In order to make profit of project_id we have to make it visible to
>  vfs layer, and let quota and nfsd (any other users?) exploit this.
>  Let's use proposed per-sb aux_attributes table for this purpose.

Why go to that complexity? Just add a 32-bit proj_id identifier to
struct inode. If it's supposed to be generic, then simply implement
it the same way user and group quotas are.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx