Re: [PATCH] A request to reserve a "tree id" field on ext[34] inodes

Andreas Dilger <adilger@xxxxxxx> writes:

> On 2009-11-17, at 06:04, Pavel Emelyanov wrote:
>> We have a proposal to implement a 2-level disk quota on ext3 and ext4.
>>
>> In two words - the aim is to have directories on ext3/4 partitions
>> which are limited by its disk usage and the number of inodes. Further
>> the plan is to allow configuring uid and gid quotas within them.
>>
>> The main usage of this is containers. When two or more of them are
>> located on one disk their roots will be marked with a unique tree id
>> and thus the disk consumption of each container will be limited. While
>> achieving this goal having an id of what tree an inode belongs to is
>> a key requirement.
>
> How do you handle files with multiple links, if they are located in
> different trees?  The inode would need to have multiple tree ids.
The short answer is "no": an inode cannot belong to multiple trees.
Containers have some non-obvious specifics.
Each container is isolated from the others as much as possible.
A container has its own root tree. This tree is exported inside the
CT in one of several possible ways (namespace, virtual-stack-fs, chroot).

So a container's root is an independent tree, or several trees.
Usually they are organized as follows: /ct_root/CT_${ID}/${tree_content}
There are many reasons to keep these trees separate from one another:
   - inode attr:
     If an inode has links in both trees A and B, and A's user calls
     chown() on this inode, then B's owner will be surprised.
     The only way to overcome this is to virtualize inode attributes
     (for each tree), which is madness IMHO.
   - checkpoint/restore/online-backup:
     This is like suspend/resume for a VM, but in this case only the
     container's processes are stopped (frozen) for some time. After the
     CT's processes are stopped we can back up the CT's tree without
     freezing the FS as a whole.
As I already said, there are many ways to accomplish this task, but each
of them has strong disadvantages:
Virtual block devices (qemu-like): problems with consistency and performance
ext3/4 + stack-fs (unionfs/vzfs): Bad failure resistance. It is
        impossible to support a journalled quota file at the stack-fs level.
XFS with proj quota: Lack of quota file journalling. XFS itself
        (please don't blame me, but I'm really not a huge XFS fan)

So the only way to implement journalled quota for containers is to
implement it at the native fs level.

"Containers directory tree-id" assumptions:
(1) The tree id is embedded in the inode
(2) The tree id is inherited from the parent dir
(3) An inode cannot belong to more than one directory tree

The default directory tree (with id == 0) has a special meaning.
A directory which belongs to the default tree may contain the roots of
other trees. The default tree is used for subtree manipulation.
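
To make assumptions (2) and (3) and the default-tree rule a bit more concrete, here is a minimal userspace sketch. It is not taken from the patch: struct tree_node, tree_id_inherit() and tree_entry_consistent() are hypothetical names, and requiring a tree root to be a directory is my reading of "roots of other trees".

```c
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_TREE_ID 0u  /* id == 0 is the default tree */

/* Hypothetical in-memory model; not the real ext3/4 inode. */
struct tree_node {
    uint32_t tree_id;
    bool is_dir;
};

/* Assumption (2): a newly created inode starts in its parent's tree. */
void tree_id_inherit(struct tree_node *child, const struct tree_node *parent)
{
    child->tree_id = parent->tree_id;
}

/*
 * A parent/child pair is consistent when both carry the same tree id,
 * or when the parent is in the default tree and the child is a
 * directory acting as the root of another tree.
 */
bool tree_entry_consistent(const struct tree_node *parent,
                           const struct tree_node *child)
{
    if (child->tree_id == parent->tree_id)
        return true;
    return parent->tree_id == DEFAULT_TREE_ID && child->is_dir;
}
```

With this model, /ct_root would sit in the default tree and each CT_${ID} directory would be the only place where the id changes on the way down.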

->rename restriction:
  if (S_ISDIR(old_inode->i_mode)) {
      if ((new_dir->i_tree_id == 0) || /* move to default tree */
          (new_dir->i_tree_id == old_inode->i_tree_id)) /* same tree */
              goto good;
      return -EXDEV;
  } else {
      /* If the entry has more than one link, it is a bad idea to allow
         renaming it into a different tree (even the default tree),
         because this would violate rule (3). */
      if ((old_inode->i_nlink > 1) &&
          (new_dir->i_tree_id != old_inode->i_tree_id))
              return -EXDEV;
  }
->link restriction: /* Links may belong to only one tree */
  if (new_dir->i_tree_id != old_inode->i_tree_id)
          return -EXDEV;
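
For anyone who wants to try these rules outside the kernel, the two checks can be modelled as plain functions. This is only a sketch under stated assumptions: struct toy_inode, tree_may_rename() and tree_may_link() are hypothetical names, and the kernel's i_mode test is reduced to an is_dir flag.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the kernel inode; only the fields the checks read. */
struct toy_inode {
    uint32_t tree_id;    /* 0 means the default tree */
    unsigned int nlink;  /* hard-link count */
    bool is_dir;
};

/* Returns 0 if the rename is allowed, -EXDEV otherwise. */
int tree_may_rename(const struct toy_inode *old_inode,
                    const struct toy_inode *new_dir)
{
    if (old_inode->is_dir) {
        /* Directories may move within their tree or into the default tree. */
        if (new_dir->tree_id == 0 ||
            new_dir->tree_id == old_inode->tree_id)
            return 0;
        return -EXDEV;
    }
    /* A multiply-linked file must stay in its tree to preserve rule (3). */
    if (old_inode->nlink > 1 && new_dir->tree_id != old_inode->tree_id)
        return -EXDEV;
    return 0;
}

/* Returns 0 if the hard link is allowed, -EXDEV otherwise. */
int tree_may_link(const struct toy_inode *old_inode,
                  const struct toy_inode *new_dir)
{
    return new_dir->tree_id == old_inode->tree_id ? 0 : -EXDEV;
}
```

Note that a regular file with a single link passes the rename check even when the target is in another tree. The rules above do not say what then happens to its tree id, and the sketch leaves that open as well.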

>
> You can instead just store this data in an xattr (which will normally
> be stored in the inode, so no performance impact), and then you are
> free to store multiple values per inode.
Yes, an xattr is possible, but struct ext4_xattr_entry is quite big, plus
the space for the attr name ..., while we only want 4 bytes.
In fact I've made a proof-of-concept patch; it contains everything
necessary for tree quota support. I'll post it if you are interested.
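
To put rough numbers on the size argument, here is a back-of-the-envelope sketch. It assumes the on-disk entry layout mirrored below (16 bytes of fixed fields before the name) and 4-byte rounding of name and value; the struct is a local copy for illustration, not the kernel header, and the 7-character attribute name "tree_id" is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the fixed part of an ext4 xattr entry (assumed layout). */
struct xattr_entry_mirror {
    uint8_t  e_name_len;    /* length of the name */
    uint8_t  e_name_index;  /* namespace prefix index */
    uint16_t e_value_offs;  /* offset of the value */
    uint32_t e_value_inum;  /* e_value_block in older kernels */
    uint32_t e_value_size;  /* size of the value */
    uint32_t e_hash;        /* hash of name and value */
    /* the name bytes follow the fixed part on disk */
};

#define XATTR_ROUND 3u  /* entries and values are padded to 4 bytes */

/* Bytes used by one attribute: padded (entry + name) plus padded value. */
size_t xattr_footprint(size_t name_len, size_t value_len)
{
    size_t entry = (sizeof(struct xattr_entry_mirror) + name_len + XATTR_ROUND)
                   & ~(size_t)XATTR_ROUND;
    size_t value = (value_len + XATTR_ROUND) & ~(size_t)XATTR_ROUND;
    return entry + value;
}
```

Under these assumptions xattr_footprint(7, 4) comes to 28 bytes for a 4-byte id, before counting any per-inode xattr header.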

>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.