Re: [LSF/MM TOPIC] [ATTEND] Container disk quota and lseek(2) upon shared extents

  Hi Jeff,

On Wed 30-01-13 00:37:08, Jeff Liu wrote:
> On 01/29/2013 11:14 PM, Jan Kara wrote:
> >   Hello,
> > 
> > On Tue 29-01-13 22:44:24, Jeff Liu wrote:
> >> I'd like to discuss the following problems at LSF:
> >>
> >> - Container UID/GID quota support
> >> More than half a year ago, I posted a patch set to support UID/GID
> >> quota inside containers:
> >> http://www.spinics.net/lists/linux-containers/msg25393.html
> >>
> >> However, I had to put it on ice at that time since this feature depends on the
> >> user namespace.  Now I think it's time to bring it up again because user_ns was
> >> basically done in 3.8-rcX.
> >>
> >> Combined with user_ns, there are a couple of issues that need to be solved first:
> >> 1) UID/GID mapping between global and container quota files.
> >> In my previous implementation, the quotas were cached in memory, which is truly
> >> not acceptable at all; I'll try to make it work as usual with journaled quota support.
> >>
> >> 2) To avoid modifying the quota tools, maybe we have to keep quotas enabled all the
> >> time inside containers, so that the end user only decides whether or not to set up
> >> quota limits.
> >>
> >> 3) Embed the container quota accounting logic into the corresponding VFS quota
> >> routines and make it transparent to the individual file systems.
> >   So now looking into your old submission, your main aim was to make
> > quota-tools work properly when run from inside a container, right?
> Right. 
> > Because quota enforcement works properly once user namespaces are in place. In fact
> > quota calls such as Q_GETQUOTA or Q_SETQUOTA work correctly as well with
> > user namespaces. UID/GID translation from namespace id space to the
> > global space and back is already happening. So what functionality are you
> > missing?
> So it looks like there is no need to revisit it. :(
> Previously I found that we cannot turn quota off inside containers without modifying
> the quota tools. I am not sure whether this makes sense, or whether it is a fair user
> requirement.  Anyway, I'll play with user namespaces and the quota tools for further
> investigation.
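The UID/GID translation Jan refers to can be pictured with a small user-space model of a user namespace ID map, where each entry corresponds to one line of /proc/<pid>/uid_map ("ID-inside-ns ID-outside-ns length", as documented in user_namespaces(7)). This is only an illustrative sketch: the struct and function names are hypothetical, and in the kernel the real translation is done by helpers such as make_kuid() and from_kuid().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one uid_map line: "ID-inside-ns ID-outside-ns length". */
struct id_extent {
    uint32_t ns_first;   /* first ID as seen inside the namespace */
    uint32_t host_first; /* corresponding first ID in the global space */
    uint32_t count;      /* number of consecutive IDs mapped */
};

#define ID_UNMAPPED ((uint32_t)-1)

/* Namespace ID -> global ID (e.g. before charging the global quota file). */
uint32_t id_to_global(const struct id_extent *map, size_t n, uint32_t ns_id)
{
    for (size_t i = 0; i < n; i++)
        if (ns_id >= map[i].ns_first && ns_id - map[i].ns_first < map[i].count)
            return map[i].host_first + (ns_id - map[i].ns_first);
    return ID_UNMAPPED;
}

/* Global ID -> namespace ID (e.g. when reporting quota back inside a container). */
uint32_t id_from_global(const struct id_extent *map, size_t n, uint32_t host_id)
{
    for (size_t i = 0; i < n; i++)
        if (host_id >= map[i].host_first && host_id - map[i].host_first < map[i].count)
            return map[i].ns_first + (host_id - map[i].host_first);
    return ID_UNMAPPED;
}
```

With a single mapping {0, 100000, 65536}, container root (UID 0) maps to global UID 100000, while global UID 1000 has no mapping inside the container at all.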
  So turning quotas on/off is a filesystem-global action. As such it's hard
to make it work from containers when you don't have an fs-per-container
setup... Implementing something like per-namespace quota enforcement (i.e.
only processes from a particular namespace will not be allowed to exceed
quota) might be reasonably possible though - you would just need to tweak the
sb_has_quota_limits_enabled() function to also take the current namespace into
account.
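A minimal sketch of the per-namespace enforcement idea, modelled in user space rather than in the kernel. Everything here is hypothetical: struct ns_model and struct sb_quota_model stand in for the kernel's namespace and superblock quota state, and limits_enabled_for() only mimics the extra check a namespace-aware sb_has_quota_limits_enabled() might perform. Nested namespaces are ignored for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; not real kernel types. */
struct ns_model { int id; };

struct sb_quota_model {
    bool limits_enabled[2];            /* per quota type: 0 = user, 1 = group */
    const struct ns_model *enforce_ns; /* NULL: enforce for every process */
};

/*
 * Limits apply only if they are enabled for this quota type AND either no
 * namespace restriction is configured or the caller is in that namespace.
 */
bool limits_enabled_for(const struct sb_quota_model *sb, int type,
                        const struct ns_model *current_ns)
{
    if (!sb->limits_enabled[type])
        return false;
    return sb->enforce_ns == NULL || sb->enforce_ns == current_ns;
}
```

Keeping the NULL case equivalent to today's behaviour means file systems that never set a namespace restriction would see no change.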

> >> - Introduce a new whence to lseek(2) to fetch the reflinked/shared extents
> >>
> >> We have had some user requests for showing the real disk footprint of OCFS2 reflinked
> >> or Btrfs cloned files.  I wrote a shared-du utility based on du(1) for OCFS2, as it
> >> was the only file system with reflink support at that time:
> >> https://oss.oracle.com/pipermail/ocfs2-devel/2010-September/007293.html
> >   But this is a tough problem, isn't it? You have to minimally cache some
> > info about *every* file du(1) was called on so that you can check whether
> > two files share some extents or not. I'm not saying it isn't a useful
> > functionality, just I'd like to verify we are on the same page.
> Yes, in user space I have to cache the shared extent info and iterate
> over the cached items to check whether the next one to be cached
> already exists.  If it exists, increase its count and check the next
> one; otherwise, cache it.  This step is repeated until all the files
> residing on the target partition/directories have been checked.
  Yes, that's what I'd imagine.
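The user-space bookkeeping Jeff describes can be sketched as a small cache keyed on the physical location of each shared extent: the first time an extent is seen its bytes count toward the real footprint, and later sightings only bump its reference count. This assumes shared extents are reported as exact (physical, length) pairs; real reflinked extents may overlap only partially, which this sketch does not handle. All type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A shared extent as reported for one file: its location on disk. */
struct shared_extent {
    uint64_t physical; /* byte offset on the device */
    uint64_t length;   /* bytes */
};

/* Shared extents seen so far, each with how many times it was seen. */
struct extent_cache {
    struct shared_extent *items;
    unsigned *refs;
    size_t n, cap;
};

/*
 * Record one shared extent.  Returns 1 if it is new (count its bytes toward
 * the real footprint), 0 if already cached (only its count grows), or -1 on
 * allocation failure.
 */
int cache_add(struct extent_cache *c, struct shared_extent e)
{
    for (size_t i = 0; i < c->n; i++) {
        if (c->items[i].physical == e.physical && c->items[i].length == e.length) {
            c->refs[i]++;
            return 0;
        }
    }
    if (c->n == c->cap) {
        size_t cap = c->cap ? 2 * c->cap : 16;
        struct shared_extent *ni = realloc(c->items, cap * sizeof *ni);
        if (!ni)
            return -1;
        c->items = ni;
        unsigned *nr = realloc(c->refs, cap * sizeof *nr);
        if (!nr)
            return -1;
        c->refs = nr;
        c->cap = cap;
    }
    c->items[c->n] = e;
    c->refs[c->n] = 1;
    c->n++;
    return 1;
}

void cache_free(struct extent_cache *c)
{
    free(c->items);
    free(c->refs);
    c->items = NULL;
    c->refs = NULL;
    c->n = c->cap = 0;
}
```

The linear search mirrors the "iterate over the cached items" step; on large trees a hash table keyed on the physical offset would avoid the quadratic cost.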

> >> It is based on the FIEMAP ioctl(2) in user space; OCFS2 sets the FIEMAP_EXTENT_SHARED
> >> flag to indicate that an extent is reflinked/CoW when its internal OCFS2_EXT_REFCOUNTED
> >> flag is set.
> >>
> >> Recently, I started to implement this feature for Btrfs with a similar approach.
> >> Once it is complete, the next step is to teach upstream du(1) to work for both file
> >> systems via a new command-line option.
> >>
> >> This still sounds like nothing new, because we have FIEMAP... :( But considering how
> >> awkward and error-prone that interface was when I used it to improve cp(1) for sparse
> >> files, this would extend the ugly tentacles of FIEMAP into du(1) as well, which the
> >> coreutils maintainer (Jim, CC-ed) does not like at all, and which I also want to avoid
> >> if possible...
> >>
> >> How about adding a new whence type to lseek(2) for this purpose?  lseek has a very
> >> clear interface and works very well for SEEK_DATA/SEEK_HOLE; IMHO it would most likely
> >> work fine for shared extents too.
> >   Well, I can hardly imagine how such an lseek(2) interface would look in order
> > to be useful for identifying shared extents among different files. Do you have
> > something particular in mind?
> lseek(2) would not be used for identifying shared extents among files.
> Instead, it would be extended so that, when called with a particular
> whence, it finds and returns the next reflinked or cloned extent; the
> underlying file systems would need to be extended accordingly.
> 
> Take Btrfs as an example: if we perform btrfs_ioctl_clone from source
> file A to target B and then run du(1) against both files, it shows double
> the space, although thanks to CoW only half of it is really used/reserved.
> 
> If we can mark the cloned extents of a file with a special flag (say,
> EXTENT_MAP_CLONED), then a call to lseek(fd, offset, SEEK_CLONE or ?)
> would return the offset of the first cloned extent at or beyond the
> given offset.  That way we can find all the cloned extents of a file,
> for use in disk space accounting by user-space tools.
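For reference, this is the loop shape the existing SEEK_DATA/SEEK_HOLE interface already supports; a whence like the proposed SEEK_CLONE would presumably slot into a similar loop. SEEK_CLONE is only the name floated in this thread, not an existing whence value, so the sketch below sticks to SEEK_DATA/SEEK_HOLE. data_bytes() is a hypothetical helper name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes SEEK_DATA / SEEK_HOLE in glibc */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA     /* headers were included earlier without _GNU_SOURCE */
#define SEEK_DATA 3   /* Linux values */
#define SEEK_HOLE 4
#endif

/*
 * Walk the data segments of fd and return the total number of data bytes,
 * or -1 on error.  Note that this moves the file offset.
 */
off_t data_bytes(int fd)
{
    off_t total = 0, pos = 0;

    for (;;) {
        off_t start = lseek(fd, pos, SEEK_DATA);
        if (start < 0)
            return errno == ENXIO ? total : -1; /* ENXIO: no data at or after pos */
        off_t end = lseek(fd, start, SEEK_HOLE); /* EOF acts as an implicit hole */
        if (end < 0)
            return -1;
        total += end - start;
        pos = end;
    }
}
```

On file systems without native hole tracking, the kernel's generic implementation reports the whole file as a single data segment, so the result there is simply the file size.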
  OK, but then you have to call FIEMAP anyway to find which blocks
underlie the extent so that you can match them with cloned extents from
different files. Ah, and the advantage would be that you don't have to
cache *all* the extents but only those that are reported as reflinked. OK,
now I see.
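A minimal sketch of the FIEMAP side, assuming a kernel and file system that set FIEMAP_EXTENT_SHARED on reflinked/cloned extents (as Jeff describes for OCFS2; whether other file systems do so depends on the file system and kernel version). It pages through the extent map with FS_IOC_FIEMAP and sums the shared bytes; a du-like tool would instead record each shared extent's fe_physical/fe_length in a cache to match it against other files. shared_bytes() and FIEMAP_BATCH are hypothetical names; the ioctl, structures, and flags come from <linux/fs.h> and <linux/fiemap.h>.

```c
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

#define FIEMAP_BATCH 32 /* extents fetched per ioctl call (arbitrary) */

/* Sum the lengths of fd's extents flagged FIEMAP_EXTENT_SHARED; -1 on failure. */
long long shared_bytes(int fd)
{
    size_t sz = sizeof(struct fiemap) + FIEMAP_BATCH * sizeof(struct fiemap_extent);
    struct fiemap *fm = malloc(sz);
    long long total = 0;
    uint64_t start = 0;

    if (!fm)
        return -1;
    for (;;) {
        memset(fm, 0, sz);
        fm->fm_start = start;
        fm->fm_length = FIEMAP_MAX_OFFSET - start;
        fm->fm_flags = FIEMAP_FLAG_SYNC; /* sync file data before mapping */
        fm->fm_extent_count = FIEMAP_BATCH;
        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
            free(fm);
            return -1;
        }
        if (fm->fm_mapped_extents == 0)
            break;
        struct fiemap_extent *last = NULL;
        for (uint32_t i = 0; i < fm->fm_mapped_extents; i++) {
            last = &fm->fm_extents[i];
            if (last->fe_flags & FIEMAP_EXTENT_SHARED)
                total += (long long)last->fe_length;
        }
        if (last->fe_flags & FIEMAP_EXTENT_LAST)
            break;
        start = last->fe_logical + last->fe_length; /* continue after this batch */
    }
    free(fm);
    return total;
}
```

Returning -1 on any ioctl failure keeps the sketch short; a real tool would distinguish EOPNOTSUPP (no FIEMAP support on that file system) from genuine errors.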

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

