Hi Glauber, Thanks for you comments! On 05/31/2012 04:54 PM, Glauber Costa wrote: > On 05/30/2012 06:58 PM, jeff.liu@xxxxxxxxxx wrote: >> Hello All, >> >> According to glauber's comments regarding container disk quota, it >> should be binded to mount >> namespace rather than cgroup. >> >> Per my try out, it works just fine by combining with userland quota >> utilitly in this way. > that's great. > > I'll take a look at the patches. > > >> >> * Modify quotactl(2) to examine if the caller is invoked inside >> container. >> implemented by checking the quota device name("rootfs" for lxc >> guest) or current pid namespace >> is not the initial one, then do mount namespace quotactl if >> required, or goto >> the normal quotactl procedure. > > I dislike the use of "lxc" name. There is nothing lxc-specific in this, > this is namespace-specific. lxc is just one of the container solutions > out there, so let's keep it generic. I think I should forget all things regarding LXC, just treat it as a new quota feature with regard to namespace. >> >> * Also, I have not handle a couple of things for now. >> . I think the container quota should be isolated to Jan's fs/quota/ >> directory. >> . There are a dozens of helper routines at general quota, e.g, >> struct if_dqblk<-> struct fs_disk_quota converts. >> dquot space and inodes bill up. >> They can be refactored as shared routines to some extents. >> . quotastats(8) is not teached to aware container for now. >> >> Changes in quota userland utility: >> * Introduce a new quota format string "lxc" to all quota control >> utility, to >> let each utility know that the user want to run container quota >> control. e.g: >> quotacheck -cvugm -F "lxc" / >> quotaon -u -F "lxc" / >> .... >> >> * Currently, I manually created the underlying device(by editing cgroup >> device access list and running mknod /dev/sdaX x x) for the rootfs >> inside containers to let the cache mount points routine pass for >> executing quotacheck against the "/" directory. Actually, it can be >> omitted here. >> >> * Add a new quotaio_lxc.c[.h] for container quota IO, it basically >> same to >> VFS quotaio logic, I just hope to isolate container stuff here. >> >> Issues: >> * How to detect quotactl(2) is launched from container in a reasonable >> way. > > It's a system call. It is always called by a process. The process > belongs to a namespace. What else is needed? nothing now. :) > >> * Do we need to let container quota works for cgroup combine with >> unshare(1)? >> Now the patchset is mainly works for lxc guest. IMHO, it can be >> used outside >> guest if the user desired. In this case, the quota limits can take >> effort >> among different underlying file systems if they have exported quota >> billing >> routines. > > I still don't understand what is the business of cgroups here. If you > are attaching it to mount namespace, you can always infer the context > from the calling process. I still need to look at your patches, but I > believe that dropping the "feature" of manipulating this from outside of > the container will save you a lot of trouble. Yup, just treat it to be namespace specific, there is nothing need to consider with cgroup interface. > > Please note that a process can temporarily join a namespace with > setns(). So you can have a *utility* that does it from the outer world, > but the kernel has no business with that. As far as we're concerned, I > believe that you should always get your context from the current > namespace, and forbid any usage from outside. I'll more investigation for that. > >> * The hash table list defines(hash table size)for dquot caching for >> each type is >> referred to kernel/user.c, maybe its better to define an array >> separatly for >> performance optimizations. Of course, that's all depending on my >> current >> implementation is on the right road. :) >> >> * Container quota statistics, should them be calculated and exposed to >> /proc/fs/quota? If the underlying file system also enabled with >> quotas, they will be >> mixed up, so how about add a new proc file like "ns_quota" there? > No, this should be transferred to the process-specific proc and them > symlinked. Take a look at "/proc/self". > >> >> * Memory shrinks acquired from kswap. >> As all dquot are cached in memory, and if the user executing >> quotaoff, maybe >> I need to handle quota disable but still be kept at memory. >> Also, add another routine to disable and remove all quotas from >> memory to >> save memory directly. > > I didn't read your patches yet, so take it with a grain of salt here. > But I don't understand why you make this distinction of keeping it in > memory only. > > You could keep quota files outside of the container, and then bind mount > them to the current location in the setup-phase. I have tried to keep quota files outsides originally, but I changed my thoughts afterwards, because of three reasons at that time: 1) The quota files could be overwrote if the container's rootfs is located at the root directory of a storage partition, and this partition is mounted with quota limits enabled. 2) To deal with quota files, looks I have to tweak up quota_read()/quota_write(), assuming ext4, which are corresponding to ext4_quota_read()/ext4_quota_write(). 3) As mount namespace could be created and destroyed at any stage, it has no memory to recall which inodes are quota files. however, quota tools need to restore a few things from those files I remember. but can not recalled all of them for now. :( I'll do some check up to refresh my head in this point. Sure, considering that we can bind mount them at setup phase, the first concern could be ignored. Thanks, -Jeff -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html