On April 1, 2018, at 3:00 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> On Sun, Apr 1, 2018 at 9:53 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>> On Sun, Apr 1, 2018 at 9:35 AM, <cgxu519@xxxxxxx> wrote:
>>> On March 10, 2018, at 2:37 PM, Chengguang Xu <cgxu519@xxxxxxx> wrote:
>>>>
>>>>> Sent: Thursday, March 08, 2018 at 10:36 PM
>>>>> From: "Amir Goldstein" <amir73il@xxxxxxxxx>
>>>>> To: "Chengguang Xu" <cgxu519@xxxxxxx>
>>>>> Cc: overlayfs <linux-unionfs@xxxxxxxxxxxxxxx>, "Miklos Szeredi" <miklos@xxxxxxxxxx>
>>>>> Subject: Re: "quota-df" feature
>>>>>
>>>>> On Thu, Mar 8, 2018 at 3:10 PM, Chengguang Xu <cgxu519@xxxxxxx> wrote:
>>>>> [...]
>>>>>>>>>> Lowerdirs could be shared among different overlayfs mounts, so I'm not sure
>>>>>>>>>> whether counting the contents of lowerdirs is proper behavior. If lowerdirs
>>>>>>>>>> are dedicated to one overlayfs, then maybe setting the same project id as
>>>>>>>>>> upperdir can resolve the "covered" problem that you mentioned above. Counting
>>>>>>>>>> usage information without quota might be hard work when there are plenty of files.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Well, it's a different use case, but a very well known use case -
>>>>>>>>> When dealing with cloned files (e.g. btrfs, xfs) many files can share the same
>>>>>>>>> blocks, but each clone is fully accounted to the user/group/project.
>>>>>>>>> This is the "thin provisioning" use case - every user gets accounted by files
>>>>>>>>> that the user can reference, but the host does not pay the cost of sum of all
>>>>>>>>> user quotas.
>>>>>>>>>
>>>>>>>>> Without accounting of "covered" files to begin with, it is possible to
>>>>>>>>> get to a state where 'touch' on a big file gets ENOSPC/EQUOTA. This is indeed
>>>>>>>>> a situation that can happen in "thin provisioned" filesystems (e.g. btrfs) or
>>>>>>>>> on thin provisioned block, but a situation that filesystems and administrators
>>>>>>>>> try really hard to avoid.
>>>>>>>>
>>>>>>>> Let me make clear the term "covered": does it mean a file hidden in lowerdir
>>>>>>>> because a file with the same name exists in upperdir? Or does it mean the
>>>>>>>> contents of lowerdir that appear in the merged dir of overlayfs?
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> The former. "covered" file size does not show up in du -s, so the
>>>>>>> merged disk usage
>>>>>>> is <upper used> + <lower used> - <covered used>
>>>>>>
>>>>>> Thanks, got it.
>>>>>> But IIUC, a "covered" file has no chance to be copied up, so
>>>>>> I'm wondering whether it is the real reason for getting the ENOSPC/EQUOTA error?
>>>>>
>>>>> Suppose your "image" (i.e. total disk usage of lower) is 1GB
>>>>> and you want to allow user to touch all the files in the image and
>>>>> create 1GB of new files.
>>>>>
>>>>> If your only tool is project quota on upper then you need to set project
>>>>> quota hard limit to 2GB, but then user can create 2GB of new files and
>>>>> later when touching a lower file, will get EQUOTA on copy up.
>>>>>
>>>>> If you account lower uncovered files to overlay merged quota then
>>>>> you set the merged quota to 2GB and start with 50% used.
>>>>> - Copy up will not change used
>>>>> - Remove of lower will reduce used
>>>>> - You can never get EQUOTA from touching a file
>>>>
>>>> I get your point. I'd like to think of it as a "reservation" feature we can implement
>>>> in overlay for uncovered files, so that we can get rid of the EQUOTA error during copy-up.
>>>> We might even get rid of the ENOSPC error during copy-up when the underlying filesystem
>>>> supports block reservation? Do many people hope to have this kind of function?
>>>>
>>>>
>>>>>>>>> All I am saying is that it is "not hard" (TM) to keep track of
>>>>>>>>> "covered" files disk usage
>>>>>>>>> and "not hard" to re-calculate "covered" files disk usage when full
>>>>>>>>> indexing is enabled.
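The merged-quota scenario quoted above (a 1GB image and a 2GB merged quota that starts at 50% used) can be sketched as a toy model. This is only an illustration under stated assumptions: MergedQuota, QuotaExceeded and the method names are hypothetical, nothing here is overlayfs code, and all sizes are in MB.

```python
class QuotaExceeded(Exception):
    """Stands in for the EQUOTA error discussed in the thread."""


class MergedQuota:
    """Toy model: uncovered lower files are charged to the merged quota."""

    def __init__(self, hard_limit_mb, lower_used_mb):
        self.hard_limit = hard_limit_mb
        # Uncovered lower files count as used from the start.
        self.used = lower_used_mb

    def create(self, size_mb):
        # New upper files are charged in full and may hit the hard limit.
        if self.used + size_mb > self.hard_limit:
            raise QuotaExceeded("hard limit reached")
        self.used += size_mb

    def copy_up(self, size_mb):
        # The lower file was already charged, so used does not change
        # and copy up cannot fail on quota.
        return self.used

    def remove_lower(self, size_mb):
        # Removing an uncovered lower file releases its charge.
        self.used -= size_mb


quota = MergedQuota(hard_limit_mb=2048, lower_used_mb=1024)
quota.create(1024)    # user fills the remaining 1GB with new files
quota.copy_up(1024)   # touching every lower file still succeeds
print(quota.used)     # 2048
```

With project quota on upper only, the same sequence would charge the copy up on top of the 2GB of new files, which is where the EQUOTA in the quoted text comes from.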
>>>>>>>>>> I don't know exactly what will happen when combining index and nfs_export options,
>>>>>>>>>> I need to read and understand related code later.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> nfs_export REQUIRES index and implies indexing of all files on copy up, not
>>>>>>>>> only lower hardlinks.
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Here are some test examples to share with you (based on ext4):
>>>>>>>>>>>>
>>>>>>>>>>>> 1) project quota enabled && without hard-limit
>>>>>>>>>>>>
>>>>>>>>>>>> $ df -h /mnt/test3 /mnt/test3/df/merged
>>>>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>>>>>> /dev/vdb3        99G  2.5G   91G   3% /mnt/test3
>>>>>>>>>>>> overlay          92G  201M   91G   1% /mnt/test3/df/merged
>>>>>>>>>>>>
>>>>>>>>>>>> $ df -hi /mnt/test3 /mnt/test3/df/merged
>>>>>>>>>>>> Filesystem     Inodes IUsed IFree IUse% Mounted on
>>>>>>>>>>>> /dev/vdb3        6.3M  2.4K  6.3M    1% /mnt/test3
>>>>>>>>>>>> overlay          6.3M     8  6.3M    1% /mnt/test3/df/merged
>>>>>>>>>>>>
>>>>>>>>>>>> 2) project quota enabled && with hard-limit
>>>>>>>>>>>>
>>>>>>>>>>>> $ df -h /mnt/test3 /mnt/test3/df/merged
>>>>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>>>>>> /dev/vdb3        99G  2.5G   91G   3% /mnt/test3
>>>>>>>>>>>> overlay         1.0G  201M  824M  20% /mnt/test3/df/merged
>>>>>>>>>>>>
>>>>>>>>>>>> $ df -hi /mnt/test3 /mnt/test3/df/merged
>>>>>>>>>>>> Filesystem     Inodes IUsed IFree IUse% Mounted on
>>>>>>>>>>>> /dev/vdb3        6.3M  2.4K  6.3M    1% /mnt/test3
>>>>>>>>>>>> overlay          1000     8   992    1% /mnt/test3/df/merged
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I can't follow from the example what the expected result is and why.
>>>>>>>>>>> If you add the quota setup commands, that could be useful.
>>>>>>>>>>
>>>>>>>>>> The underlying fs is ext4, created and mounted with the project quota option, and
>>>>>>>>>> all needed directories (lowerdir, upperdir, workdir, merged) are set to the same
>>>>>>>>>> project quota. The current quota-df implementation only adjusts the counting
>>>>>>>>>> information when there is an upperdir.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Really? Why is lowerdir on the same project id? For containers quota
>>>>>>>>> only upper/work should be on the same project id. lowerdir should belong
>>>>>>>>> to a shared project or no project at all.
>>>>>>>>> In current docker implementation, every overlay2 driver root dir
>>>>>>>>> is assigned a different project id, but lowerdir are symlinks to another
>>>>>>>>> image root dir.
>>>>>>>>
>>>>>>>> For the docker setup, you are completely right. My description of the testing
>>>>>>>> environment in the previous email is only for simple kernel testing and for
>>>>>>>> explaining the conditions of the test results above, not specific to docker.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> In case 1:
>>>>>>>>>> There is no hardlimit/softlimit, so the expected result is as below.
>>>>>>>>>> Used: The used count of the project quota set on the directory
>>>>>>>>>> that includes upperdir && workdir.
>>>>>>>>>> Avail: The avail of the underlying fs
>>>>>>>>>> Total: Used + Avail
>>>>>>>>>>
>>>>>>>>>> #upperdir used 201M and /mnt/test3 used 2.5G
>>>>>>>>>>
>>>>>>>>>> In case 2:
>>>>>>>>>> Project quota hardlimits are block count = 1G, inode count = 1000.
>>>>>>>>>> So the expected result is as below.
>>>>>>>>>>
>>>>>>>>>> Used: The used count of the project quota set on the directory
>>>>>>>>>> that includes upperdir && workdir.
>>>>>>>>>> Avail: (a) Hardlimit - Used or (b) the avail of the underlying fs (when a > b)
>>>>>>>>>> Total: Used + Avail
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Are those "expected" results in line with what df shows on project quota
>>>>>>>>> directories without overlayfs? If not, and the new behavior makes sense,
>>>>>>>>> why change overlayfs and not change 'df'?
>>>>>>>>
>>>>>>>> No, those results are based on my change to overlayfs. At first,
>>>>>>>> I didn't notice that the underlying filesystems had already implemented
>>>>>>>> the 'quota-df' function.
>>>>>>>> So I planned to do something in overlayfs because
>>>>>>>> it's better than doing the same thing in every low-level filesystem.
>>>>>>>> But now, I'm more willing to persuade xfs/ext4 people to modify the
>>>>>>>> detailed mechanism of 'quota-df' in specific filesystems.
>>>>>>>>
>>>>>>>> 'df', hmm... Both solutions could work I think.
>>>>>>>>
>>>>>>>
>>>>>>> I don't think it is the underlying filesystem that implements quota-df
>>>>>>> I think it is 'df' itself. I was also surprised to learn that, but at least when
>>>>>>> you set project quota with xfs_quota via /etc/projects /etc/projid
>>>>>>> df just shows you the project directories as if they were mount points.
>>>>>>> I tested with xfs on Ubuntu, but suppose with ext4 and other distro it
>>>>>>> is no different.
>>>>>>
>>>>>> That sounds really interesting and that must be a special version of the df command.
>>>>>> On CentOS I have never seen that. :-(
>>>>>
>>>>> I donno, maybe I dreamed of seeing it... most likely I am confusing seeing
>>>>> the correct df usage on overlayfs mount with upper that has project quota.
>>>>>
>>>>>> In any case, if you take a quick look at the functions below in the code, you will
>>>>>> probably believe what I said before. If you stop calling those functions in the
>>>>>> kernel code, then I guess all the magic will be gone and never come back again. :-)
>>>>>>
>>>>>> xfs:
>>>>>> xfs_qm_statvfs
>>>>>>
>>>>>> ext4:
>>>>>> ext4_statfs_project
>>>>>>
>>>>>> f2fs:
>>>>>> f2fs_statfs_project
>>>>>>
>>>>>
>>>>> I see. so I'll wait for your RFC patch to see what ovl_statfs_project brings
>>>>> to the table.
>>>>
>>>> Seems there is nothing more to do unless we add more features like the 'reservation'
>>>> we discussed above. In that case I think we should consider adding the 'reservation
>>>> amount' to bfree, so that bavail represents the real free space that can be
>>>> utilized by new files.
>>>
>>>
>>> Most of the time the underlying filesystem's quota-df works well, but when the real
>>> filesystem's avail is lower than the project quota's avail, the result is quite
>>> confusing. I've only tested on xfs but I think ext4 is similar because they have
>>> the same quota-df logic.
>>>
>>> For example, if we have a 100GB xfs filesystem (/mnt/test2) and we have
>>> 3 directories (pq1, pq2, pq3) inside it, each directory sets a project quota
>>> (block hard limit up to 10GB).
>>>
>>> When the avail space of the real filesystem is only 3.2MB, running df for
>>> pq1, pq2, pq3 still shows an avail space of 9.5GB, which is much more than the
>>> real filesystem has.
>>>
>>>
>>> Detail output:
>>>
>>> $ df -h /mnt/test2
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/vdb2       100G  100G  3.2M 100% /mnt/test2
>>>
>>> $ df -h /mnt/test2/pq1
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/vdb2        10G  570M  9.5G   6% /mnt/test2
>>>
>>> $ df -h /mnt/test2/pq2
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/vdb2        10G  570M  9.5G   6% /mnt/test2
>>>
>>> $ df -h /mnt/test2/pq3
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/vdb2        10G  570M  9.5G   6% /mnt/test2
>>>
>>>
>>> So I just think if we can adjust size/used/avail in the overlayfs layer like below,
>>> it may be a little bit more helpful for our users. What do you think about this?
>>>
>>>
>>> $ df -h /mnt/test2
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/vdb2       100G  100G  3.2M 100% /mnt/test2
>>>
>>> $ df -h /mnt/test2/pq1
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/vdb2       574M  570M  3.2M 100% /mnt/test2
>>>
>>> $ df -h /mnt/test2/pq2
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/vdb2       574M  570M  3.2M 100% /mnt/test2
>>>
>>> $ df -h /mnt/test2/pq3
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/vdb2       574M  570M  3.2M 100% /mnt/test2
>>>
>>>
>
> OOPS sent by mistake.
>
> Chengguang,
>
> People on this list may have not been following the similar thread on xfs list
> and will wonder why the heck total size 574M
>
> You did not explain that it is a limitation of the current statfs() API and how
> df presents the information.

Sorry for the lack of detailed information.
statfs(2) only provides size/avail information to df, and df calculates used as
size - avail. So here we have to adjust the size to fit the df mechanism.

>
> My opinion is that at least for overlayfs use case, it is better to change 'df'
> than to work around its current limitation and present another misleading
> type of information.
>
> It was the same story with btrfs. 'df' was just too simple to meet the demands
> of reporting to user the real status of btrfs disk space, so btrfs tools needed
> to provide a better 'df'.
>
> IMO, the best course of action is to integrate 'df' with quota tools
> information,
> so that 'df' has more information and can display more accurate data.
>
> Overlayfs users are used to work in a container environment where not all
> utilities work out of the box and may need to use alternative tools or hacks
> inside the container, so it won't be a big hurdle, even if this needs a special
> flavor of 'df'.

All right, I get your point and I agree with your proposal.
Only one thing: I really hope to put the real avail info into the quota tools instead of df. :)

Thanks,
Chengguang.
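For reference, the size/avail adjustment discussed in this thread (cap the project's avail at what the real filesystem has left, then build size from used + avail so that df's size - avail arithmetic still gives the right used value) can be sketched as follows. This is a minimal sketch, not the patch: adjusted_df is a hypothetical helper, and the inputs are the MB figures from the pq1 example.

```python
def adjusted_df(quota_hard_mb, quota_used_mb, fs_avail_mb):
    """Return (size, used, avail) in MB for a project-quota directory."""
    quota_avail = max(quota_hard_mb - quota_used_mb, 0)
    # Avail can never exceed what the real filesystem can still provide.
    avail = min(quota_avail, fs_avail_mb)
    # df derives used from size - avail, so size is built as used + avail.
    size = quota_used_mb + avail
    return size, quota_used_mb, avail


# pq1 from the example: 10GB hard limit, 570MB used, 3.2MB left on the fs.
size, used, avail = adjusted_df(10 * 1024, 570, 3.2)
print(size, used, avail)
```

The resulting size is about 573.2MB, which is consistent with the 574M that the proposed df -h output shows for pq1. When the filesystem has plenty of room, the same function falls back to the plain quota figures (size equals the hard limit).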