On Sun, Apr 1, 2018 at 10:49 AM, <cgxu519@xxxxxxx> wrote:
> On Apr 1, 2018, at 3:00 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>
>> On Sun, Apr 1, 2018 at 9:53 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>> On Sun, Apr 1, 2018 at 9:35 AM, <cgxu519@xxxxxxx> wrote:
>>>> On Mar 10, 2018, at 2:37 PM, Chengguang Xu <cgxu519@xxxxxxx> wrote:
>>>>>
>>>>>> Sent: Thursday, March 08, 2018 at 10:36 PM
>>>>>> From: "Amir Goldstein" <amir73il@xxxxxxxxx>
>>>>>> To: "Chengguang Xu" <cgxu519@xxxxxxx>
>>>>>> Cc: overlayfs <linux-unionfs@xxxxxxxxxxxxxxx>, "Miklos Szeredi" <miklos@xxxxxxxxxx>
>>>>>> Subject: Re: "quota-df" feature
>>>>>>
>>>>>> On Thu, Mar 8, 2018 at 3:10 PM, Chengguang Xu <cgxu519@xxxxxxx> wrote:
>>>>>> [...]
>>>>>>>>>>> Lowerdirs could be shared between different overlayfs mounts, so I'm not sure
>>>>>>>>>>> whether counting the contents of lowerdirs is proper behavior. If lowerdirs are
>>>>>>>>>>> dedicated to one overlayfs, then maybe setting the same project id as upperdir can
>>>>>>>>>>> resolve the "covered" problem that you mentioned above. Counting usage information
>>>>>>>>>>> without quota might be hard work when there are plenty of files.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Well, it's a different use case, but a very well known use case -
>>>>>>>>>> When dealing with cloned files (e.g. btrfs, xfs) many files can share the same
>>>>>>>>>> blocks, but each clone is fully accounted to the user/group/project.
>>>>>>>>>> This is the "thin provisioning" use case - every user gets accounted by files
>>>>>>>>>> that the user can reference, but the host does not pay the cost of sum of all
>>>>>>>>>> user quotas.
>>>>>>>>>>
>>>>>>>>>> Without accounting of "covered" files to begin with, it is possible to
>>>>>>>>>> get to a state
>>>>>>>>>> where 'touch' on a big file gets ENOSPC/EQUOTA. This is indeed a situation that
>>>>>>>>>> can happen in "thin provisioned" filesystems (e.g.
>>>>>>>>>> btrfs) or on thin provisioned block devices,
>>>>>>>>>> but a situation that filesystems and administrators try really hard to avoid.
>>>>>>>>>
>>>>>>>>> Let me make clear the term "covered": does it mean a file hidden in lowerdir
>>>>>>>>> because a file with the same name exists in upperdir? Or does it mean the contents in
>>>>>>>>> lowerdir but in the merged dir of overlayfs?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> The former. "covered" file size does not show up in du -s, so the
>>>>>>>> merged disk usage
>>>>>>>> is <upper used> + <lower used> - <covered used>
>>>>>>>
>>>>>>> Thanks, got it.
>>>>>>> But IIUC, a "covered" file has no chance to be copied up, so
>>>>>>> I'm wondering whether it is the real reason for getting the ENOSPC/EQUOTA error?
>>>>>>
>>>>>> Suppose your "image" (i.e. total disk usage of lower) is 1GB
>>>>>> and you want to allow user to touch all the files in the image and
>>>>>> create 1GB of new files.
>>>>>>
>>>>>> If your only tool is project quota on upper then you need to set project
>>>>>> quota hard limit to 2GB, but then user can create 2GB of new files and
>>>>>> later when touching a lower file, will get EQUOTA on copy up.
>>>>>>
>>>>>> If you account lower uncovered files to overlay merged quota then
>>>>>> you set the merged quota to 2GB and start with 50% used.
>>>>>> - Copy up will not change used
>>>>>> - Remove of lower will reduce used
>>>>>> - You can never get EQUOTA from touching a file
>>>>>
>>>>> I get your point. I'd like to think of it as a "reservation" feature we can implement
>>>>> in overlay for uncovered files, so that we can get rid of the EQUOTA error during copy-up.
>>>>> We might even get rid of the ENOSPC error during copy-up when the underlying filesystem
>>>>> supports block reservation? Do many people hope to have this kind of function?
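
To make the 1GB image example above concrete, here is a minimal sketch of the two accounting schemes. It is a toy model and not overlayfs code: the `Quota` class, the `QuotaExceeded` exception (standing in for the quota error the thread calls EQUOTA, EDQUOT in errno terms) and all variable names are assumptions made for illustration.

```python
GIB = 1024 ** 3


class QuotaExceeded(Exception):
    """Stands in for the quota error (EDQUOT, written EQUOTA in the thread)."""


class Quota:
    """Toy byte counter with a hard limit; hypothetical, for illustration only."""

    def __init__(self, limit, used=0):
        self.limit = limit
        self.used = used

    def charge(self, nbytes):
        # Refuse any charge that would push usage past the hard limit.
        if self.used + nbytes > self.limit:
            raise QuotaExceeded()
        self.used += nbytes


image = 1 * GIB  # total disk usage of lower

# Scheme 1: project quota on upper only. Lower files are not charged
# until they are copied up.
upper_only = Quota(limit=2 * GIB)
upper_only.charge(2 * GIB)          # user creates 2GB of new files
try:
    upper_only.charge(image)        # touching lower files forces copy up
    upper_only_result = "ok"
except QuotaExceeded:
    upper_only_result = "EDQUOT on copy up"

# Scheme 2: merged quota. Uncovered lower files are charged from the
# start, so the quota begins at 50% used.
merged = Quota(limit=2 * GIB, used=image)
merged.charge(1 * GIB)              # user creates 1GB of new files
merged.charge(0)                    # copy up of the whole image: used is unchanged
merged_result = "ok"

print(upper_only_result, merged_result)
```

In the second scheme the only operation that can hit the limit is creating new data, which is the point of the three bullet items quoted above.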
>>>>>
>>>>>
>>>>>>>>>> All I am saying is that it is "not hard" (TM) to keep track of
>>>>>>>>>> "covered" files disk usage
>>>>>>>>>> and "not hard" to re-calculate "covered" files disk usage when full
>>>>>>>>>> indexing is enabled.
>>>>>>>>>>> I don't know exactly what will happen when combining index and nfs_export options,
>>>>>>>>>>> I need to read and understand related code later.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> nfs_export REQUIRES index and implies indexing of all files on copy up, not
>>>>>>>>>> only lower hardlinks.
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here are some test examples to share with you (based on ext4):
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) project quota enabled && without hard-limit
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ df -h /mnt/test3 /mnt/test3/df/merged
>>>>>>>>>>>>> Filesystem       Size  Used Avail Use% Mounted on
>>>>>>>>>>>>> /dev/vdb3         99G  2.5G   91G   3% /mnt/test3
>>>>>>>>>>>>> overlay           92G  201M   91G   1% /mnt/test3/df/merged
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ df -hi /mnt/test3 /mnt/test3/df/merged
>>>>>>>>>>>>> Filesystem      Inodes IUsed IFree IUse% Mounted on
>>>>>>>>>>>>> /dev/vdb3         6.3M  2.4K  6.3M    1% /mnt/test3
>>>>>>>>>>>>> overlay           6.3M     8  6.3M    1% /mnt/test3/df/merged
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) project quota enabled && with hard-limit
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ df -h /mnt/test3 /mnt/test3/df/merged
>>>>>>>>>>>>> Filesystem       Size  Used Avail Use% Mounted on
>>>>>>>>>>>>> /dev/vdb3         99G  2.5G   91G   3% /mnt/test3
>>>>>>>>>>>>> overlay          1.0G  201M  824M  20% /mnt/test3/df/merged
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ df -hi /mnt/test3 /mnt/test3/df/merged
>>>>>>>>>>>>> Filesystem      Inodes IUsed IFree IUse% Mounted on
>>>>>>>>>>>>> /dev/vdb3         6.3M  2.4K  6.3M    1% /mnt/test3
>>>>>>>>>>>>> overlay           1000     8   992    1% /mnt/test3/df/merged
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I can't follow from the example above what is the expected result and why.
>>>>>>>>>>>> If you add the quota setup commands, that could be useful.
>>>>>>>>>>>
>>>>>>>>>>> The underlying fs is ext4, created and mounted with the project quota option, and
>>>>>>>>>>> all needed directories (lowerdir, upperdir, workdir, merged) are set to the same
>>>>>>>>>>> project quota. The current quota-df implementation only adjusts the counting
>>>>>>>>>>> information when there is an upperdir.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Really? why is lowerdir on the same project id? For containers quota
>>>>>>>>>> only upper/work should be on the same project id. lowerdir should belong
>>>>>>>>>> to a shared project or no project at all.
>>>>>>>>>> In current docker implementation, every overlay2 driver root dir
>>>>>>>>>> is assigned a different project id, but lowerdir are symlinks to another
>>>>>>>>>> image root dir.
>>>>>>>>>
>>>>>>>>> For the docker setup, you are completely right. My description of the testing
>>>>>>>>> environment in the previous email is only for simple kernel testing and for explaining
>>>>>>>>> the conditions of the test results above, not specific to docker.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> In case 1:
>>>>>>>>>>> There is no hardlimit/softlimit, so the expected result is as below.
>>>>>>>>>>> Used: The used count in the project quota that is set on the directories,
>>>>>>>>>>> including upperdir && workdir.
>>>>>>>>>>> Avail: The avail of the underlying fs
>>>>>>>>>>> Total: Used + Avail
>>>>>>>>>>>
>>>>>>>>>>> #upperdir used 201M and /mnt/test3 used 2.5G
>>>>>>>>>>>
>>>>>>>>>>> In case 2:
>>>>>>>>>>> Project quota hardlimits are block count = 1G, inode count = 1000.
>>>>>>>>>>> So the expected result is as below.
>>>>>>>>>>>
>>>>>>>>>>> Used: The used count in the project quota that is set on the directories,
>>>>>>>>>>> including upperdir && workdir.
>>>>>>>>>>> Avail: (a) Hardlimit - Used, or (b) the avail of the underlying fs (when a > b)
>>>>>>>>>>> Total: Used + Avail
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Are those "expected" results in line with what df shows on project quota
>>>>>>>>>> directories without overlayfs?
>>>>>>>>>> If not, and the new behavior makes sense,
>>>>>>>>>> why change overlayfs and not change 'df'?
>>>>>>>>>
>>>>>>>>> No, those results are based on my change to overlayfs. At first,
>>>>>>>>> I didn't notice that the underlying filesystems had already implemented
>>>>>>>>> the 'quota-df' function. So I planned to do something in overlayfs because
>>>>>>>>> it's better than doing the same thing in every low level filesystem.
>>>>>>>>> But now, I'm more willing to persuade the xfs/ext4 people to modify the
>>>>>>>>> detailed mechanism of 'quota-df' in the specific filesystems.
>>>>>>>>>
>>>>>>>>> 'df', hmm... Both solutions could work I think.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I don't think it is the underlying filesystem that implements quota-df.
>>>>>>>> I think it is 'df' itself. I was also surprised to learn that, but at least when
>>>>>>>> you set project quota with xfs_quota via /etc/projects /etc/projid
>>>>>>>> df just shows you the project directories as if they were mount points.
>>>>>>>> I tested with xfs on Ubuntu, but suppose with ext4 and other distro it
>>>>>>>> is no different.
>>>>>>>
>>>>>>> That sounds really interesting, and that must be a special version of the df command.
>>>>>>> On CentOS I have never seen that. :-(
>>>>>>
>>>>>> I dunno, maybe I dreamed of seeing it... most likely I am confusing seeing
>>>>>> the correct df usage on overlayfs mount with upper that has project quota.
>>>>>>
>>>>>>> In any case, if you take a quick look at the functions below in the code, you will probably
>>>>>>> believe what I said before. If you stop calling those functions in the kernel code,
>>>>>>> then I guess all the magic will be gone and never come back again. :-)
>>>>>>>
>>>>>>> xfs:
>>>>>>> xfs_qm_statvfs
>>>>>>>
>>>>>>> ext4:
>>>>>>> ext4_statfs_project
>>>>>>>
>>>>>>> f2fs:
>>>>>>> f2fs_statfs_project
>>>>>>>
>>>>>>
>>>>>> I see. so I'll wait for your RFC patch to see what ovl_statfs_project brings
>>>>>> to the table.
>>>>>
>>>>> Seems there is nothing more to do unless we add more features like the 'reservation' we discussed above.
>>>>> In that case I think we should consider adding the 'reservation amount' to bfree, while bavail represents
>>>>> the real free space amount that can be utilized by new files.
>>>>
>>>>
>>>> Most of the time the underlying filesystem's quota-df works well, but when the real filesystem's avail is
>>>> lower than the project quota's avail, the result is quite confusing. I've only tested on xfs but I think
>>>> ext4 is similar because they have the same quota-df logic.
>>>>
>>>> For example, say we have a 100GB xfs filesystem (/mnt/test2) with
>>>> 3 directories (pq1, pq2, pq3) inside it, and each directory has a project quota set
>>>> (block hard limit of 10GB).
>>>>
>>>> When the real filesystem has only 3.2MB of avail space left, running df for
>>>> pq1, pq2, pq3 still shows 9.5GB of avail space, which is much more than the real filesystem has.
>>>>
>>>>
>>>> Detail output:
>>>>
>>>> $ df -h /mnt/test2
>>>> Filesystem       Size  Used Avail Use% Mounted on
>>>> /dev/vdb2        100G  100G  3.2M 100% /mnt/test2
>>>>
>>>> $ df -h /mnt/test2/pq1
>>>> Filesystem       Size  Used Avail Use% Mounted on
>>>> /dev/vdb2         10G  570M  9.5G   6% /mnt/test2
>>>>
>>>> $ df -h /mnt/test2/pq2
>>>> Filesystem       Size  Used Avail Use% Mounted on
>>>> /dev/vdb2         10G  570M  9.5G   6% /mnt/test2
>>>>
>>>> $ df -h /mnt/test2/pq3
>>>> Filesystem       Size  Used Avail Use% Mounted on
>>>> /dev/vdb2         10G  570M  9.5G   6% /mnt/test2
>>>>
>>>>
>>>> So I just think if we can adjust size/used/avail in the overlayfs layer like below,
>>>> it may be a little bit more helpful for our users. What do you think of this?
>>>>
>>>>
>>>> $ df -h /mnt/test2
>>>> Filesystem       Size  Used Avail Use% Mounted on
>>>> /dev/vdb2        100G  100G  3.2M 100% /mnt/test2
>>>>
>>>> $ df -h /mnt/test2/pq1
>>>> Filesystem       Size  Used Avail Use% Mounted on
>>>> /dev/vdb2        574M  570M  3.2M 100% /mnt/test2
>>>>
>>>> $ df -h /mnt/test2/pq2
>>>> Filesystem       Size  Used Avail Use% Mounted on
>>>> /dev/vdb2        574M  570M  3.2M 100% /mnt/test2
>>>>
>>>> $ df -h /mnt/test2/pq3
>>>> Filesystem       Size  Used Avail Use% Mounted on
>>>> /dev/vdb2        574M  570M  3.2M 100% /mnt/test2
>>>>
>>>>
>>
>> OOPS sent by mistake.
>>
>> Chengguang,
>>
>> People on this list may not have been following the similar thread on the xfs list
>> and will wonder why the heck the total size is 574M.
>>
>> You did not explain that it is a limitation of the current statfs() API and of how
>> df presents the information.
>
> Sorry for the lack of detailed information. statfs(2) only gives df the size and free/avail
> information, and df calculates used as size - free. So here we have to adjust
> the size to fit the df mechanism.
>
>
>>
>> My opinion is that at least for the overlayfs use case, it is better to change 'df'
>> than to work around its current limitation and present another misleading
>> type of information.
>>
>> It was the same story with btrfs. 'df' was just too simple to meet the demands
>> of reporting to user the real status of btrfs disk space, so btrfs tools needed
>> to provide a better 'df'.
>>
>> IMO, the best course of action is to integrate 'df' with quota tools
>> information,
>> so that 'df' has more information and can display more accurate data.
>>
>> Overlayfs users are used to working in a container environment where not all
>> utilities work out of the box and may need to use alternative tools or hacks
>> inside the container, so it won't be a big hurdle, even if this needs a special
>> flavor of 'df'.
>
> All right, I get your point and I agree with your proposal.
> Only one thing, I really hope to put the real avail info into the quota tools instead of df.
> :)
>

Of course. Makes sense.

Thanks,
Amir.
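
For readers who have not followed the xfs thread, here is a minimal sketch of the arithmetic behind the proposed numbers, following the "case 1" and "case 2" rules quoted earlier (Avail is capped by the underlying filesystem, Total is Used + Avail). The function `quota_df` and its parameter names are hypothetical and do not correspond to any kernel function; a `hard_limit` of 0 is assumed to mean "no hard limit".

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def quota_df(used, hard_limit, fs_avail):
    """Return (size, used, avail) in bytes for a project-quota directory.

    Hypothetical helper: hard_limit == 0 means no hard limit (case 1).
    """
    if hard_limit == 0:
        avail = fs_avail
    else:
        # Quota headroom, but never more than the filesystem really has left.
        avail = min(max(hard_limit - used, 0), fs_avail)
    # df derives "used" from size and free space, so size must be used + avail.
    return used + avail, used, avail


# pq1 from the example: 10G hard limit, 570M used, 3.2M left on /mnt/test2.
size, used, avail = quota_df(570 * MIB, 10 * GIB, int(3.2 * MIB))
print(f"size={size / MIB:.1f}M used={used / MIB:.0f}M avail={avail / MIB:.1f}M")
```

With these inputs the size comes out at roughly 573.2M, which is consistent with the 574M shown in the proposed output if df rounds up. The odd-looking total is a side effect of forcing Size = Used + Avail so that df's own subtraction still yields the right Used.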