Re: "quota-df" feature


On Apr 1, 2018, at 3:00 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> 
> On Sun, Apr 1, 2018 at 9:53 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>> On Sun, Apr 1, 2018 at 9:35 AM,  <cgxu519@xxxxxxx> wrote:
>>> On Mar 10, 2018, at 2:37 PM, Chengguang Xu <cgxu519@xxxxxxx> wrote:
>>>> 
>>>>> Sent: Thursday, March 08, 2018 at 10:36 PM
>>>>> From: "Amir Goldstein" <amir73il@xxxxxxxxx>
>>>>> To: "Chengguang Xu" <cgxu519@xxxxxxx>
>>>>> Cc: overlayfs <linux-unionfs@xxxxxxxxxxxxxxx>, "Miklos Szeredi" <miklos@xxxxxxxxxx>
>>>>> Subject: Re: "quota-df" feature
>>>>> 
>>>>> On Thu, Mar 8, 2018 at 3:10 PM, Chengguang Xu <cgxu519@xxxxxxx> wrote:
>>>>> [...]
>>>>>>>>>> Lowerdirs could be shared between different overlayfs mounts, so I'm not sure whether counting
>>>>>>>>>> the contents of lowerdirs is proper behavior or not. If lowerdirs are dedicated
>>>>>>>>>> to one overlayfs then maybe setting the same project id as upperdir can resolve
>>>>>>>>>> the "covered" problem that you mentioned above. Counting usage information without
>>>>>>>>>> quota might be hard work when there are plenty of files.
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Well, it's a different use case, but a very well known use case -
>>>>>>>>> When dealing with cloned files (e.g. btrfs, xfs) many files can share the same
>>>>>>>>> blocks, but each clone is fully accounted to the user/group/project.
>>>>>>>>> This is the "thin provisioning" use case - every user gets accounted by files
>>>>>>>>> that the user can reference, but the host does not pay the cost of sum of all
>>>>>>>>> user quotas.
>>>>>>>>> 
>>>>>>>>> Without accounting of "covered" files to begin with, it is possible to
>>>>>>>>> get to a state where 'touch' on a big file gets ENOSPC/EQUOTA. This is
>>>>>>>>> indeed a situation that can happen in "thin provisioned" filesystems
>>>>>>>>> (e.g. btrfs) or on thin provisioned block devices, but a situation that
>>>>>>>>> filesystems and administrators try really hard to avoid.
>>>>>>>> 
>>>>>>>> Let me make sure about the term "covered": does it mean a file in lowerdir that is hidden
>>>>>>>> because a file with the same name exists in upperdir? Or does it mean the contents of
>>>>>>>> lowerdir that appear in the merged dir of overlayfs?
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> The former. "covered" file size does not show up in du -s, so the
>>>>>>> merged disk usage
>>>>>>> is <upper used> + <lower used> - <covered used>
>>>>>> 
>>>>>> Thanks, got it.
>>>>>> But IIUC, a "covered" file has no chance of being copied up, so
>>>>>> I'm wondering whether it is the real reason for getting the ENOSPC/EQUOTA error?
>>>>> 
>>>>> Suppose your "image" (i.e. total disk usage of lower) is 1GB
>>>>> and you want to allow user to touch all the files in the image and
>>>>> create 1GB of new files.
>>>>> 
>>>>> If your only tool is project quota on upper then you need to set project
>>>>> quota hard limit to 2GB, but then user can create 2GB of new files and
>>>>> later when touching a lower file, will get EQUOTA on copy up.
>>>>> 
>>>>> If you account lower uncovered files to overlay merged quota then
>>>>> you set the merged quota to 2GB and start with 50% used.
>>>>> - Copy up will not change used
>>>>> - Remove of lower will reduce used
>>>>> - You can never get EQUOTA from touching a file
>>>> 
>>>> I get your point. I'd like to think of it as a "reservation" feature we can implement
>>>> in overlay for uncovered files, so that we can get rid of the EQUOTA error during copy-up.
>>>> We might even get rid of the ENOSPC error during copy-up when the underlying filesystem supports
>>>> block reservation? Are there many people hoping to have this kind of function?
>>>> 
>>>> 
>>>>>>>>> All I am saying is that it is "not hard" (TM) to keep track of
>>>>>>>>> "covered" files disk usage
>>>>>>>>> and "not hard" to re-calculate "covered" files disk usage when full
>>>>>>>>> indexing is enabled.
>>>>>>>>>> I don't know exactly what will happen when combining index and nfs_export options,
>>>>>>>>>> I need to read and understand related code later.
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> nfs_export REQUIRES index and implies indexing of all files on copy up, not
>>>>>>>>> only lower hardlinks.
>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Here are some test examples share with you (based on ext4):
>>>>>>>>>>>> 
>>>>>>>>>>>> 1) project quota enabled && without hard-limit
>>>>>>>>>>>> 
>>>>>>>>>>>> $ df -h /mnt/test3 /mnt/test3/df/merged
>>>>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>>>>>> /dev/vdb3        99G  2.5G   91G   3% /mnt/test3
>>>>>>>>>>>> overlay          92G  201M   91G   1% /mnt/test3/df/merged
>>>>>>>>>>>> 
>>>>>>>>>>>> $ df -hi /mnt/test3 /mnt/test3/df/merged
>>>>>>>>>>>> Filesystem     Inodes IUsed IFree IUse% Mounted on
>>>>>>>>>>>> /dev/vdb3        6.3M  2.4K  6.3M    1% /mnt/test3
>>>>>>>>>>>> overlay          6.3M     8  6.3M    1% /mnt/test3/df/merged
>>>>>>>>>>>> 
>>>>>>>>>>>> 2) project quota enabled && with hard-limit
>>>>>>>>>>>> 
>>>>>>>>>>>> $ df -h /mnt/test3 /mnt/test3/df/merged
>>>>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>>>>>> /dev/vdb3        99G  2.5G   91G   3% /mnt/test3
>>>>>>>>>>>> overlay         1.0G  201M  824M  20% /mnt/test3/df/merged
>>>>>>>>>>>> 
>>>>>>>>>>>> $ df -hi /mnt/test3 /mnt/test3/df/merged
>>>>>>>>>>>> Filesystem     Inodes IUsed IFree IUse% Mounted on
>>>>>>>>>>>> /dev/vdb3        6.3M  2.4K  6.3M    1% /mnt/test3
>>>>>>>>>>>> overlay          1000     8   992    1% /mnt/test3/df/merged
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I can't follow from the example below what is the expected result and why.
>>>>>>>>>>> If you add the quota setup commands, that could be useful.
>>>>>>>>>> 
>>>>>>>>>> The underlying fs is ext4, created and mounted with the project quota option, and
>>>>>>>>>> all needed directories (lowerdir, upperdir, workdir, merged) are set to the same
>>>>>>>>>> project quota. The current quota-df implementation only adjusts the counting
>>>>>>>>>> information when there is an upperdir.
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Really? why is lowerdir on the same project id? For containers quota
>>>>>>>>> only upper/work should be on the same project id. lowerdir should belong
>>>>>>>>> to a shared project or no project at all.
>>>>>>>>> In current docker implementation, every overlay2 driver root dir
>>>>>>>>> is assigned a different project id, but lowerdir are symlinks to another
>>>>>>>>> image root dir.
>>>>>>>> 
>>>>>>>> For the setting of docker, you are completely right. My description of the testing
>>>>>>>> environment in the previous email is only for simple kernel testing and for explaining
>>>>>>>> the conditions of the test results above, not specific to docker.
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> In case 1:
>>>>>>>>>> There is no hardlimit/softlimit, so the expected result as below.
>>>>>>>>>> Used:  The used count in project quota which set to the directory
>>>>>>>>>>      include upperdir && workdir.
>>>>>>>>>> Avail: The avail of underlying fs
>>>>>>>>>> Total: Used + Avail
>>>>>>>>>> 
>>>>>>>>>> #upperdir used 201M and /mnt/test3 used 2.5G
>>>>>>>>>> 
>>>>>>>>>> In case 2:
>>>>>>>>>> Project quota hardlimits are block count = 1G, inode count = 1000.
>>>>>>>>>> So the expected result as below.
>>>>>>>>>> 
>>>>>>>>>> Used:  The used count in project quota which set to the directory
>>>>>>>>>>      include upperdir && workdir.
>>>>>>>>>> Avail: (a)Hardlimit - Used or (b)The avail of underlying fs(when a > b)
>>>>>>>>>> Total: Used + Avail
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Are those "expected" results in line with what df shows on project quota
>>>>>>>>> directories without overlayfs? If not, and the new behavior makes sense,
>>>>>>>>> why change overlayfs and not change 'df'?
>>>>>>>> 
>>>>>>>> No, those results are based on my change to overlayfs. At first,
>>>>>>>> I didn't notice that the underlying filesystems had already implemented
>>>>>>>> the 'quota-df' function, so I planned to do something in overlayfs because
>>>>>>>> it's better than doing the same thing in every low level filesystem.
>>>>>>>> But now, I'm more willing to persuade the xfs/ext4 people to modify the
>>>>>>>> detailed mechanism of 'quota-df' in the specific filesystems.
>>>>>>>> 
>>>>>>>> 'df', hmm... Both solutions could work I think.
>>>>>>>> 
>>>>>>> 
>>>>>>> I don't think it is the underlying filesystem that implements quota-df.
>>>>>>> I think it is 'df' itself. I was also surprised to learn that, but at least when
>>>>>>> you set project quota with xfs_quota via /etc/projects /etc/projid
>>>>>>> df just shows you the project directories as if they were mount points.
>>>>>>> I tested with xfs on Ubuntu, but suppose with ext4 and other distro it
>>>>>>> is no different.
>>>>>> 
>>>>>> That sounds really interesting and that must be special version of df command.
>>>>>> On CentOS I have never seen that. :-(
>>>>> 
>>>>> I donno, maybe I dreamed of seeing it... most likely I am confusing seeing
>>>>> the correct df usage on overlayfs mount with upper that has project quota.
>>>>> 
>>>>>> In any case if you take a quick look at below functions in the code, you will probably
>>>>>> believe what I said before. If you stop calling those functions in the kernel code,
>>>>>> then I guess all magic will be gone and never turn back again. :-)
>>>>>> 
>>>>>> xfs:
>>>>>> xfs_qm_statvfs
>>>>>> 
>>>>>> ext4:
>>>>>> ext4_statfs_project
>>>>>> 
>>>>>> f2fs:
>>>>>> f2fs_statfs_project
>>>>>> 
>>>>> 
>>>>> I see. so I'll wait for your RFC patch to see what ovl_statfs_project brings
>>>>> to the table.
>>>> 
>>>> Seems there is nothing more to do unless we add more features like 'reservation' we discussed above.
>>>> In this case I think we should consider adding 'reservation amount' to bfree, and bavail represents
>>>> the real free space amount that can be utilized by new files.
>>> 
>>> 
>>> Most of the time the underlying filesystem’s quota-df works well, but when the real filesystem’s avail is lower than
>>> the project quota’s avail, the result is quite confusing. I’ve only tested on xfs but I think ext4 is
>>> similar because they have the same quota-df logic.
>>> 
>>> For example, if we have a 100GB xfs filesystem (/mnt/test2) and we have
>>> 3 directories (pq1, pq2, pq3) inside it, each directory with a project quota set
>>> (block hard limit up to 10GB).
>>> 
>>> When the avail space of the real filesystem is only 3.2MB, running df for
>>> pq1, pq2, pq3 still shows an avail space of 9.5GB, which is much more than the real filesystem has.
>>> 
>>> 
>>> Detail output:
>>> 
>>> $ df -h /mnt/test2
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/vdb2       100G  100G  3.2M 100% /mnt/test2
>>> 
>>> $ df -h /mnt/test2/pq1
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/vdb2        10G  570M  9.5G   6% /mnt/test2
>>> 
>>> $ df -h /mnt/test2/pq2
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/vdb2        10G  570M  9.5G   6% /mnt/test2
>>> 
>>> $ df -h /mnt/test2/pq3
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/vdb2        10G  570M  9.5G   6% /mnt/test2
>>> 
>>> 
>>> So I just think that if we can adjust size/used/avail in the overlayfs layer as below,
>>> it may be a little more helpful for our users. What do you think about this?
>>> 
>>> 
>>> $ df -h /mnt/test2
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/vdb2       100G  100G  3.2M 100% /mnt/test2
>>> 
>>> $ df -h /mnt/test2/pq1
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/vdb2       574M  570M  3.2M 100% /mnt/test2
>>> 
>>> $ df -h /mnt/test2/pq2
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/vdb2       574M  570M  3.2M 100% /mnt/test2
>>> 
>>> $ df -h /mnt/test2/pq3
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/vdb2       574M  570M  3.2M 100% /mnt/test2
>>> 
>>> 
> 
> OOPS sent by mistake.
> 
> Chengguang,
> 
> People on this list may not have been following the similar thread on the xfs list
> and will wonder why the heck the total size is 574M.
> 
> You did not explain that it is a limitation of the current statfs() API and of how
> df presents the information.

Sorry for the lack of detail. statfs(2) only provides size and free/avail
information to df, and df calculates used as size minus free. So here we have to
adjust the reported size to fit how df works.
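
The adjustment can be sketched roughly as below. This is only a minimal
illustration in Python, not the overlayfs or xfs kernel code: the names
quota_df and DfRow are hypothetical, all values are in the same unit (MiB in
the examples), and treating a hard limit of 0 as "no limit" is an assumption.
The idea is that avail is clamped to whichever is smaller, the space left in
the project quota or the space left in the real filesystem, and size is then
derived as used + avail so that df's "size minus free" arithmetic still gives
the right used value.

```python
from dataclasses import dataclass


@dataclass
class DfRow:
    """One row of df-style output, all fields in the same unit."""
    size: int
    used: int
    avail: int


def quota_df(quota_hard_limit: int, quota_used: int, fs_avail: int) -> DfRow:
    """Derive df-style numbers for a project-quota directory.

    quota_hard_limit: block hard limit of the project (0 means no limit,
                      which is an assumption of this sketch)
    quota_used:       space currently accounted to the project
    fs_avail:         space still available in the real filesystem
    """
    if quota_hard_limit > 0:
        # Space left before the project hits its hard limit.
        quota_avail = max(quota_hard_limit - quota_used, 0)
        # Never report more than the real filesystem can provide.
        avail = min(quota_avail, fs_avail)
    else:
        # No limit set: only the real filesystem bounds the project.
        avail = fs_avail
    # Size is derived so that size - avail equals the quota usage.
    return DfRow(size=quota_used + avail, used=quota_used, avail=avail)


if __name__ == "__main__":
    # 10 GiB hard limit, 570 MiB used, about 3 MiB left in the filesystem.
    print(quota_df(10 * 1024, 570, 3))
    # 1 GiB hard limit, 201 MiB used, plenty of space in the filesystem.
    print(quota_df(1024, 201, 91 * 1024))
```

With a nearly full filesystem the first call reports a size close to the used
amount, which is why the total size in the example above looks so small.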


> 
> My opinion is that at least for the overlayfs use case, it is better to change 'df'
> than to work around its current limitation and present another misleading
> type of information.
> 
> It was the same story with btrfs. 'df' was just too simple to meet the demands
> of reporting to the user the real status of btrfs disk space, so btrfs tools needed
> to provide a better 'df'.
> 
> IMO, the best course of action is to integrate 'df' with quota tools
> information,
> so that 'df' has more information and can display more accurate data.
> 
> Overlayfs users are used to working in a container environment where not all
> utilities work out of the box and may need to use alternative tools or hacks
> inside the container, so it won't be a big hurdle, even if this needs a special
> flavor of 'df'.

All right, I get your point and I agree with your proposal.
Only one thing: I really hope we put the real avail info into the quota tools instead of df. :)

Thanks,
Chengguang.

