On Sun, Mar 29, 2020 at 5:19 PM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote:
>
> ---- On Friday, 2020-03-27 17:45:37, Amir Goldstein <amir73il@xxxxxxxxx> wrote ----
> > On Fri, Mar 27, 2020 at 8:18 AM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote:
> > >
> > > ---- On Thursday, 2020-03-26 15:34:13, Amir Goldstein <amir73il@xxxxxxxxx> wrote ----
> > > > On Thu, Mar 26, 2020 at 7:45 AM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > On container use case, in order to prevent inode exhaustion on host file system by particular containers, we would like to add inode limitation for containers.
> > > > > However, current solution for inode limitation is based on project quota in specific underlying filesystem so it will also count deleted files (char type files) in overlay's upper layer.
> > > > > Even worse, users may delete some lower layer files for getting more usable free inodes but the result will be opposite (consuming more inodes).
> > > > >
> > > > > It is somewhat different compared to disk size limitation for overlayfs, so I think maybe we can add a limit option just for new files in overlayfs. What do you think?
> >
> > You are saying above that the goal is to prevent inode exhaustion on
> > host file system,
> > but you want to allow containers to modify and delete unlimited number
> > of lower files
> > thus allowing inode exhaustion. I don't see the logic in that.
> >
>
> End users do not understand kernel tech very well, so we just want to mitigate
> container's different user experience as much as possible. In our point of view,
> consuming more inode by deleting lower file is the feature of overlayfs, it's not
> caused by user's abnormal using. However, we have to limit malicious user
> program which is endlessly creating new files until host inode exhausting.
>
>
> > Even if we only count new files and present this information on df -i
> > how would users be able to free up inodes when they hit the limit?
> > How would they know which inodes to delete?
> >
> > > >
> > > > The questions are where do we store the accounting and how do we maintain them.
> > > > An answer to those questions could be - in the inode index:
> > > >
> > > > Currently, with nfs_export=on, there is already an index dir containing:
> > > > - 1 hardlink per copied up non-dir inode
> > > > - 1 directory per copied-up directory
> > > > - 1 whiteout per whiteout in upperdir (not a hardlink)
> > > >
> > >
> > > Hi Amir,
> > >
> > > Thanks for quick response and detail information.
> > >
> > > I think the simplest way is just store accounting info in memory (maybe in s_fs_info).
> > > At very first, I just thought doing it for container use case, for container, it will be
> > > enough because the upper layer is always empty at starting time and will be destroyed
> > > at ending time.
> >
> > That is not a concept that overlayfs is currently aware of.
> > *If* the concept is acceptable and you do implement a feature intended for this
> > special use case, you should verify on mount time that upperdir is empty.
> >
> > >
> > > Adding a meta info to index dir is a better solution for general use case but it seems
> > > more complicated and I'm not sure if there are other use cases concern with this problem.
> > > Suggestion?
> >
> > docker already supports container storage quota using project quotas
> > on upperdir (I implemented it).
> > Seems like a very natural extension to also limit no. of inodes.
> > The problem, as you wrote it above is that project quotas
> > "will also count deleted files (char type files) in overlay's upper layer."
> > My suggestion to you was a way to account for the whiteouts separately,
> > so you may deduct them from total inode count.
> > If you are saying my suggestion is complicated, perhaps you did not
> > understand it.
> >
>
> I think the key point here is the count of whiteout inode. I would like to
> propose share same inode with different whiteout files so that we can save
> inode significantly for whiteout files. After this, I think we can just implement
> normal inode limit for container just like block limit.
>

Very good idea.

See:
https://lore.kernel.org/linux-unionfs/20180301064526.17216-1-houtao1@xxxxxxxxxx/

I don't think Tao ever followed up with v3 patch.

Thanks,
Amir.
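
For illustration only, the "deduct whiteouts from the total inode count" accounting discussed in this thread could be approximated from userspace roughly as below. This is a sketch, not overlayfs or docker code: it assumes whiteouts appear in upperdir as character devices with device number 0/0, and the function names (is_whiteout, count_upper_inodes, charged_inodes) are hypothetical.

```python
import os
import stat


def is_whiteout(st):
    """True if a stat result looks like an overlayfs whiteout (char device 0/0)."""
    return stat.S_ISCHR(st.st_mode) and st.st_rdev == 0


def count_upper_inodes(upperdir):
    """Return (total, whiteouts): unique inodes found below upperdir.

    upperdir itself is not counted; hardlinks to one inode count once.
    """
    seen = set()
    total = 0
    whiteouts = 0
    for root, dirs, files in os.walk(upperdir):
        for name in dirs + files:
            st = os.lstat(os.path.join(root, name))
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hardlink to an inode already counted
            seen.add(key)
            total += 1
            if is_whiteout(st):
                whiteouts += 1
    return total, whiteouts


def charged_inodes(upperdir):
    """Inodes that would be charged to the container if whiteouts were deducted."""
    total, whiteouts = count_upper_inodes(upperdir)
    return total - whiteouts
```

A tree walk like this is only a way to see the numbers; it does nothing to enforce a limit, which is why the thread looks at keeping the accounting in the index dir or at sharing one inode among all whiteouts.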