Re: Inode limitation for overlayfs

Amir Goldstein <amir73il@xxxxxxxxx> · Sun, 29 Mar 2020 18:06:42 +0300

On Sun, Mar 29, 2020 at 5:19 PM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote:
>
>  ---- On Friday, 2020-03-27 17:45:37 Amir Goldstein <amir73il@xxxxxxxxx> wrote ----
>  > On Fri, Mar 27, 2020 at 8:18 AM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote:
>  > >
>  > >  ---- On Thursday, 2020-03-26 15:34:13 Amir Goldstein <amir73il@xxxxxxxxx> wrote ----
>  > >  > On Thu, Mar 26, 2020 at 7:45 AM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote:
>  > >  > >
>  > >  > > Hello,
>  > >  > >
>  > >  > > In the container use case, to prevent particular containers from exhausting inodes on the host file system, we would like to add an inode limit for containers.
>  > >  > > However, the current solution for limiting inodes is based on project quota in the specific underlying filesystem, so it also counts deleted files (char type files) in overlay's upper layer.
>  > >  > > Even worse, users may delete some lower layer files to get more free inodes, but the result will be the opposite (consuming more inodes).
>  > >  > >
>  > >  > > This is somewhat different from the disk size limit for overlayfs, so I think maybe we can add a limit option just for new files in overlayfs. What do you think?
>  >
>  > You are saying above that the goal is to prevent inode exhaustion on the
>  > host file system, but you want to allow containers to modify and delete an
>  > unlimited number of lower files, thus allowing inode exhaustion. I don't see
>  > the logic in that.
>  >
>
> End users do not understand kernel internals very well, so we just want to minimize
> the ways a container behaves differently from what they expect. From our point of view,
> consuming more inodes by deleting lower files is a property of overlayfs; it is not
> caused by abnormal usage on the user's part. However, we do have to limit malicious user
> programs that endlessly create new files until the host's inodes are exhausted.
>
>
>  > Even if we only count new files and present this information on df -i
>  > how would users be able to free up inodes when they hit the limit?
>  > How would they know which inodes to delete?
>  >
>  > >  >
>  > >  > The questions are where do we store the accounting and how do we maintain them.
>  > >  > An answer to those questions could be - in the inode index:
>  > >  >
>  > >  > Currently, with nfs_export=on, there is already an index dir containing:
>  > >  > - 1 hardlink per copied up non-dir inode
>  > >  > - 1 directory per copied-up directory
>  > >  > - 1 whiteout per whiteout in upperdir (not a hardlink)
>  > >  >
>  > >
>  > > Hi Amir,
>  > >
>  > > Thanks for the quick response and detailed information.
>  > >
>  > > I think the simplest way is to just store the accounting info in memory (maybe in s_fs_info).
>  > > At first, I only thought about doing it for the container use case. For containers, that would be
>  > > enough, because the upper layer is always empty at start time and is destroyed
>  > > at exit time.
>  >
>  > That is not a concept that overlayfs is currently aware of.
>  > *If* the concept is acceptable and you do implement a feature intended for this
>  > special use case, you should verify at mount time that upperdir is empty.
>  >
>  > >
>  > > Adding meta info to the index dir is a better solution for the general use case, but it seems
>  > > more complicated, and I'm not sure whether other use cases are concerned with this problem.
>  > > Any suggestions?
>  >
>  > docker already supports container storage quota using project quotas
>  > on upperdir (I implemented it).
>  > Seems like a very natural extension to also limit no. of inodes.
>  > The problem, as you wrote it above, is that project quotas
>  > "will also count deleted files(char type files) in overlay's upper layer."
>  > My suggestion to you was a way to account for the whiteouts separately,
>  > so you may deduct them from total inode count.
>  > If you are saying my suggestion is complicated, perhaps you did not
>  > understand it.
>  >
>
> I think the key point here is the count of whiteout inodes. I would like to
> propose sharing the same inode among different whiteout files so that we can
> save a significant number of inodes on whiteouts. After this, I think we can just implement
> a normal inode limit for containers, just like the block limit.
>

Very good idea. See:
https://lore.kernel.org/linux-unionfs/20180301064526.17216-1-houtao1@xxxxxxxxxx/

I don't think Tao ever followed up with a v3 patch.
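
To illustrate why sharing one inode helps with inode quota, here is a rough
userspace sketch. It is not the kernel change and not taken from the patch
above. It assumes regular files as stand-ins for whiteouts (real overlayfs
whiteouts are character devices, which need privileges to create), and the
helper names are made up for the example. It compares 100 markers created as
separate files against 100 markers that are hardlinks to a single file:

```python
import os
import tempfile


def count_distinct_inodes(directory):
    """Count distinct (device, inode) pairs among the entries of `directory`."""
    seen = set()
    for entry in os.scandir(directory):
        st = os.lstat(entry.path)
        seen.add((st.st_dev, st.st_ino))
    return len(seen)


def make_separate_markers(directory, names):
    """Hypothetical helper: one new file, hence one new inode, per marker."""
    for name in names:
        with open(os.path.join(directory, name), "w"):
            pass


def make_shared_markers(directory, names):
    """Hypothetical helper: create one file, then hardlink all other names to it."""
    first = os.path.join(directory, names[0])
    with open(first, "w"):
        pass
    for name in names[1:]:
        os.link(first, os.path.join(directory, name))


# Stand-ins for 100 "deleted lower file" markers in an upper layer.
names = ["deleted-%d" % i for i in range(100)]

with tempfile.TemporaryDirectory() as separate, \
        tempfile.TemporaryDirectory() as shared:
    make_separate_markers(separate, names)
    make_shared_markers(shared, names)
    separate_count = count_distinct_inodes(separate)
    shared_count = count_distinct_inodes(shared)

print("separate files:", separate_count, "inodes")
print("hardlinked:    ", shared_count, "inode(s)")
```

On a filesystem that supports hardlinks, the first directory consumes one
inode per marker and the second consumes a single inode for all of them, which
is the saving an inode quota on upperdir would see.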

Thanks,
Amir.