On Thu, Oct 29, 2020 at 8:05 PM Waiman Long <longman@xxxxxxxxxx> wrote: > > On 10/29/20 1:27 PM, Amir Goldstein wrote: > > On Thu, Oct 29, 2020 at 5:46 PM Waiman Long <longman@xxxxxxxxxx> wrote: > >> The default value of inotify.max_user_watches sysctl parameter was set > >> to 8192 since the introduction of the inotify feature in 2005 by > >> commit 0eeca28300df ("[PATCH] inotify"). Today this value is just too > >> small for many modern usage. As a result, users have to explicitly set > >> it to a larger value to make it work. > >> > >> After some searching around the web, these are the > >> inotify.max_user_watches values used by some projects: > >> - vscode: 524288 > >> - dropbox support: 100000 > >> - users on stackexchange: 12228 > >> - lsyncd user: 2000000 > >> - code42 support: 1048576 > >> - monodevelop: 16384 > >> - tectonic: 524288 > >> - openshift origin: 65536 > >> > >> Each watch point adds an inotify_inode_mark structure to an inode to > >> be watched. It also pins the watched inode as well as an inotify fdinfo > >> procfs file. > >> > >> Modeled after the epoll.max_user_watches behavior to adjust the default > >> value according to the amount of addressable memory available, make > >> inotify.max_user_watches behave in a similar way to make it use no more > >> than 1% of addressable memory within the range [8192, 1048576]. > >> > >> For 64-bit archs, inotify_inode_mark plus 2 inode have a size close > >> to 2 kbytes. That means a system with 196GB or more memory should have > >> the maximum value of 1048576 for inotify.max_user_watches. This default > >> should be big enough for most use cases. > >> > >> With my x86-64 config, the size of xfs_inode, proc_inode and > >> inotify_inode_mark is 1680 bytes. The estimated INOTIFY_WATCH_COST is > >> 1760 bytes. > >> > >> [v2: increase inotify watch cost as suggested by Amir and Honza] > >> > >> Signed-off-by: Waiman Long <longman@xxxxxxxxxx> > >> --- > >> fs/notify/inotify/inotify_user.c | 24 +++++++++++++++++++++++- > >> 1 file changed, 23 insertions(+), 1 deletion(-) > >> > >> diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c > >> index 186722ba3894..37d9f09c226f 100644 > >> --- a/fs/notify/inotify/inotify_user.c > >> +++ b/fs/notify/inotify/inotify_user.c > >> @@ -37,6 +37,16 @@ > >> > >> #include <asm/ioctls.h> > >> > >> +/* > >> + * An inotify watch requires allocating an inotify_inode_mark structure as > >> + * well as pinning the watched inode and adding inotify fdinfo procfs file. > > Maybe you misunderstood me. > > There is no procfs file per watch. > > There is a procfs file per inotify_init() fd. > > The fdinfo of that procfile lists all the watches of that inotify instance. > Thanks for the clarification. Yes, I probably had misunderstood you > because of the 2 * sizeof(inode) figure you provided. > >> + * The increase in size of a filesystem inode versus a VFS inode varies > >> + * depending on the filesystem. An extra 512 bytes is added as rough > >> + * estimate of the additional filesystem inode cost. > >> + */ > >> +#define INOTIFY_WATCH_COST (sizeof(struct inotify_inode_mark) + \ > >> + 2 * sizeof(struct inode) + 512) > >> + > > I would consider going with double the sizeof inode as rough approximation for > > filesystem inode size. > > > > It is a bit less arbitrary than 512 and it has some rationale behind it - > > Some kernel config options will grow struct inode (debug, smp) > > The same config options may also grow the filesystem part of the inode. > > > > And this approximation can be pretty accurate at times. > > For example, on Ubuntu 18.04 kernel 5.4.0: > > inode_cache 608 > > nfs_inode_cache 1088 > > btrfs_inode 1168 > > xfs_inode 1024 > > ext4_inode_cache 1096 > > Just to clarify, is your original 2 * sizeof(struct inode) figure > include the filesystem inode overhead or there is an additional inode > somewhere that I needs to go to 4 * sizeof(struct inode)? No additional inode. #define INOTIFY_WATCH_COST (sizeof(struct inotify_inode_mark) + \ 2 * sizeof(struct inode)) Not sure if the inotify_inode_mark part matters, but it doesn't hurt. Do note that Jan had a different proposal for fs inode size estimation (1K). I have no objection to this estimation if Jan insists. Thanks, Amir.