Thanks Du, nice bit of info! It made me wonder about the following:

- Could setting vfs_cache_pressure to 100 + x then become the default answer
  we give to "glusterfs client high memory usage" type complaints?
- And then x = ? Was proper performance testing done to see how performance /
  memory consumption changes as a function of vfs_cache_pressure?
- vfs_cache_pressure is a system-wide tunable. If 100 + x is ideal for
  GlusterFS, can we confidently propose it? Is there no risk of thrashing
  other (disk-based) filesystems' performance?

(A minimal sketch of actually setting the tunable follows the quoted thread
below.)

Csaba

On Wed, Sep 6, 2017 at 6:57 AM, Raghavendra G <raghavendra@xxxxxxxxxxx> wrote:
> Another parallel effort could be trying to configure the number of
> inodes/dentries cached by the kernel VFS using the /proc/sys/vm interface.
>
> ==============================================================
>
> vfs_cache_pressure
> ------------------
>
> This percentage value controls the tendency of the kernel to reclaim
> the memory which is used for caching of directory and inode objects.
>
> At the default value of vfs_cache_pressure=100 the kernel will attempt to
> reclaim dentries and inodes at a "fair" rate with respect to pagecache and
> swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
> to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
> never reclaim dentries and inodes due to memory pressure and this can easily
> lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
> causes the kernel to prefer to reclaim dentries and inodes.
>
> Increasing vfs_cache_pressure significantly beyond 100 may have negative
> performance impact. Reclaim code needs to take various locks to find freeable
> directory and inode objects. With vfs_cache_pressure=1000, it will look for
> ten times more freeable objects than there are.
>
> Also, we have an article for sysadmins which has a section:
>
> <quote>
>
> With GlusterFS, many users with a lot of storage and many small files
> easily end up using a lot of RAM on the server side due to
> 'inode/dentry' caching, leading to decreased performance when the kernel
> keeps crawling through data-structures on a 40GB RAM system. Changing
> this value higher than 100 has helped many users to achieve fair caching
> and more responsiveness from the kernel.
>
> </quote>
>
> The complete article can be found at:
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Linux%20Kernel%20Tuning/
>
> regards,
>
>
> On Tue, Sep 5, 2017 at 5:20 PM, Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx>
> wrote:
>>
>> +gluster-devel
>>
>> Ashish just spoke to me about the need for GC of inodes due to some state
>> in the inode that is being proposed in EC. Hence adding more people to the
>> conversation.
>>
>> > > On 4 September 2017 at 12:34, Csaba Henk <chenk@xxxxxxxxxx> wrote:
>> > > >
>> > > > I don't know, it depends on how sophisticated a GC we need/want/can
>> > > > get by. I guess the complexity will be inherent, i.e. that of the
>> > > > algorithm chosen and of how we address concurrency & performance
>> > > > impacts, but once that's got right the other aspects of the
>> > > > implementation won't be hard.
>> > > >
>> > > > Eg. would it be good just to maintain a simple LRU list?
>> >
>> > Yes. I was also thinking of leveraging the lru list. We can invalidate
>> > the first "n" inodes from the lru list of the fuse inode table.
>> >
>> > > That might work for starters.
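To illustrate the idea: a rough, hedged C sketch of what purging the first
"n" inodes from an inode table's LRU list could look like. All names below
(inode_table, lru_head, notify_kernel_forget standing in for glusterfs's
inode_invalidate) are illustrative assumptions, not the actual glusterfs
headers:

/* Sketch: purge the oldest "n" inodes from a FUSE inode table's LRU
 * list.  Types and names approximate glusterfs's inode table internals
 * for illustration; they are not the real headers. */
#include <pthread.h>
#include <stddef.h>

struct inode {
        struct inode *lru_prev;
        struct inode *lru_next;     /* position on the table's LRU list */
        /* ... gfid, refcounts, per-xlator ctx, etc. ... */
};

struct inode_table {
        pthread_mutex_t lock;
        struct inode   *lru_head;   /* least recently used inode first */
        size_t          lru_size;
};

/* Assumed helper standing in for glusterfs's inode_invalidate(): issues
 * a FUSE reverse invalidation so the kernel drops its reference and
 * eventually sends FORGET for the inode. */
extern int notify_kernel_forget(struct inode *inode);

static void
inode_table_gc(struct inode_table *table, size_t n)
{
        pthread_mutex_lock(&table->lock);
        while (n-- > 0 && table->lru_head != NULL) {
                struct inode *victim = table->lru_head;

                /* unlink the victim from the LRU list */
                table->lru_head = victim->lru_next;
                if (table->lru_head != NULL)
                        table->lru_head->lru_prev = NULL;
                victim->lru_prev = victim->lru_next = NULL;
                table->lru_size--;

                /* ask the kernel to give up its reference; the final
                 * teardown happens when the FORGET arrives */
                notify_kernel_forget(victim);
        }
        pthread_mutex_unlock(&table->lock);
}

The real work, as the thread below notes, is wiring the invalidation into
fuse_invalidate_* and deciding when, and how aggressively, to run the purge.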
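And coming back to the vfs_cache_pressure questions at the top: the knob is
just a file under /proc/sys/vm, so trying a "100 + x" value is a one-line
write (root required). A minimal sketch; the value 200 is an arbitrary
placeholder, not a tested recommendation -- finding the right x is exactly
the open testing question:

/* Equivalent to `sysctl vm.vfs_cache_pressure=200`.  The setting is
 * system-wide, hence the concern above about other filesystems. */
#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/sys/vm/vfs_cache_pressure", "w");

        if (f == NULL) {
                perror("open vfs_cache_pressure");
                return 1;
        }
        fprintf(f, "200\n");        /* 200 == "100 + x" with x = 100 */
        return fclose(f) == 0 ? 0 : 1;
}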
>> > >
>> > > >
>> > > > Csaba
>> > > >
>> > > > On Mon, Sep 4, 2017 at 8:48 AM, Nithya Balachandran
>> > > > <nbalacha@xxxxxxxxxx> wrote:
>> > > >
>> > > >>
>> > > >> On 4 September 2017 at 12:14, Csaba Henk <chenk@xxxxxxxxxx> wrote:
>> > > >>
>> > > >>> Basically, I see the fuse invalidate calls as rescuers of sanity.
>> > > >>>
>> > > >>> Normally, when you have a lot of a certain kind of stuff that
>> > > >>> tends to accumulate, the immediate thought is: let's set up some
>> > > >>> garbage collection mechanism that will keep the accumulation at
>> > > >>> bay. But that's what doesn't work with inodes in a naive way, as
>> > > >>> they are referenced from the kernel, so we have to keep them
>> > > >>> around until the kernel tells us it's giving up its reference.
>> > > >>> However, with the fuse invalidate calls we can take the
>> > > >>> initiative and instruct the kernel: "hey, kernel, give up your
>> > > >>> references to this thing!"
>> > > >>>
>> > > >>> So we are actually free to implement any kind of inode GC in
>> > > >>> glusterfs; we just have to take care to add the proper callback
>> > > >>> to fuse_invalidate_* and we are good to go.
>> > > >>
>> > > >> That sounds good and something we need to do in the near future.
>> > > >> Is this something that is easy to implement?
>> > > >>
>> > > >>> Csaba
>> > > >>>
>> > > >>> On Mon, Sep 4, 2017 at 7:00 AM, Nithya Balachandran
>> > > >>> <nbalacha@xxxxxxxxxx> wrote:
>> > > >>>
>> > > >>>> On 4 September 2017 at 10:25, Raghavendra Gowdappa
>> > > >>>> <rgowdapp@xxxxxxxxxx> wrote:
>> > > >>>>
>> > > >>>>> ----- Original Message -----
>> > > >>>>> > From: "Nithya Balachandran" <nbalacha@xxxxxxxxxx>
>> > > >>>>> > Sent: Monday, September 4, 2017 10:19:37 AM
>> > > >>>>> > Subject: Fuse mounts and inodes
>> > > >>>>> >
>> > > >>>>> > Hi,
>> > > >>>>> >
>> > > >>>>> > One of the reasons for the memory consumption in gluster fuse
>> > > >>>>> > mounts is the number of inodes in the table which are never
>> > > >>>>> > kicked out.
>> > > >>>>> >
>> > > >>>>> > Is there any way to default to an entry-timeout and
>> > > >>>>> > attribute-timeout value while mounting Gluster using Fuse?
>> > > >>>>> > Say 60s each, so those entries will be purged periodically?
>> > > >>>>>
>> > > >>>>> Once the entry times out, inodes won't be purged. The kernel
>> > > >>>>> sends a lookup to revalidate the mapping of path to inode.
>> > > >>>>> AFAIK, reverse invalidation (see inode_invalidate) is the only
>> > > >>>>> way to make the kernel forget inodes/attributes.
>> > > >>>>
>> > > >>>> Is that something that can be done from the Fuse mount? Or is
>> > > >>>> this something that needs to be added to Fuse?
>> > > >>>>
>> > > >>>>> > Regards,
>> > > >>>>> > Nithya
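For what it's worth, the kernel-facing half of this already exists at the
libfuse level: the low-level API has notify calls a filesystem daemon can
invoke at any time to make the kernel drop a cached entry or inode. A hedged
sketch against libfuse 3 (signatures as I recall them from fuse_lowlevel.h;
worth double-checking against the installed headers):

/* Sketch: FUSE reverse invalidation via libfuse 3's low-level API. */
#define FUSE_USE_VERSION 35
#include <fuse_lowlevel.h>
#include <string.h>

/* Invalidate the kernel's cached attributes/data for one inode
 * (len == 0 meaning "to the end of the file", per the libfuse docs). */
static int
drop_inode(struct fuse_session *se, fuse_ino_t ino)
{
        return fuse_lowlevel_notify_inval_inode(se, ino, 0, 0);
}

/* Invalidate one cached dentry, i.e. the parent -> name mapping. */
static int
drop_entry(struct fuse_session *se, fuse_ino_t parent, const char *name)
{
        return fuse_lowlevel_notify_inval_entry(se, parent, name,
                                                strlen(name));
}

In glusterfs terms these are what the fuse bridge's fuse_invalidate_* /
inode_invalidate path would end up calling, so it should be doable from the
mount process itself, with no kernel-side changes needed.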
>
> --
> Raghavendra G

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel