Re: [PATCH v5 11/24] xfs: add XBF_VERITY_SEEN xfs_buf flag

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 12, 2024 at 01:04:24PM -0700, Darrick J. Wong wrote:
> On Tue, Mar 12, 2024 at 06:01:01PM +1100, Dave Chinner wrote:
> > On Mon, Mar 11, 2024 at 07:45:07PM -0700, Darrick J. Wong wrote:
> > > On Mon, Mar 11, 2024 at 08:25:05AM -0700, Darrick J. Wong wrote:
> > > > > But, if a generic blob cache is what it takes to move this forwards,
> > > > > so be it.
> > > > 
> > > > Not necessarily. ;)
> > > 
> > > And here's today's branch, with xfs_blobcache.[ch] removed and a few
> > > more cleanups:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/tag/?h=fsverity-cleanups-6.9_2024-03-11
> > 
> > Walking all the inodes counting all the verity blobs in the shrinker
> > is going to be -expensive-. Shrinkers are run very frequently and
> > with high concurrency under memory pressure by direct reclaim, and
> > every single shrinker execution is going to run that traversal even
> > if it is decided there is nothing that can be shrunk.
> > 
> > IMO, it would be better to keep a count of reclaimable objects
> > either on the inode itself (protected by the xa_lock when
> > adding/removing) to avoid needing to walk the xarray to count the
> > blocks on the inode. Even better would be a counter in the perag or
> > a global percpu counter in the mount of all caches objects. Both of
> > those pretty much remove all the shrinker side counting overhead.
> 
> I went with a global percpu counter, let's see if lockdep/kasan have
> anything to say about my new design. :P
> 
> > Couple of other small things.
> > 
> > - verity shrinker belongs in xfs_verity.c, not xfs_icache.c. It
> >   really has nothing to do with the icache other than calling
> >   xfs_icwalk(). That gets rid of some of the config ifdefs.
> 
> Done.
> 
> > - SHRINK_STOP is what should be returned by the scan when
> >   xfs_verity_shrinker_scan() wants the shrinker to immediately stop,
> >   not LONG_MAX.
> 
> Aha.  Ok, thanks for the tipoff. ;)
> 
> > - In xfs_verity_cache_shrink_scan(), the nr_to_scan is a count of
> >   how many object to try to free, not how many we must free. i.e.
> >   even if we can't free objects, they are still objects that got
> >   scanned and so should decement nr_to_scan...
> 
> <nod>
> 
> Is there any way for ->scan_objects to tell its caller how much memory
> it actually freed? Or does it only know about objects? 

The shrinker infrastructure itself only concerns itself with object
counts. It's almost impossible to balance memory used by slab caches
based on memory used because the objects are of different sizes.

For example, it's easy to acheive a 1:1 balance for dentry objects
and inode objects when shrinking is done by object count. Now try
getting that balance when individual shrinkers and the shrinker
infrastructure don't know the relative size of the objects in caches
it is trying to balance against. For shmem and fuse inode it is 4:1
size difference between a dentry and the inode, XFS inodes is 5:1,
ext4 it's 6:1, etc. Yet they all want the same 1:1 object count
balance with the dentry cache, and the same shrinker implementation
has to manage that...

IOws, there are two key scan values - a control value to tell the
shrinker scan how many objects to scan, and a return value that
tells the shrinker how many objects that specific scan freed.

> I suppose
> "number of bytes freed" wouldn't be that helpful since someone else
> could allocate all the freed memory immediately anyway.

The problem is that there isn't a direct realtionship between
"object freed" and "memory available for allocation" with slab cache
shrinkers. If you look at the back end of SLUB freeing, when it
frees a backing page for a cache because it is now empty (all
objects freed) it calls mm_account_reclaimed_pages() to account for
the page being freed. We do the same in xfs_buf_free_pages() to
account for the fact the object being freed also freed a bunch of
extra pages not directly accounted to the shrinker.

THis is the feedback to the memory reclaim process to determine how
much progress was made by the entire shrinker scan. Memory reclaim
is not actually caring how many objects were freed, it's using
hidden accounting to track how many pages actually got freed by the
overall 'every shrinker scan' call.

IOWs, object count base shrinking makes for realtively simple
cross-cache balance tuning, whilst actual memory freed accounting
by scans is hidden in the background...

In the case of this cache, it might be worthwhile adding something
like this for vmalloc()d objects:

	if (is_vmalloc_addr(ptr))
		mm_account_reclaimed_pages(how_many_pages(ptr))
	kvfree(ptr)

as I think that anything allocated from the SLUB heap by kvmalloc()
is already accounted to reclaim by the SLUB freeing code.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux