On Mon, Mar 11, 2024 at 11:26:24AM +1100, Dave Chinner wrote:
> On Sat, Mar 09, 2024 at 08:28:28AM -0800, Darrick J. Wong wrote:
> > On Thu, Mar 07, 2024 at 07:31:38PM -0800, Darrick J. Wong wrote:
> > > On Fri, Mar 08, 2024 at 12:59:56PM +1100, Dave Chinner wrote:
> > > > > (Ab)using the fsbuf code did indeed work (and passed all the
> > > > > fstests -g verity tests), so now I know the idea is reasonable.
> > > > > Patches 11, 12, 14, and 15 become unnecessary.  However, this
> > > > > solution is itself grossly overengineered, since all we want
> > > > > are the following operations:
> > > > >
> > > > > peek(key): returns an fsbuf if there's any data cached for key
> > > > >
> > > > > get(key): returns an fsbuf for key, regardless of state
> > > > >
> > > > > store(fsbuf, p): attach a memory buffer p to fsbuf
> > > > >
> > > > > Then the xfs ->read_merkle_tree_block function becomes:
> > > > >
> > > > > 	bp = peek(key)
> > > > > 	if (bp)
> > > > > 		/* return bp data up to verity */
> > > > >
> > > > > 	p = xfs_attr_get(key)
> > > > > 	if (!p)
> > > > > 		/* error */
> > > > >
> > > > > 	bp = get(key)
> > > > > 	store(bp, p)
> > > >
> > > > Ok, that looks good - it definitely gets rid of a lot of the
> > > > nastiness, but I have to ask: why does it need to be based on
> > > > xfs_bufs?
> > >
> > > (copying from IRC) It was still warm in my brain L2 after all the
> > > xfile buftarg cleaning and merging that just got done a few weeks
> > > ago.  So I went with the simplest thing I could rig up to test my
> > > ideas, and now we're at the madly-iterate-until-exhaustion stage. ;)
> > >
> > > > That's just wasting 300 bytes of memory on a handle to store a
> > > > key and an opaque blob in a rhashtable.
> > >
> > > Yep.  The fsbufs implementation was a lot more slender, but a bunch
> > > more code.  I agree that I ought to go look at xarrays or something
> > > that's more of a direct mapping as a next step.  However, I wanted
> > > to get Andrey's feedback on this general approach first.
> > > > IIUC, the key here is a sequential index, so an xarray would be
> > > > a much better choice as it doesn't require internal storage of
> > > > the key.
> > >
> > > I wonder, what are the access patterns for merkle blobs?  Is it
> > > actually sequential, or is it more like 0 -> N -> N*N as we walk
> > > towards the leaves?
>
> I think the leaf level (i.e. individual record) access patterns
> largely match data access patterns, so I'd just treat it as if it's
> a normal file being accessed....

<nod> The latest version of this tries to avoid letting reclaim take
the top of the tree.  Logically this makes sense to me as a way to
reduce read verification latency, but I was hoping Eric or Andrey or
someone with more familiarity with fsverity would chime in on whether
or not that makes sense.

> > > Also -- the fsverity block interfaces pass in a "u64 pos" argument.
> > > Was that done because merkle trees may some day have more than 2^32
> > > blocks in them?  That won't play well with things like xarrays on
> > > 32-bit machines.
> > >
> > > (Granted we've been talking about deprecating XFS on 32-bit for a
> > > while now, but we're not the whole world.)
> > >
> > > > i.e.
> > > >
> > > > 	p = xa_load(key);
> > > > 	if (p)
> > > > 		return p;
> > > >
> > > > 	xfs_attr_get(key);
> > > > 	if (!args->value)
> > > > 		/* error */
> > > >
> > > > 	/*
> > > > 	 * Store the current value, freeing any old value that we
> > > > 	 * replaced at this key.  Don't care about failure to store,
> > > > 	 * this is optimistic caching.
> > > > 	 */
> > > > 	p = xa_store(key, args->value, GFP_NOFS);
> > > > 	if (p)
> > > > 		kvfree(p);
> > > > 	return args->value;
> > >
> > > Attractive.  Will have to take a look at that tomorrow.
> >
> > Done.  I think.  Not sure that I actually got all the interactions
> > between the shrinker and the xarray correct though.  KASAN and
> > lockdep don't have any complaints running fstests, so that's a start.
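For reference, the optimistic-caching pattern in the quoted xa_load()/
xa_store() snippet can be modeled in userspace C -- a sketch only,
with a plain pointer array standing in for the xarray and a made-up
fetch_xattr_value() standing in for xfs_attr_get():

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace model of the optimistic caching pattern: look up the
 * cache first, otherwise fetch the backing value and opportunistically
 * store it, freeing whatever old value the store displaced.  All
 * names here are hypothetical.
 */
#define NR_BLOCKS	128

static void *merkle_cache[NR_BLOCKS];

/* Stand-in for xfs_attr_get(): "read" the merkle block for @key. */
static void *fetch_xattr_value(unsigned long key)
{
	char *p = malloc(32);

	if (p)
		snprintf(p, 32, "merkle-block-%lu", key);
	return p;
}

static void *read_merkle_block(unsigned long key)
{
	void *p, *old;

	p = merkle_cache[key];			/* xa_load() */
	if (p)
		return p;

	p = fetch_xattr_value(key);		/* xfs_attr_get() */
	if (!p)
		return NULL;			/* error */

	/*
	 * Store the new value, freeing any old value we replaced at
	 * this key.  A failure to store would be fine too -- this is
	 * optimistic caching, and the caller still gets @p.
	 */
	old = merkle_cache[key];		/* xa_store() returns the old entry */
	merkle_cache[key] = p;
	if (old && old != p)
		free(old);
	return p;
}
```

The second lookup for the same key hits the cache and returns the same
pointer, which is the whole point of the scheme: the xattr fork is
only consulted on a cache miss.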
> > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=fsverity-cleanups-6.9_2024-03-09
>
> My initial impression is "over-engineered".

Overly focused on unfamiliar data structures -- I've never built
anything with an xarray before.

> I personally would have just allocated the xattr value buffer with a
> little extra size and added all the external cache information (a
> reference counter is all we need as these are fixed sized blocks) to
> the tail of the blob we actually pass to fsverity.

Oho, that's a good idea.  I didn't like the separate allocation anyway.
Friday was mostly spent chatting with willy and trying to make sure I
got the xarray access patterns correct.

> If we tag the inode in the radix tree as having verity blobs that can
> be freed, we can then just extend the existing fs superblock shrinker
> callout to also walk all the verity inodes with cached data to try to
> reclaim some objects...

This too is a wonderful suggestion -- use the third radix tree tag to
mark inodes with extra incore caches that can be reclaimed, then teach
xfs_reclaim_inodes_{nr,count} to scan them.  Allocating a per-inode
shrinker was likely to cause problems with flooding debugfs with too
many knobs anyway.

> But, if a generic blob cache is what it takes to move this forwards,
> so be it.

Not necessarily. ;)

--D

> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
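Dave's tail-refcount idea -- over-allocate the xattr value buffer and
keep the cache metadata past the end of the fixed-size block that
fsverity reads -- could be sketched in userspace C as follows; the
names and layout are hypothetical, not the XFS implementation:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Sketch: allocate the value buffer with a little extra room and hide
 * a reference count at the tail of the blob.  fsverity sees only the
 * leading @size bytes; the cache uses the tail for lifetime tracking.
 * Real code would also worry about the tail's alignment.
 */
struct blob_tail {
	uint32_t	refcount;
};

/* Allocate @size bytes of block data plus the hidden tail. */
static void *merkle_blob_alloc(size_t size)
{
	return calloc(1, size + sizeof(struct blob_tail));
}

/* The tail lives just past the block data fsverity reads. */
static struct blob_tail *blob_tail(void *p, size_t size)
{
	return (struct blob_tail *)((char *)p + size);
}

static void merkle_blob_get(void *p, size_t size)
{
	blob_tail(p, size)->refcount++;
}

/* Drop a reference; free the blob when the last one goes away. */
static int merkle_blob_put(void *p, size_t size)
{
	struct blob_tail *t = blob_tail(p, size);

	if (--t->refcount == 0) {
		free(p);
		return 1;	/* freed */
	}
	return 0;
}
```

Since the blocks are fixed-size, a bare counter is all the external
cache state the shrinker walk would need, and there is no second
allocation to manage.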