Re: [PATCH v4 23/33] afs: Use netfslib for directories

Kees Bakker <kees@xxxxxxxxxxxx> · Fri, 15 Nov 2024 21:32:02 +0100

On 08-11-2024 at 18:32, David Howells wrote:
In the AFS ecosystem, directories are just a special type of file that is
downloaded and parsed locally.  Download is done by the same mechanism as
ordinary files and the data can be cached.  There is one important semantic
restriction on directories over files: the client must download the entire
directory in one go because, for example, the server could fabricate the
contents of the blob on the fly with each download and give a different
image each time.

So that we can cache the directory download, switch AFS directory support
over to using the netfslib single-object API, thereby allowing directory
content to be stored in the local cache.

To make this work, the following changes are made:

  (1) A directory's contents are now stored in a folio_queue chain attached
      to the afs_vnode (inode) struct rather than its associated pagecache,
      though multipage folios are still used to hold the data.  The folio
      queue is discarded when the directory inode is evicted.

      This also helps with the phasing out of ITER_XARRAY.

  (2) Various directory operations are made to use and unuse the cache
      cookie.

  (3) The content checking, content dumping and content iteration are now
      performed with a standard iov_iter iterator over the contents of the
      folio queue.

  (4) Iteration and modification must be done with the vnode's validate_lock
      held.  In conjunction with (1), this means that the iteration can be
      done without the need to lock pages or take extra refs on them, unlike
      when accessing ->i_pages.

  (5) Convert to using netfs_read_single() to read data.

  (6) Provide a ->writepages() to call netfs_writeback_single() to save the
      data to the cache according to the VM's scheduling whilst holding the
      validate_lock read-locked as (4).

  (7) Change local directory image editing functions:

      (a) Provide a function to get a specific block by number from the
      	 folio_queue as we can no longer use the i_pages xarray to locate
      	 folios by index.  This uses a cursor to remember the current
      	 position as we need to iterate through the directory contents.
      	 The block is kmapped before being returned.

      (b) Make the function in (a) extend the directory by an extra folio if
      	 we run out of space.

      (c) Raise the check of the block free space counter, for those blocks
      	 that have one, higher in the function to eliminate a call to get a
      	 block.

      (d) Remove the page unlocking and putting done during the editing
      	 loops.  This is no longer necessary as the folio_queue holds the
      	 references and the pages are no longer in the pagecache.

      (e) Mark the inode dirty and pin the cache usage till writeback at the
      	 end of a successful edit.

  (8) Don't set the large_folios flag on the inode as we do the allocation
      ourselves rather than the VM doing it automatically.

  (9) Mark the inode as being a single object that isn't uploaded to the
      server.

 (10) Enable caching on directories.

 (11) Only set the upload key for writeback for regular files.

Notes:

  (*) We keep the ->release_folio(), ->invalidate_folio() and
      ->migrate_folio() ops as we set the mapping pointer on the folio.

Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
cc: Marc Dionne <marc.dionne@xxxxxxxxxxxx>
cc: Jeff Layton <jlayton@xxxxxxxxxx>
cc: linux-afs@xxxxxxxxxxxxxxxxxxx
cc: netfs@xxxxxxxxxxxxxxx
cc: linux-fsdevel@xxxxxxxxxxxxxxx
---
  fs/afs/dir.c               | 742 +++++++++++++++++++------------------
  fs/afs/dir_edit.c          | 183 ++++-----
  fs/afs/file.c              |   8 +
  fs/afs/inode.c             |  21 +-
  fs/afs/internal.h          |  16 +
  fs/afs/super.c             |   2 +
  fs/afs/write.c             |   4 +-
  include/trace/events/afs.h |   6 +-
  8 files changed, 512 insertions(+), 470 deletions(-)

[...]
+/*
+ * Iterate through the directory folios under RCU conditions.
+ */
+static int afs_dir_iterate_contents(struct inode *dir, struct dir_context *ctx)
+{
+	struct afs_vnode *dvnode = AFS_FS_I(dir);
+	struct iov_iter iter;
+	unsigned long long i_size = i_size_read(dir);
+	int ret = 0;

-		do {
-			dblock = kmap_local_folio(folio, offset);
-			ret = afs_dir_iterate_block(dvnode, ctx, dblock,
-						    folio_pos(folio) + offset);
-			kunmap_local(dblock);
-			if (ret != 1)
-				goto out;
+	/* Round the file position up to the next entry boundary */
+	ctx->pos = round_up(ctx->pos, sizeof(union afs_xdr_dirent));

-		} while (offset += sizeof(*dblock), offset < size);
+	if (i_size <= 0 || ctx->pos >= i_size)
+		return 0;

-		ret = 0;
-	}
+	iov_iter_folio_queue(&iter, ITER_SOURCE, dvnode->directory, 0, 0, i_size);
+	iov_iter_advance(&iter, round_down(ctx->pos, AFS_DIR_BLOCK_SIZE));
+
+	iterate_folioq(&iter, iov_iter_count(&iter), dvnode, ctx,
+		       afs_dir_iterate_step);
+
+	if (ret == -ESTALE)
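This check is dead code: `ret` is initialised to 0 and never assigned
afterwards (the return value of iterate_folioq() is discarded above), so
it can never equal -ESTALE.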
+		afs_invalidate_dir(dvnode, afs_dir_invalid_iter_stale);
+	return ret;
+}
[...]
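
To illustrate the point about `ret`, here is a minimal userspace sketch of
one way the error could reach the caller. It is not the kernel API and not
a proposed fix for the patch: walk_blocks(), struct walk_ctx, BLOCK_SIZE and
stale_on_second_block() are all hypothetical names made up for this example.
The only assumption carried over is that the iterator hands back a byte
count rather than an error code, so the step callback has to record its
error somewhere the caller can read it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define BLOCK_SIZE 2048		/* block size assumed for this sketch only */

/* Hypothetical per-walk context: the step callback records its error here. */
struct walk_ctx {
	int	error;		/* 0 or a negative errno */
	size_t	blocks_seen;	/* blocks processed successfully so far */
};

/* A step returns the number of bytes it did NOT consume (0 == all done). */
typedef size_t (*step_fn)(const void *base, size_t len, struct walk_ctx *ctx);

/*
 * Stand-in for a byte-count-returning iterator: feeds the buffer to the
 * step one block at a time and returns the number of bytes processed.
 * Note that no error code can come back through this return value.
 */
static size_t walk_blocks(const unsigned char *buf, size_t len,
			  struct walk_ctx *ctx, step_fn step)
{
	size_t done = 0;

	assert(step != NULL);
	while (done < len) {
		size_t part = len - done < BLOCK_SIZE ? len - done : BLOCK_SIZE;
		size_t remain = step(buf + done, part, ctx);

		done += part - remain;
		if (remain)
			break;	/* the step stopped early */
	}
	return done;
}

/* Example step: pretends the second block turned out to be stale. */
static size_t stale_on_second_block(const void *base, size_t len,
				    struct walk_ctx *ctx)
{
	(void)base;
	if (ctx->blocks_seen == 1) {
		ctx->error = -ESTALE;
		return len;	/* consume nothing, which ends the walk */
	}
	ctx->blocks_seen++;
	return 0;
}

/* The caller reads the error from the context, not from a local `ret`. */
static int iterate_contents(const unsigned char *buf, size_t len)
{
	struct walk_ctx ctx = { .error = 0, .blocks_seen = 0 };

	walk_blocks(buf, len, &ctx, stale_on_second_block);
	return ctx.error;
}
```

With this shape, a check such as `if (ret == -ESTALE)` in the caller is
reachable, because the value tested is the one the step function wrote.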