Re: [RFC PATCH 09/11] iomap: fs-verity verification on page read

On Tue, Dec 13, 2022 at 06:29:33PM +0100, Andrey Albershteyn wrote:
> Add fs-verity page verification in read IO path. The verification
> itself is offloaded into workqueue (provided by fs-verity).
> 
> The work_struct items are allocated from bioset side by side with
> bio being processed.
> 
> As inodes with fs-verity doesn't use large folios we check only
> first page of the folio for errors (set by fs-verity if verification
> failed).
> 
> Signed-off-by: Andrey Albershteyn <aalbersh@xxxxxxxxxx>
> ---
>  fs/iomap/buffered-io.c | 80 +++++++++++++++++++++++++++++++++++++++---
>  include/linux/iomap.h  |  5 +++
>  2 files changed, 81 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 91ee0b308e13d..b7abc2f806cfc 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -17,6 +17,7 @@
>  #include <linux/bio.h>
>  #include <linux/sched/signal.h>
>  #include <linux/migrate.h>
> +#include <linux/fsverity.h>
>  #include "trace.h"
>  
>  #include "../internal.h"
> @@ -42,6 +43,7 @@ static inline struct iomap_page *to_iomap_page(struct folio *folio)
>  }
>  
>  static struct bio_set iomap_ioend_bioset;
> +static struct bio_set iomap_readend_bioset;
>  
>  static struct iomap_page *
>  iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
> @@ -189,9 +191,39 @@ static void iomap_read_end_io(struct bio *bio)
>  	int error = blk_status_to_errno(bio->bi_status);
>  	struct folio_iter fi;
>  
> -	bio_for_each_folio_all(fi, bio)
> +	bio_for_each_folio_all(fi, bio) {
> +		/*
> +		 * As fs-verity doesn't work with multi-page folios, verity
> +		 * inodes have large folios disabled (only single page folios
> +		 * are used)
> +		 */
> +		if (!error)
> +			error = PageError(folio_page(fi.folio, 0));
> +
>  		iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
> +	}
> +
>  	bio_put(bio);
> +	/* The iomap_readend has been freed by bio_put() */
> +}
> +
> +static void iomap_read_work_end_io(
> +	struct work_struct *work)
> +{
> +	struct iomap_readend *ctx =
> +		container_of(work, struct iomap_readend, read_work);
> +	struct bio *bio = &ctx->read_inline_bio;
> +
> +	fsverity_verify_bio(bio);
> +	iomap_read_end_io(bio);
> +}
> +
> +static void iomap_read_work_io(struct bio *bio)
> +{
> +	struct iomap_readend *ctx =
> +		container_of(bio, struct iomap_readend, read_inline_bio);
> +
> +	fsverity_enqueue_verify_work(&ctx->read_work);
>  }

Ok, so fsverity simply queues this to a global workqueue it has set
up as WQ_UNBOUND | WQ_HIGHPRI with, effectively, one worker thread
per CPU. This will work for simple cases and to get the basic
infrastructure in place, but there are problems here that we will
need to address.
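For reference, the setup in fs/verity/verify.c is roughly the
following - quoted from memory, so treat it as illustrative rather
than authoritative:

int __init fsverity_init_workqueue(void)
{
	/*
	 * One global, unbound, high priority workqueue shared by every
	 * filesystem using fsverity, with max_active set to one work
	 * item per online CPU.
	 */
	fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
						  WQ_UNBOUND | WQ_HIGHPRI,
						  num_online_cpus());
	if (!fsverity_read_workqueue)
		return -ENOMEM;
	return 0;
}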

1. High priority workqueues are used within XFS to ensure that data
IO completion cannot stall processing of journal IO completions. We
use them to prevent IO priority inversion deadlocks.

That is, journal IO completions use a WQ_HIGHPRI workqueue to ensure
that they are scheduled ahead of data IO completions as data IO
completions may require journal IO to complete before they can make
progress (e.g. to ensure transaction reservations in IO completion
can make progress).
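e.g. simplified from xfs_init_mount_workqueues(), where journal IO
completion gets its own per-mount WQ_HIGHPRI workqueue so it is
always scheduled ahead of data IO completion work:

	/* simplified - wrapper macros and error handling omitted */
	mp->m_log_workqueue = alloc_workqueue("xfs-log/%s",
			WQ_FREEZABLE | WQ_HIGHPRI | WQ_MEM_RECLAIM,
			0, mp->m_super->s_id);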

Hence using a WQ_HIGHPRI workqueue directly in the user data IO path
is a potential filesystem livelock/deadlock vector. At best, it's
going to impact the latency of journal commits and hence anything
that is running data integrity operations whilst fsverity read
operations are in progress.

2. The fsverity workqueue is global - it creates a cross-filesystem
contention point. That means a single busy fsverity filesystem will
create unpredictable IO latency for filesystems that only use
fsverity sparingly. i.e. heavy amounts of fsverity work on one
filesystem will impact the performance of journal and other IO
operations on all other active filesystems due to fsverity using
WQ_HIGHPRI.

This sort of workqueue setup is typically fine for a single user
device that has lots of otherwise idle CPU that can be co-opted to
create concurrency in IO completion work where there would otherwise
be none (e.g. an Android phone).

However, it is less than ideal for a highly concurrent application
using AIO or io_uring to push a couple of million IOPS through the
filesystem. In these situations, we don't want IO completion work to
be spread out because there is no idle CPU for it to be run on.
Instead, locality is king - we want it run on the same CPU that
already has the inode, bio, and other objects hot in the cache.

So, really, for XFS we want per-filesystem, locality preserving
per-cpu workqueues for fsverity work, as we get IO concurrency (and
therefore fsverity work concurrency) from the application IO
patterns. It also avoids all the potential issues with fsverity
sharing the WQ_HIGHPRI priority that journal IO relies on to avoid
data IO completion deadlocks.
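i.e. at mount time, something like this - a sketch only, all the
names are invented:

	/*
	 * A per-mount, CPU-bound (i.e. neither WQ_UNBOUND nor
	 * WQ_HIGHPRI) workqueue: work queued from IO completion runs on
	 * the same CPU that queued it, preserving cache locality, and
	 * one busy filesystem cannot starve any other filesystem's
	 * completions.
	 */
	mp->m_postread_workqueue = alloc_workqueue("xfs-postread/%s",
			WQ_FREEZABLE | WQ_MEM_RECLAIM, 0,
			mp->m_super->s_id);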


>  struct iomap_readpage_ctx {
> @@ -264,6 +296,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>  	loff_t orig_pos = pos;
>  	size_t poff, plen;
>  	sector_t sector;
> +	struct iomap_readend *readend;
>  
>  	if (iomap->type == IOMAP_INLINE)
>  		return iomap_read_inline_data(iter, folio);
> @@ -276,7 +309,21 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>  
>  	if (iomap_block_needs_zeroing(iter, pos)) {
>  		folio_zero_range(folio, poff, plen);
> -		iomap_set_range_uptodate(folio, iop, poff, plen);
> +		if (!fsverity_active(iter->inode)) {
> +			iomap_set_range_uptodate(folio, iop, poff, plen);
> +			goto done;
> +		}
> +
> +		/*
> +		 * As fs-verity doesn't work with folios sealed inodes have
> +		 * multi-page folios disabled and we can check on first and only
> +		 * page
> +		 */
> +		if (fsverity_verify_page(folio_page(folio, 0)))
> +			iomap_set_range_uptodate(folio, iop, poff, plen);
> +		else
> +			folio_set_error(folio);

This makes me wonder if this should be an iomap->page_ops method
rather than calling fsverity code directly. i.e. the filesystem
knows that fsverity is enabled on the inode during its iomap_begin()
callout, and that state *must not change* until the IO is complete.
Hence it can set an iomap flag saying IOMAP_F_READ_VERIFY and add a
page_ops vector with a "verify_folio" callout. Then this code can
do:

	if (iomap_block_needs_zeroing(iter, pos)) {
		folio_zero_range(folio, poff, plen);
		if (iomap->flags & IOMAP_F_READ_VERIFY) {
			if (!iomap->page_ops->verify_folio(folio, poff, plen)) {
				folio_set_error(folio);
				goto done;
			}
		}
		iomap_set_range_uptodate(folio, iop, poff, plen);
		goto done;
	}
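The filesystem side of that would be something along these lines -
none of this exists yet, all the names and the flag value are
invented for illustration:

/* hypothetical - flag value for illustration only */
#define IOMAP_F_READ_VERIFY	(1U << 7)

struct iomap_page_ops {
	/* existing callouts elided */

	/*
	 * Verify folio contents once the read (or zeroing) of the
	 * given range has completed. Returns false on failure.
	 */
	bool (*verify_folio)(struct folio *folio, size_t poff,
			size_t plen);
};

and in the filesystem's ->iomap_begin() callout:

	if (fsverity_active(inode)) {
		iomap->flags |= IOMAP_F_READ_VERIFY;
		iomap->page_ops = &xfs_fsverity_page_ops;
	}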

> @@ -297,8 +344,18 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>  
>  		if (ctx->rac) /* same as readahead_gfp_mask */
>  			gfp |= __GFP_NORETRY | __GFP_NOWARN;
> -		ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> +		if (fsverity_active(iter->inode)) {
> +			ctx->bio = bio_alloc_bioset(iomap->bdev,
> +					bio_max_segs(nr_vecs), REQ_OP_READ,
> +					GFP_NOFS, &iomap_readend_bioset);
> +			readend = container_of(ctx->bio,
> +					struct iomap_readend,
> +					read_inline_bio);
> +			INIT_WORK(&readend->read_work, iomap_read_work_end_io);
> +		} else {
> +			ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
>  				     REQ_OP_READ, gfp);
> +		}
>  		/*
>  		 * If the bio_alloc fails, try it again for a single page to
>  		 * avoid having to deal with partial page reads.  This emulates
> @@ -311,7 +368,11 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>  		if (ctx->rac)
>  			ctx->bio->bi_opf |= REQ_RAHEAD;
>  		ctx->bio->bi_iter.bi_sector = sector;
> -		ctx->bio->bi_end_io = iomap_read_end_io;
> +		if (fsverity_active(iter->inode))
> +			ctx->bio->bi_end_io = iomap_read_work_io;
> +		else
> +			ctx->bio->bi_end_io = iomap_read_end_io;


Hmmm. OK, why not just always use the iomap_readend_bioset, put
a flags field in it and then do this check in iomap_read_end_io()?

i.e.

struct iomap_read_ioend {
	bool			queue_work;
	struct work_struct	work;		/* post read work (fs-verity) */
	struct bio		inline_bio;	/* MUST BE LAST! */
};


Then always allocate from the iomap_read_ioend_bioset, and do:

		ctx->bio->bi_end_io = iomap_read_end_io;
		if (iomap->flags & IOMAP_F_VERITY) {
			read_ioend->queue_work = true;
			INIT_WORK(&read_ioend->work, iomap_read_ioend_work);
		}

Then in iomap_read_end_io():

	if (ioend->queue_work) {
		fsverity_enqueue_verify_work(&ioend->work);
		return;
	}
	/* ...and otherwise fall through to the existing folio
	 * iteration and bio_put() */

And if we put a fs supplied workqueue into the iomap as well, then
we have a mechanism for the filesystem to supply its own workqueue
for fsverity, fscrypt, or any other read-side post-IO data
processing functions filesystems care to add. For example - the
done_wq field and its plumbing below are invented to illustrate the
idea:
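struct iomap_read_ioend {
	bool			queue_work;
	struct workqueue_struct	*done_wq;	/* fs supplied, from the iomap */
	struct work_struct	work;		/* post read work */
	struct bio		inline_bio;	/* MUST BE LAST! */
};

Because the iomap mapping is long gone by the time the bio
completes, the workqueue pointer has to be copied into the ioend at
bio allocation time:

	read_ioend->done_wq = iomap->done_wq;

and then iomap_read_end_io() queues to it rather than hard coding
the fsverity workqueue:

	if (ioend->queue_work) {
		queue_work(ioend->done_wq, &ioend->work);
		return;
	}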

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


