Re: [bug?] blk_queue_may_bounce() has the comparison max_low_pfn and max_pfn wrong way

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Fri, 28 Oct 2022 20:14:58 +0100



On Fri, Oct 28, 2022 at 07:02:54PM +0100, Al Viro wrote:

> AFAICS, this condition is backwards - it should be
> 
> static inline bool blk_queue_may_bounce(struct request_queue *q)
> {
>         return IS_ENABLED(CONFIG_BOUNCE) &&
>                 q->limits.bounce == BLK_BOUNCE_HIGH &&
>                 max_low_pfn < max_pfn;
> }
> 
> What am I missing here?

More fun in that area:
	/*
	 * Bvec table can't be updated by bio_for_each_segment_all(),
	 * so retrieve bvec from the table directly. This way is safe
	 * because the 'bio' is single-page bvec.
	 */
	for (i = 0, to = bio->bi_io_vec; i < bio->bi_vcnt; to++, i++) {
		struct page *bounce_page;

		if (!PageHighMem(to->bv_page))
			continue;

		bounce_page = mempool_alloc(&page_pool, GFP_NOIO);
		inc_zone_page_state(bounce_page, NR_BOUNCE);

		if (rw == WRITE) {
			flush_dcache_page(to->bv_page);
			memcpy_from_bvec(page_address(bounce_page), to);
					 ^^^^^^^^^^^^^^^^^^^^^^^^^
		}
		to->bv_page = bounce_page;
	}

Consider the case when highmem page comes in bio_vec that covers the
second half of it.  We
	* allocate a bounce page
	* copy the second half of old page into the first half of new one
	* point the bio_vec to the second half of the new page.
	* submit the mangled bio.

While we are at it, the logics above that re splitting the bio before
bothering with bounces also looks somewhat fishy; if it triggers (which
needs > 256 elements in the original vector) we get bio split and
parts chained, then, assuming we run into a highmem page in each
half, we end up with

	bounce bio 1:	has ->bi_private pointing to bio 1, ->bi_end_io
			bounce_end_io_{read,write}().  Queued.
	bounce bio 2:	has ->bi_private pointing to bio 2, ->bi_end_io
			bounce_end_io_{read,write}().  Queued.
	bio 1:		original, covers the tail of original range.
	bio 2:		covers the beginning of original range,
			->bi_private points to bio 1, ->bi_end_io is
			bio_chain_endio().

Suppose the IO on bounce bio 2 fails.  We get ->bi_status of that sucker
set to non-zero.  ->bi_end_io() is called, leading to bounce_end_io(),
which will copy that status to bio 2 and call bio_endio() on bio 2.
Which will check ->bi_status on bio 1, see it zero and propagate the
error to bio 1.  Now bounce bio 1 completes without an error.
We still have zero in its ->bi_status and the call of bounce_end_io()
hits
        bio_orig->bi_status = bio->bi_status;
        bio_endio(bio_orig);
copying that zero to bio 1.  Oops - we'd just lost the error reported
by the IO on the other half...