On Thu, Aug 22, 2019 at 10:50:02AM +0800, Ming Lei wrote: > On Thu, Aug 22, 2019 at 8:06 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote: > > > > On Wed, Aug 21, 2019 at 06:38:20PM +1000, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > > > Add memory buffer alignment validation checks to bios built in XFS > > > to catch bugs that will result in silent data corruption in block > > > drivers that cannot handle unaligned memory buffers but don't > > > validate the incoming buffer alignment is correct. > > > > > > Known drivers with these issues are xenblk, brd and pmem. > > > > > > Despite there being nothing XFS specific to xfs_bio_add_page(), this > > > function was created to do the required validation because the block > > > layer developers that keep telling us that is not possible to > > > validate buffer alignment in bio_add_page(), and even if it was > > > possible it would be too much overhead to do at runtime. > > > > I really don't think we should life this to XFS, but instead fix it > > in the block layer. And that is not only because I have a pending > > series lifting bits you are touching to the block layer.. > > > > > +int > > > +xfs_bio_add_page( > > > + struct bio *bio, > > > + struct page *page, > > > + unsigned int len, > > > + unsigned int offset) > > > +{ > > > + struct request_queue *q = bio->bi_disk->queue; > > > + bool same_page = false; > > > + > > > + if (WARN_ON_ONCE(!blk_rq_aligned(q, len, offset))) > > > + return -EIO; > > > + > > > + if (!__bio_try_merge_page(bio, page, len, offset, &same_page)) { > > > + if (bio_full(bio, len)) > > > + return 0; > > > + __bio_add_page(bio, page, len, offset); > > > + } > > > + return len; > > > > I know Jens disagree, but with the amount of bugs we've been hitting > > thangs to slub (and I'm pretty sure we have a more hiding outside of > > XFS) I think we need to add the blk_rq_aligned check to bio_add_page. > > It isn't correct to blk_rq_aligned() here because 'len' has to be logical block > size aligned, instead of DMA aligned only. News to me. AFAIA, the overall _IO_ that is being built needs to be a multiple of the logical block size in total size (i.e. bio->bi_iter.size) because sub sector IO is not allowed. But queue DMA limits are not defined in sectors - they define the scatter/gather DMA capability of the hardware, and that's what individual segments (bvecs) need to align to. That's what blk_rq_aligned() checks here - that the bvec segment aligns to what the underlying driver(s) requires, not that the entire IO is sector sized and aligned. Also, think about multipage bvecs - the pages we are spanning here are contiguous pages, so this should end up merging them and turning it into a single multipage bvec whose length is sector size aligned... > Also not sure all users may setup bio->bi_disk well before adding page to bio, > since it is allowed to do that now. XFS does, so I just don't care about random users of bio_add_page() in this patch. Somebody else can run the block layer gauntlet to get these checks moved into generic code and they've already been rejected twice as unnecessary. > If slub buffer crosses two pages, block layer may not handle it at all > even though > un-aligned 'offset' issue is solved. A slub buffer crossing two _contiguous_ pages should end up merged as a multipage bvec. But I'm curious, what does adding multiple contiguous pages to a bio actually break? -Dave. -- Dave Chinner david@xxxxxxxxxxxxx