On Tue, May 14, 2024 at 08:34:09PM -0600, Keith Busch wrote: > On Wed, May 15, 2024 at 01:50:53AM +0100, Matthew Wilcox wrote: > > On Tue, May 07, 2024 at 04:58:12PM +0200, Pankaj Raghav (Samsung) wrote: > > > Instead of looping with ZERO_PAGE, use a huge zero folio to zero pad the > > > block. Fallback to ZERO_PAGE if mm_get_huge_zero_folio() fails. > > > > So the block people say we're doing this all wrong. We should be > > issuing a REQ_OP_WRITE_ZEROES bio, and the block layer will take care of > > using the ZERO_PAGE if the hardware doesn't natively support > > WRITE_ZEROES or a DISCARD that zeroes or ... > > Wait a second, I think you've gone too far if you're setting the bio op > to REQ_OP_WRITE_ZEROES. The block layer handles the difference only > through the blkdev_issue_zeroout() helper. If you actually submit a bio > with that op to a block device that doesn't support it, you'll just get > a BLK_STS_NOTSUPP error from submit_bio_noacct(). Ohh. This is a bit awkward, because this is the iomap direct IO path. I don't see an obvious way to get the semantics we want with the current blkdev_issue_zeroout(). For reference, here's the current function: static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio, loff_t pos, unsigned len) { struct inode *inode = file_inode(dio->iocb->ki_filp); struct page *page = ZERO_PAGE(0); struct bio *bio; bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits, GFP_KERNEL); bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos); bio->bi_private = dio; bio->bi_end_io = iomap_dio_bio_end_io; __bio_add_page(bio, page, len, 0); iomap_dio_submit_bio(iter, dio, bio, pos); } and then: static void iomap_dio_submit_bio(const struct iomap_iter *iter, struct iomap_dio *dio, struct bio *bio, loff_t pos) { struct kiocb *iocb = dio->iocb; atomic_inc(&dio->ref); /* Sync dio can't be polled reliably */ if ((iocb->ki_flags & IOCB_HIPRI) && !is_sync_kiocb(iocb)) { bio_set_polled(bio, iocb); WRITE_ONCE(iocb->private, bio); } if (dio->dops && dio->dops->submit_io) dio->dops->submit_io(iter, bio, pos); else submit_bio(bio); } so unless submit_bio() can handle the fallback to "create a new bio full of zeroes and resubmit it to the device" if the original fails, we're a little mismatched. I'm not really familiar with either part of this code, so I don't have much in the way of bright ideas. Perhaps we go back to the "allocate a large folio at filesystem mount" plan.