Re: [RFC] iomap: use huge zero folio in iomap_dio_zero

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 14, 2024 at 08:34:09PM -0600, Keith Busch wrote:
> On Wed, May 15, 2024 at 01:50:53AM +0100, Matthew Wilcox wrote:
> > On Tue, May 07, 2024 at 04:58:12PM +0200, Pankaj Raghav (Samsung) wrote:
> > > Instead of looping with ZERO_PAGE, use a huge zero folio to zero pad the
> > > block. Fallback to ZERO_PAGE if mm_get_huge_zero_folio() fails.
> > 
> > So the block people say we're doing this all wrong.  We should be
> > issuing a REQ_OP_WRITE_ZEROES bio, and the block layer will take care of
> > using the ZERO_PAGE if the hardware doesn't natively support
> > WRITE_ZEROES or a DISCARD that zeroes or ...
> 
> Wait a second, I think you've gone too far if you're setting the bio op
> to REQ_OP_WRITE_ZEROES. The block layer handles the difference only
> through the blkdev_issue_zeroout() helper. If you actually submit a bio
> with that op to a block device that doesn't support it, you'll just get
> a BLK_STS_NOTSUPP error from submit_bio_noacct().

Ohh.  This is a bit awkward, because this is the iomap direct IO path.
I don't see an obvious way to get the semantics we want with the current
blkdev_issue_zeroout().  For reference, here's the current function:

static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
                loff_t pos, unsigned len)
{
        struct inode *inode = file_inode(dio->iocb->ki_filp);
        struct page *page = ZERO_PAGE(0);
        struct bio *bio;

        bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
        fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
                                  GFP_KERNEL);
        bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos);
        bio->bi_private = dio;
        bio->bi_end_io = iomap_dio_bio_end_io;

        __bio_add_page(bio, page, len, 0);
        iomap_dio_submit_bio(iter, dio, bio, pos);
}

and then:

static void iomap_dio_submit_bio(const struct iomap_iter *iter,
                struct iomap_dio *dio, struct bio *bio, loff_t pos)
{
        struct kiocb *iocb = dio->iocb;

        atomic_inc(&dio->ref);

        /* Sync dio can't be polled reliably */
        if ((iocb->ki_flags & IOCB_HIPRI) && !is_sync_kiocb(iocb)) {
                bio_set_polled(bio, iocb);
                WRITE_ONCE(iocb->private, bio);
        }

        if (dio->dops && dio->dops->submit_io)
                dio->dops->submit_io(iter, bio, pos);
        else
                submit_bio(bio);
}

so unless submit_bio() can handle the fallback to "create a new bio
full of zeroes and resubmit it to the device" if the original fails,
we're a little mismatched.  I'm not really familiar with either part of
this code, so I don't have much in the way of bright ideas.  Perhaps
we go back to the "allocate a large folio at filesystem mount" plan.




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux