Re: [RFC] iomap: use huge zero folio in iomap_dio_zero

Christoph Hellwig <hch@xxxxxx> · Wed, 15 May 2024 13:48:50 +0200

On Wed, May 15, 2024 at 01:50:53AM +0100, Matthew Wilcox wrote:
> On Tue, May 07, 2024 at 04:58:12PM +0200, Pankaj Raghav (Samsung) wrote:
> > Instead of looping with ZERO_PAGE, use a huge zero folio to zero pad the
> > block. Fallback to ZERO_PAGE if mm_get_huge_zero_folio() fails.
> 
> So the block people say we're doing this all wrong.  We should be
> issuing a REQ_OP_WRITE_ZEROES bio, and the block layer will take care of
> using the ZERO_PAGE if the hardware doesn't natively support
> WRITE_ZEROES or a DISCARD that zeroes or ...

Not sure who "the block people" are, but while this sounds smart
it actually is a really bad idea.

Think about what we are doing here: we zero parts of a file system
block as part of a direct I/O write operation.  So the amount is
relatively small and it is part of a fast path I/O operation.  It
also will most likely land on the same indirection entry on the device
as the data being written.
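
To put a number on "relatively small", here is a minimal userspace
sketch of the padding arithmetic.  It is only a model, not the iomap
code: the struct and the dio_zero_pad() name are made up for
illustration, and it assumes a power-of-two file system block size and
a write that needs padding at both ends.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: bytes of zero padding around a write. */
struct zero_pad {
	uint64_t head;	/* zeroed bytes before pos */
	uint64_t tail;	/* zeroed bytes after pos + len */
};

/*
 * Hypothetical helper: how much of the surrounding file system
 * block(s) must be zeroed so that a write of len bytes at pos covers
 * whole blocks.  Assumes fs_block_size is a power of two.
 */
static struct zero_pad dio_zero_pad(uint64_t pos, uint64_t len,
				    uint64_t fs_block_size)
{
	struct zero_pad p;
	uint64_t mask = fs_block_size - 1;
	uint64_t end = pos + len;

	/* The mask trick only works for power-of-two block sizes. */
	assert(fs_block_size && (fs_block_size & mask) == 0);

	/* Offset of the write start within its block. */
	p.head = pos & mask;
	/* Distance from the write end to the next block boundary. */
	p.tail = (end & mask) ? fs_block_size - (end & mask) : 0;
	return p;
}
```

With a 16k file system block, a 4k write at offset 4k gives head = 4096
and tail = 8192, so the padding is always less than one file system
block on each side.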

If you use a Write Zeroes command, it will go down a separate slow
path in the device instead of using the highly optimized write path,
and slow the whole operation down.  Even worse, there is a chance that
it will increase write amplification because there are now two separate
operations instead of one merged one (either a block layer or
device merge).

And I'm not sure what "block layer person" still doesn't understand
that discards do not zero data, but maybe we'll need yet another
education campaign there.