Re: [PATCH RFC 5/7] fs: iomap: buffered atomic write support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 22/04/2024 16:03, Matthew Wilcox wrote:
On Mon, Apr 22, 2024 at 02:39:21PM +0000, John Garry wrote:
Add special handling of PG_atomic flag to iomap buffered write path.

To flag an iomap iter for an atomic write, set IOMAP_ATOMIC.

For a folio associated with a write which has IOMAP_ATOMIC set, set
PG_atomic.

Otherwise, when IOMAP_ATOMIC is unset, clear PG_atomic.

This means that for an "atomic" folio which has not been written back, it
loses it "atomicity". So if userspace issues a write with RWF_ATOMIC set
and another write with RWF_ATOMIC unset and which fully or partially
overwrites that same region as the first write, that folio is not written
back atomically. For such a scenario to occur, it would be considered a
userspace usage error.

To ensure that a buffered atomic write is written back atomically when
the write syscall returns, RWF_SYNC or similar needs to be used (in
conjunction with RWF_ATOMIC).

As a safety check, when getting a folio for an atomic write in
iomap_get_folio(), ensure that the length matches the inode mapping folio
order-limit.

Only a single BIO should ever be submitted for an atomic write. So modify
iomap_add_to_ioend() to ensure that we don't try to write back an atomic
folio as part of a larger mixed-atomicity BIO.

In iomap_alloc_ioend(), handle an atomic write by setting REQ_ATOMIC for
the allocated BIO.

When a folio is written back, again clear PG_atomic, as it is no longer
required. I assume it will not be needlessly written back a second time...

I'm not taking a position on the mechanism yet; need to think about it
some more.  But there's a hole here I also don't have a solution to,
so we can all start thinking about it.

In iomap_write_iter(), we call copy_folio_from_iter_atomic().  Through no
fault of the application, if the range crosses a page boundary, we might
partially copy the bytes from the first page, then take a page fault on
the second page, hence doing a short write into the folio.  And there's
nothing preventing writeback from writing back a partially copied folio.

Now, if it's not dirty, then it can't be written back.  So if we're
doing an atomic write, we could clear the dirty bit after calling
iomap_write_begin() (given the usage scenarios we've discussed, it should
always be clear ...)
> We need to prevent the "fall back to a short copy" logic in
iomap_write_iter() as well.  But then we also need to make sure we don't
get stuck in a loop, so maybe go three times around, and if it's still
not readable as a chunk, -EFAULT?
This idea sounds reasonable. So at what stage would the dirty flag be set? Would it be only when all bytes are copied successfully as a single chunk?

FWIW, we do have somewhat equivalent handling in direct IO path, being that if the iomap iter loops more than once such that we will need to create > 1 bio in the DIO bio submission handler, then we -EINVAL as something has gone wrong. But that's not so relevant here.

Thanks,
John




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux