Re: Semantics of racy O_DIRECT writes

Christoph Hellwig <hch@xxxxxxxxxxxxx> · Fri, 10 Jan 2025 00:58:19 -0800

On Thu, Jan 09, 2025 at 10:51:19AM -0500, Theodore Ts'o wrote:
> For Linux, if the block device is one that requires stable writes
> (e.g., for iSCSI writes which include a checksum, or SCSI devices with
> DIF/DIX enabled, or some software RAID 5 block device), where a racy
> write might lead to an I/O error on the write or in the case of RAID
> 5, in the subsequent read of the block, Linux will protect against
> this happening by marking the page read-only while the I/O is
> underway, either if it's happening via buffered writeback or O_DIRECT
> writes, and then marking the page read/write afterwards.

This only happens for buffered I/O, and not for direct I/O.

But that only matters when your operation is inside the sector (LBA)
boundary that the device interface operates on: e.g. if you are using
a 512-byte sector size, as long as you stay outside of that you're
still fine.
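A small sketch of the sector arithmetic behind this. It assumes sector-aligned direct I/O and a checksum computed per sector (as with DIF/DIX protection information): a racing modification can only invalidate the checksums of the sectors it actually overlaps within the in-flight write. The helper names `sectors_touched` and `racy_sectors` are hypothetical and exist only for illustration.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical block size


def sectors_touched(offset: int, length: int, sector_size: int = SECTOR_SIZE) -> range:
    """Return the sector (LBA) indices covered by bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return range(first, last + 1)


def racy_sectors(io_offset: int, io_length: int,
                 mod_offset: int, mod_length: int,
                 sector_size: int = SECTOR_SIZE) -> list[int]:
    """Sectors of an in-flight write whose contents a concurrent modification changes.

    Assumes the in-flight write is sector aligned, as O_DIRECT normally requires,
    so byte overlap and sector overlap coincide for the write itself.
    """
    io = sectors_touched(io_offset, io_length, sector_size)
    mod = sectors_touched(mod_offset, mod_length, sector_size)
    lo = max(io.start, mod.start)
    hi = min(io.stop, mod.stop)
    return list(range(lo, hi)) if lo < hi else []
```

For example, with a 4 KiB write at offset 0, `racy_sectors(0, 4096, 510, 4)` returns `[0, 1]` because the 4-byte update straddles a sector boundary, while `racy_sectors(0, 4096, 4096, 100)` returns `[]` because the update lies entirely outside the write.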

BUT: that assumes device checksums.  File systems can have checksums
as well, and they have the same problem.  Because of that, running
Windows VM images (which tend to somehow generate this pattern) on
qemu using direct I/O on btrfs files, for example, has historically
caused a lot of problems.
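To make the file system checksum case concrete, here is a toy simulation, not btrfs code. The checksum is computed from the user buffer when the write is submitted, the application modifies the buffer before the device has copied it out, and verification on the later read then fails. `zlib.crc32` stands in for the crc32c that btrfs uses by default, and the function names are hypothetical.

```python
import zlib
from typing import Callable, Optional


def write_with_race(buf: bytearray,
                    racing_update: Optional[Callable[[bytearray], None]] = None
                    ) -> tuple[bytes, int]:
    """Simulate a direct write whose checksum is taken before the data is transferred."""
    # The file system computes the checksum from the user buffer at submission time.
    stored_csum = zlib.crc32(buf)
    # With direct I/O the application still owns the buffer and may modify it
    # while the I/O is in flight.
    if racing_update is not None:
        racing_update(buf)
    # The device then transfers whatever the buffer holds at that moment.
    stored_data = bytes(buf)
    return stored_data, stored_csum


def read_back_ok(stored_data: bytes, stored_csum: int) -> bool:
    """Verify the data on read, as a checksumming file system would."""
    return zlib.crc32(stored_data) == stored_csum
```

In this model, checksumming and submitting from a private copy of the buffer would make the checksum and the transferred data agree, at the cost of an extra copy per write.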