On Fri, Jan 31, 2025 at 05:06:50PM -0300, Travis Downs wrote:
> On Fri, Jan 10, 2025 at 5:58 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, Jan 09, 2025 at 10:51:19AM -0500, Theodore Ts'o wrote:
> > > For Linux, if the block device is one that requires stable writes
> > > (e.g., for iSCSI writes which include a checksum, or SCSI devices with
> > > DIF/DIX enabled, or some software RAID 5 block device), where a racy
> > > write might lead to an I/O error on the write or in the case of RAID
> > > 5, in the subsequent read of the block, Linux will protect against
> > > this happening by marking the page read-only while the I/O is
> > > underway, either if it's happening via buffered writeback or O_DIRECT
> > > writes, and then marking the page read/write afterwards.
> >
> > This only happens for buffered I/O, and not for direct I/O.
>
> Thank you. To clarify, "this" means the RO protection, right? So in direct IO
> there is no such protection?

Yes.

> > But that only matters when your operation is inside the sector (LBA)
> > boundary that the device interface operates on, e.g. if you are using a
> > 512 byte sector size, as long as you stay outside of that you're still
> > fine.
>
> Sorry it's not clear if you are talking about the buffered or direct
> I/O case here.

This is all about direct I/O.

> > BUT: that assumes device checksums. File systems can have checksums
> > as well and have the same problem. Because of that, for example, running
> > Windows VM images, which tend to somehow generate this pattern, on qemu
> > using direct I/O on btrfs files has historically caused a lot of
> > problems.
>
> So is it fair to say that for direct IO these types of racy writes are not safe?

In general: yes.

>
> Specifically, we are looking at behavior in a 3rd party, proprietary
> block device (implemented as a kernel module) and are wondering if
> these types of racy writes break the implied or explicit semantics of
> safe direct IO writes.

I have no interest in helping anyone look into proprietary drivers.
But every single one I've looked at was somewhere between somewhat and
totally broken in many ways.