On 3/30/23 08:36, Christoph Hellwig wrote:
> On Wed, Mar 29, 2023 at 05:27:43PM +0900, Damien Le Moal wrote:
>>> But why does this not follow the logic in __iomap_dio_rw to return
>>> -ENOTBLK for any error so that the write falls back to buffered I/O?
>>
>> This is a write to sequential zones so we cannot use buffered writes. We have to
>> do a direct write to ensure ordering between writes.
>>
>> Note that this is the special blocking write case where we issue a zone append.
>> For async regular writes, we use iomap so this bug does not exist. But then I
>> now realize that __iomap_dio_rw() falling back to buffered I/O could also create
>> an issue with write ordering.
>
> Can we add a comment please on why this is different? And maybe bundle
> the iomap-using path fix into the series while you're at it.

Not sure what you mean here by "iomap-using path fix". Do you mean adding a
comment explaining that zonefs does not fall back to buffered writes if
iomap_dio_rw() or the zonefs dio append direct write fails?

>
>>> Also as far as I can tell from reading the code, -1 is not a valid
>>> end special case for invalidate_inode_pages2_range, so you'll actually
>>> have to pass a valid end here.
>>
>> I wondered about that but then saw:
>>
>> int invalidate_inode_pages2(struct address_space *mapping)
>> {
>>         return invalidate_inode_pages2_range(mapping, 0, -1);
>> }
>> EXPORT_SYMBOL_GPL(invalidate_inode_pages2);
>>
>> which tends to indicate that "-1" is fine. The end is passed to
>> find_get_entries() -> find_get_entry() where it becomes a "max" pgoff_t, so
>> using -1 seems fine.
>
> Oh, indeed. There's a little magic involved. Still, any reason not to
> pass the real end like iomap?

Simplicity: we only ever append, so the only cached page we can possibly hit is
the one straddling inode->i_size. Invalidating everything from that page onward
is therefore safe, and simple.

--
Damien Le Moal
Western Digital Research
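The two points about invalidate_inode_pages2_range() above (that -1 as the end becomes the maximum pgoff_t, and that an append-only write only needs to invalidate from the page straddling i_size onward) can be illustrated with a small userspace C sketch. It is not zonefs code. Assumptions: the pgoff_t typedef as unsigned long mirrors the kernel's definition, PAGE_SHIFT is 12 (4 KiB pages), and page_index_of() and append_invalidate_range() are hypothetical helpers written only for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: mirrors the kernel's typedef of pgoff_t as unsigned long. */
typedef unsigned long pgoff_t;

/* Assumption: 4 KiB pages, i.e. PAGE_SHIFT == 12. */
#define PAGE_SHIFT 12

/*
 * Passing -1 where a pgoff_t is expected converts it to the largest
 * representable page index, so an end of -1 means "no upper bound".
 */
_Static_assert((pgoff_t)-1 == ULONG_MAX, "-1 must map to the max pgoff_t");

/* Hypothetical helper: index of the page containing byte offset pos. */
static pgoff_t page_index_of(long long pos)
{
	assert(pos >= 0);
	return (pgoff_t)(pos >> PAGE_SHIFT);
}

/*
 * Hypothetical helper: inclusive page-index range to invalidate before an
 * append write starting at i_size. Only the page straddling i_size can be
 * cached and overlap the new data, so start there and run to the max index.
 */
static void append_invalidate_range(long long i_size, pgoff_t *start,
				    pgoff_t *end)
{
	*start = page_index_of(i_size);
	*end = (pgoff_t)-1;
}
```

For example, with i_size = 10000 and 4 KiB pages, the start index is 2 (the page covering bytes 8192 to 12287), so page 1, which lies entirely below i_size, is left cached.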