On 3/29/23 17:27, Damien Le Moal wrote:
> On 3/29/23 17:14, Christoph Hellwig wrote:
>> On Wed, Mar 29, 2023 at 02:58:23PM +0900, Damien Le Moal wrote:
>>> +       /*
>>> +        * If the inode block size (sector size) is smaller than the
>>> +        * page size, we may be appending data belonging to an already
>>> +        * cached last page of the inode. So make sure to invalidate that
>>> +        * last cached page. This will always be a no-op for the case where
>>> +        * the block size is equal to the page size.
>>> +        */
>>> +       ret = invalidate_inode_pages2_range(inode->i_mapping,
>>> +                                           iocb->ki_pos >> PAGE_SHIFT, -1);
>>> +       if (ret)
>>> +               return ret;
>>
>> The missing truncate here obviously is a bug and needs fixing.
>>
>> But why does this not follow the logic in __iomap_dio_rw to return
>> -ENOTBLK for any error so that the write falls back to buffered I/O?
>
> This is a write to sequential zones so we cannot use buffered writes. We have to
> do a direct write to ensure ordering between writes.
>
> Note that this is the special blocking write case where we issue a zone append.
> For async regular writes, we use iomap so this bug does not exist. But then I
> now realize that __iomap_dio_rw() falling back to buffered IOs could also create
> an issue with write ordering.

Checking this, there is no issue: it is the FS caller of iomap_dio_rw() that has
to fall back to buffered IO if it wants to, and zonefs does not do that.

>> Also as far as I can tell from reading the code, -1 is not a valid
>> end special case for invalidate_inode_pages2_range, so you'll actually
>> have to pass a valid end here.
>
> I wondered about that but then saw:
>
> int invalidate_inode_pages2(struct address_space *mapping)
> {
>         return invalidate_inode_pages2_range(mapping, 0, -1);
> }
> EXPORT_SYMBOL_GPL(invalidate_inode_pages2);
>
> which tends to indicate that "-1" is fine. The end is passed to
> find_get_entries() -> find_get_entry() where it becomes a "max" pgoff_t, so
> using -1 seems fine.
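
To make the "-1" point concrete, here is a small userspace sketch, not kernel
code. It assumes pgoff_t is an unsigned long (as in include/linux/types.h) and
4 KiB pages. All sketch_* names are hypothetical helpers made up for this
illustration. It shows that -1 converted to pgoff_t is the largest possible
page index, so a range starting at ki_pos >> PAGE_SHIFT with an inclusive end
of -1 covers every page from the append position onward.

```c
#include <limits.h>
#include <stdint.h>

/* Userspace stand-ins for kernel definitions (assumptions, see above). */
typedef unsigned long pgoff_t;  /* mirrors the kernel's pgoff_t typedef */
#define SKETCH_PAGE_SHIFT 12    /* assumes 4 KiB pages */

/* Page index containing byte offset pos, as in iocb->ki_pos >> PAGE_SHIFT. */
static pgoff_t sketch_page_index(int64_t pos)
{
        return (pgoff_t)(pos >> SKETCH_PAGE_SHIFT);
}

/* What an "end" argument of -1 becomes once converted to pgoff_t. */
static pgoff_t sketch_end_all(void)
{
        return (pgoff_t)-1;     /* wraps to ULONG_MAX */
}

/* True if page index idx lies in the inclusive range [start, end]. */
static int sketch_in_range(pgoff_t idx, pgoff_t start, pgoff_t end)
{
        return idx >= start && idx <= end;
}
```

With a 512 B block size, an append at byte offset 4608 lands in page index 1,
which may already be cached, and that page is inside the range being
invalidated.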

-- 
Damien Le Moal
Western Digital Research