On Wed 06-11-24 11:59:44, Dan Williams wrote: > Jan Kara wrote: > [..] > > > This WARN still feels like the wrong thing, though. Right now it is the > > > only thing in DAX code complaining on a page size/block size mismatch > > > (at least for virtiofs). If this is so important, I feel like there > > > should be a higher level check elsewhere, like something happening at > > > mount time or on file open. It should actually cause the operations to > > > fail cleanly. > > > > That's a fair point. Currently filesystems supporting DAX check for this in > > their mount code because there isn't really a DAX code that would get > > called during mount and would have enough information to perform the check. > > I'm not sure adding a new call just for this check makes a lot of sense. > > But if you have some good place in mind, please tell me. > > Is not the reason that dax_writeback_mapping_range() the only thing > checking ->i_blkbits because 'struct writeback_control' does writeback > in terms of page-index ranges? To be fair, I don't remember why we've put the assertion specifically into dax_writeback_mapping_range(). But as Dave explained there's much more to this blocksize == pagesize limitation in DAX than just doing writeback in terms of page-index ranges. The whole DAX entry tracking in xarray would have to be modified to properly support other entry sizes than just PTE & PMD sizes because otherwise the entry locking just doesn't provide the guarantees that are expected from filesystems (e.g. you could have parallel modifications happening to a single fs block in pagesize < blocksize case). > All other dax entry points are filesystem controlled that know the > block-to-pfn-to-mapping relationship. > > Recall that dax_writeback_mapping_range() is historically for pmem > persistence guarantees to make sure that applications write through CPU > cache to media. Correct. > Presumably there are no cache coherency concerns with fuse and dax > writes from the guest side are not a risk of being stranded in CPU > cache. Host side filesystem writeback will take care of them when / if > the guest triggers a storage device cache flush, not a guest page cache > writeback. I'm not so sure. When you call fsync(2) in the guest on virtiofs file, it should provide persistency guarantees on the file contents even in case of *host* power failure. So if the guest is directly mapping host's page cache pages through virtiofs, filemap_fdatawrite() call in the guest must result in fsync(2) on the host to persist those pages. And as far as I vaguely remember that happens by KVM catching the arch_wb_cache_pmem() calls and issuing fsync(2) on the host. But I could be totally wrong here. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR