On Thu, Aug 12, 2021 at 10:48:18AM -0700, Darrick J. Wong wrote:
> On Thu, Aug 12, 2021 at 07:02:33PM +0200, Christoph Hellwig wrote:
> > On Thu, Aug 12, 2021 at 04:39:40PM +0100, Matthew Wilcox wrote:
> > > I agree with David; we want something lower-level for swap to call into.
> > > I'd suggest aops->swap_rw and an implementation might well look
> > > something like:
> > >
> > > static ssize_t ext4_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
> > > {
> > > 	return iomap_dio_rw(iocb, iter, &ext4_iomap_ops, NULL, 0);
> > > }
> >
> > Yes, that might make sense and would also replace the awkward IOCB_SWAP
> > flag for the write side.
> >
> > For file systems like ext4 and xfs that have an in-memory block mapping
> > tree this would be way better than the current version and also support
> > swap on say multi-device file systems properly.  We'd just need to be
> > careful to read the extent information in at extent_activate time,
> > by doing xfs_iread_extents for XFS or the equivalents in other file
> > systems.
>
> You'd still want to walk the extent map at activation time to reject
> swapfiles with holes, shared extents, etc., right?

Yes.  While direct I/O code could do allocation at swap I/O time that
probably is not a good idea due to the memory requirements.
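
To make the activation-time walk concrete, here is a rough userspace sketch of
the kind of check being discussed: reject a would-be swapfile if any part of it
is a hole, is not yet allocated, or sits on shared blocks.  Everything in it
(struct model_extent, enum model_state, model_swap_validate) is made up for
illustration and is not kernel code; a real implementation would get its
extents from the file system's own mapping code rather than from an array.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an extent's allocation state; not a kernel type. */
enum model_state { MODEL_MAPPED, MODEL_HOLE, MODEL_DELALLOC };

/* Hypothetical model of one extent of the candidate swapfile. */
struct model_extent {
	uint64_t offset;	/* file offset in bytes */
	uint64_t length;	/* extent length in bytes */
	enum model_state state;
	int shared;		/* non-zero if the blocks are shared */
};

/*
 * Walk extents that are expected to be sorted by offset and to cover
 * [0, size) without gaps.  Returns 0 if every byte is backed by an
 * allocated, unshared extent, and -EINVAL otherwise.
 */
static int model_swap_validate(const struct model_extent *ext, size_t n,
			       uint64_t size)
{
	uint64_t pos = 0;
	size_t i;

	for (i = 0; i < n && pos < size; i++) {
		/* A gap or overlap between extents: treat as unusable. */
		if (ext[i].offset != pos)
			return -EINVAL;
		/* Holes and not-yet-allocated ranges would need allocation. */
		if (ext[i].state != MODEL_MAPPED)
			return -EINVAL;
		/* Shared blocks would need copy-on-write at swap-out time. */
		if (ext[i].shared)
			return -EINVAL;
		pos += ext[i].length;
	}

	/* The extents ran out before the end of the file: implicit hole. */
	return pos >= size ? 0 : -EINVAL;
}
```

Doing this once at activation means the I/O path never has to allocate blocks
or break sharing while the system is already short on memory, which is the
concern raised above.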