On Fri, Aug 02, 2019 at 05:00:45PM -0500, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx>
>
> This helps filesystems to perform tasks on the bio while
> submitting for I/O. Since btrfs requires the position
> we are working on, pass pos to iomap_dio_submit_bio()
>
> The correct place for submit_io() is not page_ops. Would it
> better to rename the structure to something like iomap_io_ops
> or put it directly under struct iomap?
>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx>
> ---
>  fs/iomap/direct-io.c  | 16 +++++++++++-----
>  include/linux/iomap.h |  1 +
>  2 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 5279029c7a3c..a802e66bf11f 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -59,7 +59,7 @@ int iomap_dio_iopoll(struct kiocb *kiocb, bool spin)
>  EXPORT_SYMBOL_GPL(iomap_dio_iopoll);
>
>  static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
> -		struct bio *bio)
> +		struct bio *bio, loff_t pos)
>  {
>  	atomic_inc(&dio->ref);
>
> @@ -67,7 +67,13 @@ static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
>  		bio_set_polled(bio, dio->iocb);
>
>  	dio->submit.last_queue = bdev_get_queue(iomap->bdev);
> -	dio->submit.cookie = submit_bio(bio);
> +	if (iomap->page_ops && iomap->page_ops->submit_io) {
> +		iomap->page_ops->submit_io(bio, file_inode(dio->iocb->ki_filp),
> +				pos);
> +		dio->submit.cookie = BLK_QC_T_NONE;
> +	} else {
> +		dio->submit.cookie = submit_bio(bio);
> +	}

I don't really like this at all. Apart from the fact it doesn't work
with block device polling (RWF_HIPRI), the iomap architecture is
supposed to resolve the file offset -> block device + LBA mapping
completely up front and so all that remains to be done is build and
submit the bio(s) to the block device.

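The polling point can be illustrated with a small userspace toy model. This is only a sketch, not kernel code: every toy_ name below is made up for illustration, and the only thing taken from the patch is that the hook path stores a "no cookie" value where submit_bio() would have returned a real one. Assuming the waiter can only poll when it holds a valid cookie, the hook path silently loses polling:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration only; these are NOT the kernel types. */
typedef unsigned int toy_cookie_t;
#define TOY_COOKIE_NONE ((toy_cookie_t)-1)

/* Models plain submit_bio(): hands back a cookie naming the queued I/O. */
static toy_cookie_t toy_submit_bio(int bio_id)
{
	assert(bio_id >= 0);
	return (toy_cookie_t)bio_id;
}

/*
 * Models the proposed hook path: the filesystem takes the bio away and
 * the generic code is left with no cookie for it.
 */
static toy_cookie_t toy_submit_via_hook(int bio_id)
{
	assert(bio_id >= 0);
	return TOY_COOKIE_NONE;
}

/* Models the waiter: polling needs a high-priority request AND a cookie. */
static bool toy_can_poll(bool hipri, toy_cookie_t cookie)
{
	return hipri && cookie != TOY_COOKIE_NONE;
}
```

In this model toy_can_poll(true, toy_submit_bio(7)) holds, while toy_can_poll(true, toy_submit_via_hook(7)) does not, so a high-priority request submitted through the hook falls back to sleeping.
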
What I see here is a hack to work around the fact that btrfs has
implemented both file data transformations and device mapping layer
functionality as a filesystem layer between file data bio building
and device bio submission. And as the btrfs file data mapping
(->iomap_begin) is completely unaware that there is further block
mapping to be done before block device bio submission, any generic
code that btrfs uses requires special IO submission hooks rather
than just calling submit_bio().

I'm not 100% sure what the solution here is, but the one thing we
must resist is turning the iomap code into a mess of custom hooks
that only one filesystem uses. We've been taught this lesson time
and time again - the iomap infrastructure exists because stuff like
bufferheads and the old direct IO code ended up so full of special
case code that it ossified and became unmodifiable and
unmaintainable. We do not want to go down that path again.

IMO, the iomap IO model needs to be restructured to support post-IO
and pre-IO data verification/calculation/transformation operations
so all the work that needs to be done at the inode/offset context
level can be done in the iomap path before bio submission/after bio
completion. This will allow infrastructure like fscrypt, data
compression, data checksums, etc to be supported generically, not
just by individual filesystems that provide a ->submit_io hook.

As for the btrfs needing to slice and dice bios for multiple
devices? That should be done via a block device ->make_request
function, not a custom hook in the iomap code.

That's why I don't like this hook - I think hiding data operations
and/or custom bio manipulations in opaque filesystem callouts is
completely the wrong approach to be taking. We need to do these
things in a generic manner so that all filesystems (and block
devices!) that use the iomap infrastructure can take advantage of
them, not just one of them.

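To make the pre-IO/post-IO idea concrete, here is a minimal userspace sketch of what such a model might look like. Nothing like struct toy_xform_ops exists in the iomap code; all names are hypothetical, the "device" is just a byte array, and the XOR transform is only a placeholder for something like encryption or checksumming. The point it tries to show is that the transform runs in the generic path with the file offset in hand, and submission itself stays a plain, unhooked step:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transform ops; an assumption for this sketch only. */
struct toy_xform_ops {
	void (*pre_write)(unsigned char *buf, size_t len, long long pos);
	void (*post_read)(unsigned char *buf, size_t len, long long pos);
};

/* Toy "block device": a flat byte array addressed by file offset. */
#define TOY_DEV_SIZE 64
static unsigned char toy_dev[TOY_DEV_SIZE];

static void toy_check_range(size_t len, long long pos)
{
	assert(pos >= 0 && len <= TOY_DEV_SIZE);
	assert((size_t)pos <= TOY_DEV_SIZE - len);
}

/* Generic write path: transform with offset context, then plain submit. */
static void toy_generic_write(const struct toy_xform_ops *ops,
			      const unsigned char *data, size_t len,
			      long long pos)
{
	unsigned char bounce[TOY_DEV_SIZE];

	toy_check_range(len, pos);
	memcpy(bounce, data, len);
	if (ops && ops->pre_write)
		ops->pre_write(bounce, len, pos);
	memcpy(toy_dev + pos, bounce, len);	/* stands in for submit_bio() */
}

/* Generic read path: plain read, then transform after "completion". */
static void toy_generic_read(const struct toy_xform_ops *ops,
			     unsigned char *out, size_t len, long long pos)
{
	toy_check_range(len, pos);
	memcpy(out, toy_dev + pos, len);
	if (ops && ops->post_read)
		ops->post_read(out, len, pos);
}

/* Placeholder transform: XOR each byte with the low byte of its offset. */
static void toy_xor(unsigned char *buf, size_t len, long long pos)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= (unsigned char)((pos + (long long)i) & 0xff);
}

static const struct toy_xform_ops toy_xor_ops = {
	.pre_write = toy_xor,
	.post_read = toy_xor,
};
```

Writing through toy_xor_ops and reading back through the same ops returns the original bytes, while the toy device only ever holds the transformed data. A filesystem with no transform would pass NULL ops and get the untouched path.
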
Quite frankly, I don't care if it takes more time and work up front,
I'm tired of expedient hacks to merge code quickly repeatedly biting
us on the arse and wasting far more time sorting out than we would
have spent getting it right in the first place.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx