On Mon, 2019-08-05 at 09:43 +1000, Dave Chinner wrote:
> On Fri, Aug 02, 2019 at 05:00:45PM -0500, Goldwyn Rodrigues wrote:
> > From: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx>
> >
> > This helps filesystems to perform tasks on the bio while
> > submitting for I/O. Since btrfs requires the position
> > we are working on, pass pos to iomap_dio_submit_bio()
> >
> > The correct place for submit_io() is not page_ops. Would it be
> > better to rename the structure to something like iomap_io_ops
> > or put it directly under struct iomap?
> >
> > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx>
> > ---
> >  fs/iomap/direct-io.c  | 16 +++++++++++-----
> >  include/linux/iomap.h |  1 +
> >  2 files changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index 5279029c7a3c..a802e66bf11f 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -59,7 +59,7 @@ int iomap_dio_iopoll(struct kiocb *kiocb, bool spin)
> >  EXPORT_SYMBOL_GPL(iomap_dio_iopoll);
> >
> >  static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
> > -		struct bio *bio)
> > +		struct bio *bio, loff_t pos)
> >  {
> >  	atomic_inc(&dio->ref);
> >
> > @@ -67,7 +67,13 @@ static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
> >  		bio_set_polled(bio, dio->iocb);
> >
> >  	dio->submit.last_queue = bdev_get_queue(iomap->bdev);
> > -	dio->submit.cookie = submit_bio(bio);
> > +	if (iomap->page_ops && iomap->page_ops->submit_io) {
> > +		iomap->page_ops->submit_io(bio, file_inode(dio->iocb->ki_filp),
> > +				pos);
> > +		dio->submit.cookie = BLK_QC_T_NONE;
> > +	} else {
> > +		dio->submit.cookie = submit_bio(bio);
> > +	}
>
> I don't really like this at all. Apart from the fact it doesn't work
> with block device polling (RWF_HIPRI), the iomap architecture is

That can be added, no? Should be relayed when we clone the bio.

> supposed to resolve the file offset -> block device + LBA mapping
> completely up front and so all that remains to be done is build and
> submit the bio(s) to the block device.
>
> What I see here is a hack to work around the fact that btrfs has
> implemented both file data transformations and device mapping layer
> functionality as a filesystem layer between file data bio building
> and device bio submission. And as the btrfs file data mapping
> (->iomap_begin) is completely unaware that there is further block
> mapping to be done before block device bio submission, any generic
> code that btrfs uses requires special IO submission hooks rather
> than just calling submit_bio().
>
> I'm not 100% sure what the solution here is, but the one thing we
> must resist is turning the iomap code into a mess of custom hooks
> that only one filesystem uses. We've been taught this lesson time
> and time again - the iomap infrastructure exists because stuff like
> bufferheads and the old direct IO code ended up so full of special
> case code that it ossified and became unmodifiable and
> unmaintainable.
>
> We do not want to go down that path again.
>
> IMO, the iomap IO model needs to be restructured to support post-IO
> and pre-IO data verification/calculation/transformation operations
> so all the work that needs to be done at the inode/offset context
> level can be done in the iomap path before bio submission/after
> bio completion. This will allow infrastructure like fscrypt, data
> compression, data checksums, etc to be supported generically, not
> just by individual filesystems that provide a ->submit_io hook.
>
> As for the btrfs needing to slice and dice bios for multiple
> devices? That should be done via a block device ->make_request
> function, not a custom hook in the iomap code.

btrfs differentiates how metadata and data are
handled/replicated/stored. We would still need an entry point in the
iomap code to handle the I/O submission.

>
> That's why I don't like this hook - I think hiding data operations
> and/or custom bio manipulations in opaque filesystem callouts is
> completely the wrong approach to be taking. We need to do these
> things in a generic manner so that all filesystems (and block
> devices!) that use the iomap infrastructure can take advantage of
> them, not just one of them.
>
> Quite frankly, I don't care if it takes more time and work up front,
> I'm tired of expedient hacks to merge code quickly repeatedly biting
> us on the arse and wasting far more time sorting out than we would
> have spent getting it right in the first place.

Sure. I am open to ideas. What are you proposing?

-- 
Goldwyn