On Thu, 2017-02-02 at 11:16 +0000, Al Viro wrote: > On Thu, Feb 02, 2017 at 02:56:51AM -0800, Christoph Hellwig wrote: > > > I really wonder if this is the right approach. Most of the users of > > > iov_iter_get_pages()/iov_iter_get_pages_alloc() look like they want > > > something like > > > iov_iter_for_each_page(iter, size, f, data) > > > with int (*f)(struct page *page, size_t from, size_t size, void *data) > > > passed as callback. Not everything fits that model, but there's a whole > > > lot of things that do. > > Yeah, that does seem nicer. My only concern is that it sounds more invasive, and I'd like to get some sort of interim fix in for Ceph in the meantime (preferably something we can send to stable). It really is trivial to vmsplice some data into the kernel, and then splice write that to a DIO file -- instant softlockup in my testing. This patch also makes vectored DIO with small iovecs on NFS a lot more efficient, which is a nice bonus. > > I was planning to do that, mostly because of the iomap dio code that > > would not only get a lot cleaner with this, but also support multi-page > > bvecs that we hope to have in the block layer soon. The issue with it > > is that we need to touch all the arch get_user_pages_fast > > implementations, so it's going to be a relatively invasive change that I > > didn't want to fix with just introducing the new direct I/O code. > > I'm not sure we need to touch any get_user_pages_fast() at all; let it > fill a medium-sized array and use that as a buffer. In particular, > I *really* don't like the idea of having the callbacks done in an > inconsistent locking environment - sometimes under ->mmap_sem, sometimes > not. > Yeah, that might work. You could kmalloc the buffer array according to the maxsize value. For small ones we could even consider using an on- stack buffer. > I played with "let it fill bio_vec array", but it doesn't really fit the > users; variant with callbacks is cleaner, IMO. I looked at converting this over to a bio_vec array as well, but that really doesn't work well for Ceph. I like the callback idea better too. -- Jeff Layton <jlayton@xxxxxxxxxx>