On Wed, Aug 22, 2018 at 12:33:05PM +0200, Jan Kara wrote: > On Wed 22-08-18 10:02:49, Martin Wilck wrote: > > On Mon, 2018-07-30 at 20:37 +0800, Ming Lei wrote: > > > On Wed, Jul 25, 2018 at 11:15:09PM +0200, Martin Wilck wrote: > > > > > > > > +/** > > > > + * bio_iov_iter_get_pages - pin user or kernel pages and add them > > > > to a bio > > > > + * @bio: bio to add pages to > > > > + * @iter: iov iterator describing the region to be mapped > > > > + * > > > > + * Pins pages from *iter and appends them to @bio's bvec array. > > > > The > > > > + * pages will have to be released using put_page() when done. > > > > + * The function tries, but does not guarantee, to pin as many > > > > pages as > > > > + * fit into the bio, or are requested in *iter, whatever is > > > > smaller. > > > > + * If MM encounters an error pinning the requested pages, it > > > > stops. > > > > + * Error is returned only if 0 pages could be pinned. > > > > + */ > > > > +int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) > > > > +{ > > > > + unsigned short orig_vcnt = bio->bi_vcnt; > > > > + > > > > + do { > > > > + int ret = __bio_iov_iter_get_pages(bio, iter); > > > > + > > > > + if (unlikely(ret)) > > > > + return bio->bi_vcnt > orig_vcnt ? 0 : ret; > > > > + > > > > + } while (iov_iter_count(iter) && !bio_full(bio)); > > > > > > When 'ret' isn't zero, and some partial progress has been made, seems > > > less pages > > > might be obtained than requested too. Is that something we need to > > > worry about? > > > > This would be the case when VM isn't willing or able to fulfill the > > page-pinning request. Previously, we came to the conclusion that VM has > > the right to do so. This is the reason why callers have to check the > > number of pages allocated, and either loop over > > bio_iov_iter_get_pages(), or fall back to buffered I/O, until all pages > > have been obtained. All callers except the blockdev fast path do the > > former. > > > > We could add looping in __blkdev_direct_IO_simple() on top of the > > current patch set, to avoid fallback to buffered IO in this corner > > case. Should we? If yes, only for WRITEs, or for READs as well? > > > > I haven't encountered this situation in my tests, and I'm unsure how to > > provoke it - run a direct IO test under high memory pressure? > > Currently, iov_iter_get_pages() is always guaranteed to get at least one > page as that is current guarantee of get_user_pages() (unless we hit > EFAULT obviously). So bio_iov_iter_get_pages() as is now is guaranteed to Is it possible for this EFAULT to happen on the user-space VM? Thanks, Ming