On Wed 22-08-18 10:02:49, Martin Wilck wrote: > On Mon, 2018-07-30 at 20:37 +0800, Ming Lei wrote: > > On Wed, Jul 25, 2018 at 11:15:09PM +0200, Martin Wilck wrote: > > > > > > +/** > > > + * bio_iov_iter_get_pages - pin user or kernel pages and add them > > > to a bio > > > + * @bio: bio to add pages to > > > + * @iter: iov iterator describing the region to be mapped > > > + * > > > + * Pins pages from *iter and appends them to @bio's bvec array. > > > The > > > + * pages will have to be released using put_page() when done. > > > + * The function tries, but does not guarantee, to pin as many > > > pages as > > > + * fit into the bio, or are requested in *iter, whatever is > > > smaller. > > > + * If MM encounters an error pinning the requested pages, it > > > stops. > > > + * Error is returned only if 0 pages could be pinned. > > > + */ > > > +int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) > > > +{ > > > + unsigned short orig_vcnt = bio->bi_vcnt; > > > + > > > + do { > > > + int ret = __bio_iov_iter_get_pages(bio, iter); > > > + > > > + if (unlikely(ret)) > > > + return bio->bi_vcnt > orig_vcnt ? 0 : ret; > > > + > > > + } while (iov_iter_count(iter) && !bio_full(bio)); > > > > When 'ret' isn't zero, and some partial progress has been made, seems > > less pages > > might be obtained than requested too. Is that something we need to > > worry about? > > This would be the case when VM isn't willing or able to fulfill the > page-pinning request. Previously, we came to the conclusion that VM has > the right to do so. This is the reason why callers have to check the > number of pages allocated, and either loop over > bio_iov_iter_get_pages(), or fall back to buffered I/O, until all pages > have been obtained. All callers except the blockdev fast path do the > former. > > We could add looping in __blkdev_direct_IO_simple() on top of the > current patch set, to avoid fallback to buffered IO in this corner > case. Should we? If yes, only for WRITEs, or for READs as well? > > I haven't encountered this situation in my tests, and I'm unsure how to > provoke it - run a direct IO test under high memory pressure? Currently, iov_iter_get_pages() is always guaranteed to get at least one page as that is current guarantee of get_user_pages() (unless we hit EFAULT obviously). So bio_iov_iter_get_pages() as is now is guaranteed to exhaust 'iter' or fill 'bio'. But in the future, the guarantee that get_user_pages() will always pin at least one page may go away. But we'd have to audit all users at that time anyway. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR