On Wed, Sep 07, 2022 at 01:45:26AM -0700, Christoph Hellwig wrote:
> On Tue, Sep 06, 2022 at 12:21:06PM +0200, Jan Kara wrote:
> > > For FOLL_PIN callers, never pin bvec and kvec pages: For file systems
> > > not acquiring a reference is obviously safe, and the other callers will
> > > need an audit, but I can't think of why it would ever be unsafe.
> >
> > Are you sure about "For file systems not acquiring a reference is obviously
> > safe"? I can see places e.g. in orangefs, afs, etc. which create bvec iters
> > from pagecache pages. And then we have iter_file_splice_write() which
> > creates bvec from pipe pages (which can also be pagecache pages if
> > vmsplice() is used). So perhaps there are no lifetime issues even without
> > acquiring a reference (but looking at the code I would not say it is
> > obvious) but I definitely don't see how it would be safe to not get a pin
> > to signal to filesystem backing the pagecache page that there is DMA
> > happening to/from the page.
>
> I mean in the context of iov_iter_get_pages callers, that is direct
> I/O.  Direct callers of iov_iter_bvec which then pass that iov to
> ->read_iter / ->write_iter will need to hold references (those are
> the references that the callers of iov_iter_get_pages rely on!).

Unless I'm misreading Jan, the question is whether they should get or pin.
AFAICS, anyone who passes the sucker to ->read_iter() (or ->recvmsg(),
or does direct copy_to_iter()/zero_iter(), etc.) is falling under

=================================================================================
CASE 5: Pinning in order to write to the data within the page
-------------------------------------------------------------
Even though neither DMA nor Direct IO is involved, just a simple case of "pin,
write to a page's data, unpin" can cause a problem. Case 5 may be considered a
superset of Case 1, plus Case 2, plus anything that invokes that pattern.
In other words, if the code is neither Case 1 nor Case 2, it may still require
FOLL_PIN, for patterns like this:

Correct (uses FOLL_PIN calls):
    pin_user_pages()
    write to the data within the pages
    unpin_user_pages()

INCORRECT (uses FOLL_GET calls):
    get_user_pages()
    write to the data within the pages
    put_page()
=================================================================================

Regarding the iter_file_splice_write() case, do we need to pin pages
when we are not going to modify the data in them?

The same goes for afs, AFAICS; I started to type "... and everything
that passes WRITE to iov_iter_bvec()", but...

drivers/vhost/vringh.c:1165:            iov_iter_bvec(&iter, READ, iov, ret, translated);
drivers/vhost/vringh.c:1198:            iov_iter_bvec(&iter, WRITE, iov, ret, translated);

are backwards: READ is for data destinations and comes with copy_to_iter();
WRITE is for data sources and comes with copy_from_iter()...

I'm really tempted to slap

	if (WARN_ON(i->data_source))
		return 0;

into copy_to_iter() et al., along with its opposite for copy_from_iter().
And see who comes screaming...

Things like

	if (unlikely(iov_iter_is_pipe(i) || iov_iter_is_discard(i))) {
		WARN_ON(1);
		return 0;
	}

in e.g. csum_and_copy_from_iter() would be replaced by that, and become
easier to understand...

These two are also getting it wrong, BTW:

drivers/target/target_core_file.c:340:  iov_iter_bvec(&iter, READ, bvec, sgl_nents, len);
drivers/target/target_core_file.c:476:  iov_iter_bvec(&iter, READ, bvec, nolb, len);
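To make the READ/WRITE convention and the proposed WARN_ON(i->data_source) check concrete, here is a minimal userspace model. It is a sketch, not kernel code. READ (0) and WRITE (1) mirror the kernel's direction values, and the data_source flag follows the i->data_source field mentioned above. struct model_iter, model_iter_init(), model_warn_on(), model_copy_to_iter() and model_copy_from_iter() are hypothetical stand-ins for struct iov_iter, iov_iter_bvec(), WARN_ON(), copy_to_iter() and copy_from_iter(). The real iterators also track segments, offsets and iterator types, which are omitted here.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Direction values as in the kernel; everything else is a hypothetical model. */
#define READ  0   /* iterator is a data destination (filled by copy_to_iter) */
#define WRITE 1   /* iterator is a data source (drained by copy_from_iter) */

/* Hypothetical, heavily simplified stand-in for struct iov_iter. */
struct model_iter {
	bool data_source;   /* set when the iterator was initialized with WRITE */
	char *buf;          /* single flat buffer instead of bvec/kvec segments */
	size_t count;       /* bytes remaining */
};

/* Stand-in for iov_iter_bvec() and friends: direction decides data_source. */
static void model_iter_init(struct model_iter *i, int direction,
			    char *buf, size_t count)
{
	i->data_source = (direction == WRITE);
	i->buf = buf;
	i->count = count;
}

/* Stand-in for WARN_ON(): report the condition and return it. */
static bool model_warn_on(bool cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/* copy_to_iter() analogue: writes into the iterator, so it must be a destination. */
static size_t model_copy_to_iter(const void *from, size_t bytes,
				 struct model_iter *i)
{
	if (model_warn_on(i->data_source, "copy_to_iter() on a data source"))
		return 0;
	if (bytes > i->count)
		bytes = i->count;
	memcpy(i->buf, from, bytes);
	i->buf += bytes;
	i->count -= bytes;
	return bytes;
}

/* copy_from_iter() analogue: reads from the iterator, so it must be a source. */
static size_t model_copy_from_iter(void *to, size_t bytes,
				   struct model_iter *i)
{
	if (model_warn_on(!i->data_source, "copy_from_iter() on a data destination"))
		return 0;
	if (bytes > i->count)
		bytes = i->count;
	memcpy(to, i->buf, bytes);
	i->buf += bytes;
	i->count -= bytes;
	return bytes;
}
```

In this model, an iterator set up with READ and filled by model_copy_to_iter() copies the data. An iterator set up with WRITE and handed to model_copy_to_iter() triggers the warning, copies nothing and returns 0, which is the "see who comes screaming" behaviour. Checking the flag once at the copy routine replaces the ad-hoc per-type checks quoted from csum_and_copy_from_iter().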