On Mon, Sep 26, 2022 at 08:53:43AM -0700, Christoph Hellwig wrote:
> On Fri, Sep 23, 2022 at 05:13:42PM +0100, Al Viro wrote:
> > You are mixing two issues here - holding references to pages while using
> > iov_iter instance is obvious; holding them until async IO is complete, even
> > though struct iov_iter might be long gone by that point is a different
> > story.
>
> But someone needs to hold a reference until the I/O is completed, because
> the I/O obviously needs the pages.  Yes, we could say the caller holds
> them and can drop the references right after I/O submission, while
> the method needs to grab another reference.  But that is more
> complicated and is more costly than just holding the damn reference.

Take a look at __nfs_create_request().  And trace the call chains leading
to nfs_clear_request(), where the corresponding put_page() happens.

What I'm afraid of is something similar in the bowels of some RDMA driver.
With upper layers shoving page references into an sglist using
iov_iter_get_pages(), then passing the sglist to some intermediate layer,
then *that* getting passed down into a driver which grabs references for
its own use and releases them from the destructor of some private
structure, done via kref_put().  Have that delayed by, hell - anything,
up to and including debugfs shite somewhere in the same driver iterating
through those private structures, grabbing a reference to do some
pretty-print into a kmalloc'ed buffer, then dropping it.

Voila - we have page refs duplicated from ITER_BVEC and occasionally
staying around after the ->ki_complete() of the async ->write_iter()
that got that ITER_BVEC.  It's really not a trivial rule change.