On Mon, Oct 30, 2017 at 09:38:07AM +0100, Jan Kara wrote: > Hi, > > On Mon 30-10-17 13:00:23, Dave Chinner wrote: > > On Sun, Oct 29, 2017 at 04:46:44PM -0700, Dan Williams wrote: > > > Coming back to this since Dave has made clear that new locking to > > > coordinate get_user_pages() is a no-go. > > > > > > We can unmap to force new get_user_pages() attempts to block on the > > > per-fs mmap lock, but if punch-hole finds any elevated pages it needs > > > to drop the mmap lock and wait. We need this lock dropped to get > > > around the problem that the driver will not start to drop page > > > references until it has elevated the page references on all the pages > > > in the I/O. If we need to drop the mmap lock that makes it impossible > > > to coordinate this unlock/retry loop within truncate_inode_pages_range > > > which would otherwise be the natural place to land this code. > > > > > > Would it be palatable to unmap and drain dma in any path that needs to > > > detach blocks from an inode? Something like the following that builds > > > on dax_wait_dma() tried to achieve, but does not introduce a new lock > > > for the fs to manage: > > > > > > retry: > > > per_fs_mmap_lock(inode); > > > unmap_mapping_range(mapping, start, end); /* new page references > > > cannot be established */ > > > if ((dax_page = dax_dma_busy_page(mapping, start, end)) != NULL) { > > > per_fs_mmap_unlock(inode); /* new page references can happen, > > > so we need to start over */ > > > wait_for_page_idle(dax_page); > > > goto retry; > > > } > > > truncate_inode_pages_range(mapping, start, end); > > > per_fs_mmap_unlock(inode); > > > > These retry loops you keep proposing are just bloody horrible. They > > are basically just a method for blocking an operation until whatever > > condition is preventing the invalidation goes away. IMO, that's an > > ugly solution no matter how much lipstick you dress it up with. > > > > i.e. the blocking loops mean the user process is going to be blocked > > for arbitrary lengths of time. That's not a solution, it's just > > passing the buck - now the userspace developers need to work around > > truncate/hole punch being randomly blocked for arbitrary lengths of > > time. > > So I see substantial difference between how you and Christoph think this > should be handled. Christoph writes in [1]: > > The point is that we need to prohibit long term elevated page counts > with DAX anyway - we can't just let people grab allocated blocks forever > while ignoring file system operations. For stage 1 we'll just need to > fail those, and in the long run they will have to use a mechanism > similar to FL_LAYOUT locks to deal with file system allocation changes. > > So Christoph wants to block truncate until references are released, forbid > long term references until userspace acquiring them supports some kind of > lease-breaking. OTOH you suggest truncate should just proceed leaving > blocks allocated until references are released. I don't see what I'm suggesting is a solution to long term elevated page counts. Just something that can park extents until layout leases are broken and references released. That's a few tens of seconds at most. > We cannot have both... I'm leaning more towards the approach > Christoph suggests as it puts the burned to the place which is > causing it - the application having long term references - and > applications needing this should be sufficiently rare that we > don't have to devise a general mechanism in the kernel for this. I have no problems with blocking truncate forever if that's the desired solution for an elevated page count due to a DMA reference to a page. But that has absolutely nothing to do with the filesystem though - it's a page reference vs mapping invalidation problem, not a filesystem/inode problem. Perhaps pages with active DAX DMA mapping references need a page flag to indicate that invalidation must block on the page similar to the writeback flag... > If the solution Christoph suggests is acceptable to you, I think > we should first write a patch to forbid acquiring long term > references to DAX blocks. On top of that we can implement > mechanism to block truncate while there are short term references > pending (and for that retry loops would be IMHO acceptable). The problem with retry loops is that they are making a mess of an already complex set of locking contraints on the indoe IO path. It's rapidly descending into an unmaintainable mess - falling off the locking cliff only make sthe code harder to maintain - please look for solutions that don't require new locks or lock retry loops. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html