Hi,

On Mon 30-10-17 13:00:23, Dave Chinner wrote:
> On Sun, Oct 29, 2017 at 04:46:44PM -0700, Dan Williams wrote:
> > Coming back to this since Dave has made clear that new locking to
> > coordinate get_user_pages() is a no-go.
> >
> > We can unmap to force new get_user_pages() attempts to block on the
> > per-fs mmap lock, but if punch-hole finds any elevated pages it needs
> > to drop the mmap lock and wait. We need this lock dropped to get
> > around the problem that the driver will not start to drop page
> > references until it has elevated the page references on all the pages
> > in the I/O. If we need to drop the mmap lock that makes it impossible
> > to coordinate this unlock/retry loop within truncate_inode_pages_range
> > which would otherwise be the natural place to land this code.
> >
> > Would it be palatable to unmap and drain dma in any path that needs to
> > detach blocks from an inode? Something like the following that builds
> > on what dax_wait_dma() tried to achieve, but does not introduce a new
> > lock for the fs to manage:
> >
> >     retry:
> >         per_fs_mmap_lock(inode);
> >         /* new page references cannot be established */
> >         unmap_mapping_range(mapping, start, end);
> >         if ((dax_page = dax_dma_busy_page(mapping, start, end)) != NULL) {
> >             /* new page references can happen, so we need to start over */
> >             per_fs_mmap_unlock(inode);
> >             wait_for_page_idle(dax_page);
> >             goto retry;
> >         }
> >         truncate_inode_pages_range(mapping, start, end);
> >         per_fs_mmap_unlock(inode);
>
> These retry loops you keep proposing are just bloody horrible. They
> are basically just a method for blocking an operation until whatever
> condition is preventing the invalidation goes away. IMO, that's an
> ugly solution no matter how much lipstick you dress it up with.
>
> i.e. the blocking loops mean the user process is going to be blocked
> for arbitrary lengths of time. That's not a solution, it's just
> passing the buck - now the userspace developers need to work around
> truncate/hole punch being randomly blocked for arbitrary lengths of
> time.

So I see a substantial difference between how you and Christoph think
this should be handled. Christoph writes in [1]:

  The point is that we need to prohibit long term elevated page counts
  with DAX anyway - we can't just let people grab allocated blocks
  forever while ignoring file system operations. For stage 1 we'll just
  need to fail those, and in the long run they will have to use a
  mechanism similar to FL_LAYOUT locks to deal with file system
  allocation changes.

So Christoph wants to forbid long term references until the userspace
acquiring them supports some kind of lease-breaking, and to block
truncate until the remaining short term references are released. OTOH
you suggest truncate should just proceed, leaving blocks allocated until
the references are released. We cannot have both...

I'm leaning more towards the approach Christoph suggests as it puts the
burden on the place which is causing it - the application holding long
term references - and applications needing this should be sufficiently
rare that we don't have to devise a general mechanism in the kernel for
them.

If the solution Christoph suggests is acceptable to you, I think we
should first write a patch to forbid acquiring long term references to
DAX blocks. On top of that we can implement a mechanism to block
truncate while there are short term references pending (and for that
retry loops would be IMHO acceptable).
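To make that first step concrete, here is a minimal sketch of what
forbidding long term references could look like: a GUP wrapper that takes
the references as usual and then backs out if any of them sits in a DAX
mapping. The wrapper name get_user_pages_no_dax, the choice of vma_is_dax()
for the check, and the -EOPNOTSUPP policy are illustrative assumptions on
my side, not an existing interface; only get_user_pages(), vma_is_dax()
and put_page() are real kernel API here:

#include <linux/mm.h>

/*
 * Sketch only. Callers that intend to hold page references for a long
 * time (e.g. RDMA memory registration) would use this instead of plain
 * get_user_pages(). Assumes the caller passes a non-NULL vmas array.
 */
long get_user_pages_no_dax(unsigned long start, unsigned long nr_pages,
			   unsigned int gup_flags, struct page **pages,
			   struct vm_area_struct **vmas)
{
	long rc, i;

	/* Take the references as usual... */
	rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
	if (rc <= 0)
		return rc;

	/*
	 * ...but refuse to keep them if any page lives in a DAX mapping,
	 * since the filesystem has no way to revoke them later when it
	 * wants to reallocate the underlying blocks.
	 */
	for (i = 0; i < rc; i++) {
		if (vmas[i] && vma_is_dax(vmas[i])) {
			while (rc-- > 0)
				put_page(pages[rc]);
			return -EOPNOTSUPP;
		}
	}
	return rc;
}

Drivers taking only short term references would keep using plain
get_user_pages() and would be handled by the truncate-side waiting.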
And then we can work on a mechanism to notify userspace that it needs to
drop references to blocks that are going to be truncated so that we can
re-enable taking of long term references.

								Honza

[1] https://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg1522887.html

--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR