Jan Kara wrote:
[..]
> I agree this is doable but there's the nasty side effect that inode reclaim
> may block for arbitrary time waiting for page pinning. If the application
> that has pinned the page requires __GFP_FS memory allocation to get to a
> point where it releases the page, we even have a deadlock possibility.
> So it's better than the UAF issue but still not ideal.

I expect VMA pinning would have similar deadlock exposure if pinning a
VMA keeps the inode allocated. Anything that puts a page-pin release
dependency in the inode freeing path can potentially deadlock a reclaim
event that depends on that inode being freed.

As you say, the UAF is worse. I am not too worried about the deadlock
case for a couple of reasons:

1/ There are no reports I can find of iput_final() triggering the WARN
   that validates that truncate_inode_pages_final() is called while all
   associated pages are unpinned. That WARN has been in place since 2017:

   d2c997c0f145 fs, dax: use page->mapping to warn if truncate collides with a busy page

2/ It is bad form for I/O drivers to perform __GFP_FS and __GFP_IO
   allocations in their fast paths. So while the deadlock is not
   impossible, it is unlikely with the major producers of transient page
   pin events.

My hope, famous last words, is that this is only a theoretical deadlock,
or that we can handle it with targeted driver fixes. Any driver that
thinks it wants to pin pages and then do more allocations that recurse
into the FS likely wants to get that out of its fast path anyway.

I will also take a look at a lockdep annotation for the wait event to
see if that can give an early warning versus fs_reclaim_acquire().
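
To make the dependency chain concrete, here is a toy user-space model in
C. It is not kernel code and not the lockdep annotation itself: the node
names and model_has_deadlock() are made up for illustration. It only
encodes "reclaim waits for the inode to be freed, inode freeing waits
for the page unpin, the unpin waits for the pin holder" and then checks
whether a pin holder that does a __GFP_FS allocation closes the loop,
which is roughly the kind of cycle I would hope an annotation could
flag early.

```c
#include <string.h>

/* Toy wait-for graph: edge[a][b] != 0 means "a cannot finish until b does". */
enum node {
	RECLAIM,	/* a reclaim event that needs the inode freed */
	INODE_FREE,	/* inode eviction, i.e. the iput_final() path */
	PAGE_UNPIN,	/* the wait for the pinned page to be released */
	PIN_HOLDER,	/* the driver or application holding the page pin */
	NR_NODES
};

static int edge[NR_NODES][NR_NODES];

/* Depth-first search. state: 0 = unvisited, 1 = on current path, 2 = done. */
static int visit(int n, int *state)
{
	state[n] = 1;
	for (int m = 0; m < NR_NODES; m++) {
		if (!edge[n][m])
			continue;
		if (state[m] == 1)
			return 1;	/* back edge: the wait-for chain loops */
		if (state[m] == 0 && visit(m, state))
			return 1;
	}
	state[n] = 2;
	return 0;
}

/*
 * Hypothetical helper: returns 1 if the modeled dependencies contain a
 * cycle, 0 otherwise. The argument says whether the pin holder needs a
 * __GFP_FS allocation (and so may enter FS reclaim) before it unpins.
 */
int model_has_deadlock(int holder_allocates_gfp_fs)
{
	int state[NR_NODES] = { 0 };

	memset(edge, 0, sizeof(edge));
	edge[RECLAIM][INODE_FREE] = 1;
	edge[INODE_FREE][PAGE_UNPIN] = 1;
	edge[PAGE_UNPIN][PIN_HOLDER] = 1;
	if (holder_allocates_gfp_fs)
		edge[PIN_HOLDER][RECLAIM] = 1;

	for (int n = 0; n < NR_NODES; n++)
		if (state[n] == 0 && visit(n, state))
			return 1;
	return 0;
}
```

In this model, model_has_deadlock(1) finds the cycle, while
model_has_deadlock(0), where the pin holder never recurses into the FS
before releasing the page, does not. That second case is the one I am
betting covers the major producers of transient pins.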