Re: [PATCH v2 05/18] xfs: Add xfs_break_layouts() to the inode eviction path

Dan Williams <dan.j.williams@xxxxxxxxx> · Fri, 30 Sep 2022 10:56:27 -0700

Jan Kara wrote:
[..]
> I agree this is doable but there's the nasty side effect that inode reclaim
> may block for an arbitrary time waiting for page pinning. If the application
> that has pinned the page requires __GFP_FS memory allocation to get to a
> point where it releases the page, we even have a deadlock possibility.
> So it's better than the UAF issue but still not ideal.

I expect VMA pinning would have similar deadlock exposure if pinning a
VMA keeps the inode allocated. Anything that puts a page-pin release
dependency in the inode freeing path can potentially deadlock a reclaim
event that depends on that inode being freed.

As you say, the UAF is worse. I am not too worried about the deadlock
case for a couple of reasons:

1/ There are no reports I can find of iput_final() triggering the WARN
that validates that truncate_inode_pages_final() is called while all
associated pages are unpinned. That WARN has been in place since 2017:

d2c997c0f145 ("fs, dax: use page->mapping to warn if truncate collides with a busy page")

2/ It is bad form for I/O drivers to perform __GFP_FS and __GFP_IO
allocations in their fast paths. So while the deadlock is not impossible,
it is unlikely with the major producers of transient page pin events.

My hope, famous last words, is that this is only a theoretical deadlock,
or we can handle this with targeted driver fixes. Any driver that thinks
it wants to pin pages and then do more allocations that recurse into the
FS likely wants to get that out of its fast path anyway. I will also
take a look at a lockdep annotation for the wait event to see if that
can give an early warning versus fs_reclaim_acquire().
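
To illustrate the kind of ordering check I have in mind, here is a toy
userspace model. It is not kernel code and not the lockdep API; the
class names and the record_dep() helper are made up for the example. It
only records "B was acquired while A was held" edges and flags a new
edge that closes a cycle, which is roughly the early warning a lockdep
annotation on the unpin wait could provide.

```c
#include <stdbool.h>

/* Toy dependency classes; illustrative names, not kernel identifiers. */
enum lock_class { FS_RECLAIM, PAGE_PIN, NR_CLASSES };

/* dep[a][b] is true once b has been acquired while a was held. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Is 'to' reachable from 'from' along the recorded edges? */
static bool reachable(enum lock_class from, enum lock_class to)
{
	bool seen[NR_CLASSES] = { false };
	enum lock_class stack[NR_CLASSES];
	int top = 0;

	stack[top++] = from;
	seen[from] = true;
	while (top > 0) {
		enum lock_class cur = stack[--top];

		if (cur == to)
			return true;
		for (int next = 0; next < NR_CLASSES; next++) {
			if (dep[cur][next] && !seen[next]) {
				seen[next] = true;
				stack[top++] = (enum lock_class)next;
			}
		}
	}
	return false;
}

/*
 * Record "wanted acquired while held". Returns false when the new edge
 * closes a cycle, i.e. the reverse ordering was already observed.
 */
static bool record_dep(enum lock_class held, enum lock_class wanted)
{
	bool inverted = reachable(wanted, held);

	dep[held][wanted] = true;
	return !inverted;
}
```

In this model the eviction path is record_dep(FS_RECLAIM, PAGE_PIN),
reclaim context waiting for an unpin, and the problematic driver is
record_dep(PAGE_PIN, FS_RECLAIM), a pin held across an allocation that
can recurse into the FS. Whichever of the two is recorded second gets
flagged, without the deadlock ever having to trigger.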