On Fri, Jan 21, 2022 at 08:51:57AM +0100, Peter Zijlstra wrote: > On Thu, Jan 20, 2022 at 07:25:08PM +0100, David Hildenbrand wrote: > > On 20.01.22 16:55, Peter Zijlstra wrote: > > > Add a guarantee for Anon pages that pin_user_page*() ensures the > > > user-mapping of these pages stay preserved. In order to ensure this > > > all rmap users have been audited: > > > > > > vmscan: already fails eviction due to page_maybe_dma_pinned() > > > > > > migrate: migration will fail on pinned pages due to > > > expected_page_refs() not matching, however that is > > > *after* try_to_migrate() has already destroyed the > > > user mapping of these pages. Add an early exit for > > > this case. > > > > > > numa-balance: as per the above, pinned pages cannot be migrated, > > > however numa balancing scanning will happily PROT_NONE > > > them to get usage information on these pages. Avoid > > > this for pinned pages. > > > > page_maybe_dma_pinned() can race with GUP-fast without > > mm->write_protect_seq. This is a real problem for vmscan() with > > concurrent GUP-fast as it can result in R/O mappings of pinned pages and > > GUP will lose synchronicity to the page table on write faults due to > > wrong COW. > > Urgh, so yeah, that might be a problem. Follow up code uses it like > this: > > +/* > + * Pinning a page inhibits rmap based unmap for Anon pages. Doing a load > + * through the user mapping ensures the user mapping exists. > + */ > +#define umcg_pin_and_load(_self, _pagep, _member) \ > +({ \ > + __label__ __out; \ > + int __ret = -EFAULT; \ > + \ > + if (pin_user_pages_fast((unsigned long)(_self), 1, 0, &(_pagep)) != 1) \ > + goto __out; \ > + \ > + if (!PageAnon(_pagep) || \ > + get_user(_member, &(_self)->_member)) { \ > + unpin_user_page(_pagep); \ > + goto __out; \ > + } \ > + __ret = 0; \ > +__out: __ret; \ > +}) > > And after that hard assumes (on the penalty of SIGKILL) that direct user > access works. Specifically it does RmW ops on it. So I suppose I'd > better upgrade that load to a RmW at the very least. > > But is that sufficient? Let me go find that race you mention... OK, so copy_page_range() vs lockless_pages_from_mm(). Since I use FOLL_PIN that should be sorted, it'll fall back the slow path and use mmap_sem and serialize against the fork(). (Also, can I express my hate for __gup_longterm_unlocked(), that function name is utter garbage) However, I'm not quite sure what fork() does with pages that have a pin. There's been a number of GUP vs fork() problems over the years, but I'm afraid I have lost track of that and I can't quickly find anything in the code.. Naively, a page that has async DMA activity should not be CoW'ed, or if it is, care must be taken to ensure the original pages stays in the original process, but I realize that's somewhat hard. Let me dig in a bit more.