On 9/15/20 4:22 PM, Jason Gunthorpe wrote:
On Tue, Sep 15, 2020 at 05:33:30PM -0400, Peter Xu wrote:
RDMA doesn't ever use !WRITE
I'm guessing #5 is the issue, just with a different ordering. If the
#3 pin_user_pages() precedes the #2 fork, don't we get to the same #5?
Right, but only without MADV_DONTFORK?
Yes, results are that MADV_DONTFORK resolves the issue for initial
tests. I should know if a bigger test suite passes in a few days.
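For readers following along, here is a minimal userspace sketch of the
MADV_DONTFORK workaround being discussed. It is only an illustration: the
helper name alloc_dontfork_buffer() is hypothetical, and the buffer is
assumed to be anonymous private memory that would later be registered for
DMA (for example through an RDMA verbs library).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* makes MADV_DONTFORK visible with glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (the name is illustrative only): map an anonymous
 * buffer intended for DMA registration and mark it MADV_DONTFORK, so the
 * range is not inherited by a child on fork() and the parent's pages are
 * never write-protected for COW on its behalf.
 * Returns NULL on failure.
 */
static void *alloc_dontfork_buffer(size_t len)
{
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return NULL;

	/* Must happen before any fork() that could race with the DMA pin. */
	if (madvise(buf, len, MADV_DONTFORK) != 0) {
		munmap(buf, len);
		return NULL;
	}
	return buf;
}
```

Note that with MADV_DONTFORK the child does not get the mapping at all, so
any access to that range in the child faults. This is part of why
retrofitting it onto existing applications is not free.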
If this is a problem, we may still need the fix patch (though maybe not as
urgently as before). But I'd like to double-check, just in case I missed some
obvious facts above.
I'm worried that the sudden need to have MADV_DONTFORK is going to
turn into a huge regression. It already blew up our first level of
synthetic test cases. I'm worried what we will see when the
application suite is run in a few months :\
Personally, I would consider changing kernel behavior if the impact is
still under control (the reported 30%+ performance boost after the
simplify-cow patch is also attractive). The other option is to maintain the
old reuse logic forever, which would be another kind of burden. There seems
to be no easy way on either side...
It seems very strange that a physical page exclusively owned by a
process can become copied if pin_user_pages() is active and the
process did fork() at some point.
Could the new pin_user_pages() logic help here? eg the
GUP_PIN_COUNTING_BIAS stuff?
Could the COW code consider a refcount of GUP_PIN_COUNTING_BIAS + 1 as
being owned by the current mm and not needing COW? The DMA pin would
be 'invisible' for COW purposes?
Please do be careful to use the API, rather than the implementation. The
FOLL_PIN refcounting system results in being able to get a "maybe
DMA-pinned", or a "definitely not DMA-pinned", via this API call:
static inline bool page_maybe_dma_pinned(struct page *page)
...which does *not* always use GUP_PIN_COUNTING_BIAS to provide that
answer.
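To make that concrete, here is a toy userspace model of the decision as I
understand it. This is not the kernel code: struct toy_page and its fields
are hypothetical stand-ins, and the bias value is an assumption that mirrors
the kernel's at the time of writing. The point is only that one branch gives
an exact answer from a separate pin counter, while the other infers "maybe
pinned" from the refcount and can therefore report false positives.

```c
#include <stdbool.h>

/* Assumed to mirror the kernel's bias value; callers should not rely on it. */
#define TOY_GUP_PIN_COUNTING_BIAS (1U << 10)

/* Toy stand-in for struct page; every field here is hypothetical. */
struct toy_page {
	unsigned int refcount;  /* ordinary references plus one BIAS per pin */
	bool has_pincount;      /* true when a separate, exact pin counter exists */
	unsigned int pincount;  /* exact pin count, valid only if has_pincount */
};

/* Model of the "maybe DMA-pinned" vs. "definitely not DMA-pinned" answer. */
static bool toy_page_maybe_dma_pinned(const struct toy_page *page)
{
	/* Exact path: the bias is never consulted. */
	if (page->has_pincount)
		return page->pincount > 0;

	/*
	 * Heuristic path: a large refcount from ordinary references looks
	 * the same as a pin, hence "maybe".
	 */
	return page->refcount >= TOY_GUP_PIN_COUNTING_BIAS;
}
```

In this model a page with 2000 ordinary references and no pins still reports
"maybe pinned" on the heuristic path, which is why open-coding a comparison
against GUP_PIN_COUNTING_BIAS + 1 in the COW path would be fragile.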
thanks,
--
John Hubbard
NVIDIA