Le 23/04/2019 à 11:27, Peter Zijlstra a écrit :
On Tue, Apr 16, 2019 at 03:45:12PM +0200, Laurent Dufour wrote:
This allows to search for a VMA structure without holding the mmap_sem.
The search is repeated while the mm seqlock is changing and until we found
a valid VMA.
While under the RCU protection, a reference is taken on the VMA, so the
caller must call put_vma() once it not more need the VMA structure.
At the time a VMA is inserted in the MM RB tree, in vma_rb_insert(), a
reference is taken to the VMA by calling get_vma().
When removing a VMA from the MM RB tree, the VMA is not release immediately
but at the end of the RCU grace period through vm_rcu_put(). This ensures
that the VMA remains allocated until the end the RCU grace period.
Since the vm_file pointer, if valid, is released in put_vma(), there is no
guarantee that the file pointer will be valid on the returned VMA.
What I'm missing here, and in the previous patch introducing the
refcount (also see refcount_t), is _why_ we need the refcount thing at
all.
The need for the VMA's refcount is to ensure that the VMA will remain
until the end of the SPF handler. This is a consequence of the use of
RCU instead of SRCU to protect the RB tree.
I was not aware of the refcount_t type, it would be better here to avoid
wrapping.
My original plan was to use SRCU, which at the time was not complete
enough so I abused/hacked preemptible RCU, but that is no longer the
case, SRCU has all the required bits and pieces.
When I did test using SRCU it was involving a lot a scheduling to run
the SRCU callback mechanism. In some workload the impact on the
perfomance was significant [1].
I can't see this overhead using RCU.
Also; the initial motivation was prefaulting large VMAs and the
contention on mmap was killing things; but similarly, the contention on
the refcount (I did try that) killed things just the same.
Doing prefaulting should be doable, I'll try to think further about that.
Regarding the refcount, I should I missed something, this is an atomic
counter, so there should not be contention on it but cache exclusivity,
not ideal I agree but I can't see what else to use here.
So I'm really sad to see the refcount return; and without any apparent
justification.
I'm not opposed to use another mechanism here, but SRCU didn't show good
performance with some workload, and I can't see how to use RCU without a
reference counter here. So please, advise.
Thanks,
Laurent.
[1]
https://lore.kernel.org/linux-mm/7ca80231-fe02-a3a7-84bc-ce81690ea051@xxxxxxxxx/