Re: [PATCH mm/bpf v2] mm: bpf: add find_vma_no_check() without lockdep_assert on mm->mmap_lock

Yonghong Song <yhs@xxxxxx> · Wed, 8 Sep 2021 09:09:57 -0700

On 9/8/21 8:12 AM, Liam Howlett wrote:
* Luigi Rizzo <lrizzo@xxxxxxxxxx> [210908 10:44]:
On Wed, Sep 8, 2021 at 4:16 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

On Wed, Sep 08, 2021 at 10:53:26AM -0300, Jason Gunthorpe wrote:
On Wed, Sep 08, 2021 at 02:20:17PM +0200, Daniel Borkmann wrote:

The warning is due to commit 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()")
which added mmap_assert_locked() in find_vma() function. The mmap_assert_locked() function
asserts that mm->mmap_lock needs to be held. But this is not the case for
bpf_get_stack() or bpf_get_stackid() helper (kernel/bpf/stackmap.c), which
uses mmap_read_trylock_non_owner() instead. Since mm->mmap_lock is not held
in bpf_get_stack[id]() use case, the above warning is emitted during test run.
...
Luigi / Liam / Andrew, if the below looks reasonable to you, any objections to route the
fix with your ACKs via bpf tree to Linus (or strong preference via -mm fixes)?

Michel added this remark along with the mmap_read_trylock_non_owner:

     It's still not ideal that bpf/stackmap subverts the lock ownership in this
     way.  Thanks to Peter Zijlstra for suggesting this API as the least-ugly
     way of addressing this in the short term.

Subverting lockdep and then adding more and more core MM APIs to
support this seems quite a bit more ugly than originally expected.

Michel's original idea to split out the lockdep abuse and put it only
in BPF is probably better. Obtain the mmap_read_trylock normally as
owner and then release ownership only before triggering the work. At
least lockdep will continue to work properly for the find_vma.

The only right solution to all of this is the below. That function
downright subverts all the locking rules we have. Spreading the hacks
any further than that one function is absolutely unacceptable.

I'd be inclined to agree that we should not introduce hacks around
locking rules. I don't know enough about the constraints of
bpf/stackmap, how much of a performance penalty do we pay with Peter's
patch,
and ow often one is expected to call this function ?

cheers
luigi

The hack already exists.  The symptom of the larger issue is that
lockdep now catches the hack when using find_vma().

If Peter's solution is acceptable to the bpf folks, then we can go ahead
and drop the option of using the non_owner variant - which would be
best.  Otherwise the hack around the locking rule still exists as long
as the find_vma() interface isn't used.

Hi, Peter, Luigi, Liam, Jason,

Peter's solution will definitely break user applications using
BPF_F_USER_BUILD_ID feature 
(https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h#L1193). 
So in my opinion,
the "hack" is still needed to avoid break user space application.

To answer Luigi's question below:
> I don't know enough about the constraints of
> bpf/stackmap, how much of a performance penalty do we pay with Peter's
> patch,
> and ow often one is expected to call this function ?

This function is called only if user bpf program specifies 
BPF_F_USER_BUILD_ID in bpf_get_stack() or bpf_get_stack_pe() helper.

Thanks,
Liam