Re: [PATCH v2 bpf-next 10/10] selftests/bpf: add build ID tests

On Tue, Jul 30, 2024 at 01:03:17PM -0700, Andrii Nakryiko wrote:
> On Sun, Jul 28, 2024 at 12:38 PM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> >
> > On Fri, Jul 26, 2024 at 05:37:55PM -0700, Andrii Nakryiko wrote:
> > > On Fri, Jul 26, 2024 at 5:27 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jul 25, 2024 at 01:03:55PM -0700, Andrii Nakryiko wrote:
> > > > > On Thu, Jul 25, 2024 at 5:12 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, Jul 24, 2024 at 03:52:10PM -0700, Andrii Nakryiko wrote:
> > > > > > > > Add a new set of tests validating behavior of capturing stack traces
> > > > > > > > with build ID. We extend the uprobe_multi target binary with the ability
> > > > > > > > to trigger a uprobe (so that we can capture stack traces from it), and we
> > > > > > > > also allow forcing build ID data to be either resident or non-resident in
> > > > > > > > memory (see also a comment about quirks of MADV_PAGEOUT).
> > > > > > >
> > > > > > > That way we can validate that in non-sleepable context we won't get
> > > > > > > build ID (as expected), but with sleepable uprobes we will get that
> > > > > > > build ID regardless of it being physically present in memory.
> > > > > > >
> > > > > > > Also, we add a small add-on linker script which reorders
> > > > > > > .note.gnu.build-id section and puts it after (big) .text section,
> > > > > > > putting build ID data outside of the very first page of ELF file. This
> > > > > > > will test all the relaxations we did in build ID parsing logic in kernel
> > > > > > > thanks to freader abstraction.
> > > > > > >
> > > > > > > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > > > > >
> > > > > > one of my bpf selftests runs showed:
> > > > > >
> > > > > > test_build_id:PASS:parse_build_id 0 nsec
> > > > > > subtest_nofault:PASS:skel_open 0 nsec
> > > > > > subtest_nofault:PASS:link 0 nsec
> > > > > > subtest_nofault:PASS:trigger_uprobe 0 nsec
> > > > > > subtest_nofault:PASS:res 0 nsec
> > > > > > subtest_nofault:FAIL:build_id_status unexpected build_id_status: actual 1 != expected 2
> > > > > > #42/1    build_id/nofault-paged-out:FAIL
> > > > > > #42/2    build_id/nofault-paged-in:OK
> > > > > > #42/3    build_id/sleepable:OK
> > > > > > #42      build_id:FAIL
> > > > > >
> > > > > > I could never reproduce it again.. but I wonder whether the page could sneak
> > > > > > in before the bpf program is hit and the buildid will get parsed?
> > > > > >
> > > > >
> > > > > Yes, and I just realized that I forgot to mark this test as serial. If
> > > > > there is a parallel test that also runs uprobe_multi and that causes
> > > > > the build_id page to be paged into the page cache, then the build ID
> > > > > lookup might succeed. So I need to mark the test itself serial.
> > > > >
> > > > > Another issue which I was debugging (and fixed) yesterday was that if
> > > > > the memory passed for MADV_PAGEOUT is not yet memory mapped into the
> > > > > current process, then it won't be really removed from the page cache.
> > > > > I avoid that by first paging it in, and then MADV_PAGEOUT.
> > > >
> > > > ok, I triggered that in serial run, so I probably hit this one
> > > >
> > >
> > > you did it with v2 of the patch set? I had this bug in v1, but v2
> > > should be fine, as far as I understand (due to unconditional
> > > madvise(addr, page_sz, MADV_POPULATE_READ); before madvise(addr,
> > > page_sz, MADV_PAGEOUT)). At least I haven't been able to reproduce
> > > that anymore and BPF CI is now happy as well.
> >
> > yes, it's with v2 and I can still see that.. but only for the first run of
> > the test after reboot.. so far I have no clue.. I can see the successful
> > page-out madvise (still not sure how much is that telling about the page
> > being paged out), and then the build id code sees the page just fine
> >
> > attaching my .config in case
> >
> 
> I wasn't able to repro this, sorry. It works very reliably for me with
> your or my config. Given it also seems to work reliably in BPF CI, I'm
> still inclined to add these tests; I think it's good to have that
> coverage.
> 
> I'll monitor, and if it becomes flaky, we'll need to reassess this, of course.

np, I'll try to spend some more time on it

jirka
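
For readers following the MADV_PAGEOUT discussion quoted above, here is a
minimal sketch of the populate-then-page-out sequence it describes. This is
not the selftest's actual code: the helper names page_align_down() and
force_pageout() are hypothetical, and the fallback constant definitions
assume the Linux UAPI values for libc headers that predate them.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers (values from the Linux UAPI headers). */
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21		/* Linux 5.4+ */
#endif
#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22	/* Linux 5.14+ */
#endif

/*
 * Hypothetical helper: round addr down to the start of its page.
 * page_sz is assumed to be a power of two.
 */
uintptr_t page_align_down(uintptr_t addr, uintptr_t page_sz)
{
	return addr & ~(page_sz - 1);
}

/*
 * Hypothetical helper: first make sure the page containing addr is mapped
 * into the current process, then ask the kernel to reclaim it.
 * Returns 0 on success or a negative errno value.
 *
 * A successful return only means the kernel accepted both requests; it
 * does not prove that the page actually left memory.
 */
int force_pageout(void *addr, size_t page_sz)
{
	void *page = (void *)page_align_down((uintptr_t)addr, page_sz);

	/* Step 1: fault the page in so that it is mapped in this process. */
	if (madvise(page, page_sz, MADV_POPULATE_READ))
		return -errno;

	/* Step 2: request that the now-mapped page be paged out. */
	if (madvise(page, page_sz, MADV_PAGEOUT))
		return -errno;

	return 0;
}
```

A caller would pass the address of the data it wants evicted together with
sysconf(_SC_PAGESIZE). Since success of both calls is not a guarantee of
eviction, a rough residency check (for example with mincore(2)) may still be
worth adding when debugging a flaky run.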



