On Tue, Jul 30, 2024 at 01:03:17PM -0700, Andrii Nakryiko wrote:
> On Sun, Jul 28, 2024 at 12:38 PM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> >
> > On Fri, Jul 26, 2024 at 05:37:55PM -0700, Andrii Nakryiko wrote:
> > > On Fri, Jul 26, 2024 at 5:27 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jul 25, 2024 at 01:03:55PM -0700, Andrii Nakryiko wrote:
> > > > > On Thu, Jul 25, 2024 at 5:12 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, Jul 24, 2024 at 03:52:10PM -0700, Andrii Nakryiko wrote:
> > > > > > > Add a new set of tests validating behavior of capturing stack traces
> > > > > > > with build ID. We extend uprobe_multi target binary with ability to
> > > > > > > trigger uprobe (so that we can capture stack traces from it), but also
> > > > > > > we allow to force build ID data to be either resident or non-resident in
> > > > > > > memory (see also a comment about quirks of MADV_PAGEOUT).
> > > > > > >
> > > > > > > That way we can validate that in non-sleepable context we won't get
> > > > > > > build ID (as expected), but with sleepable uprobes we will get that
> > > > > > > build ID regardless of it being physically present in memory.
> > > > > > >
> > > > > > > Also, we add a small add-on linker script which reorders
> > > > > > > .note.gnu.build-id section and puts it after (big) .text section,
> > > > > > > putting build ID data outside of the very first page of ELF file. This
> > > > > > > will test all the relaxations we did in build ID parsing logic in kernel
> > > > > > > thanks to freader abstraction.
> > > > > > >
> > > > > > > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > > > > >
> > > > > > one of my bpf selftests runs showed:
> > > > > >
> > > > > > test_build_id:PASS:parse_build_id 0 nsec
> > > > > > subtest_nofault:PASS:skel_open 0 nsec
> > > > > > subtest_nofault:PASS:link 0 nsec
> > > > > > subtest_nofault:PASS:trigger_uprobe 0 nsec
> > > > > > subtest_nofault:PASS:res 0 nsec
> > > > > > subtest_nofault:FAIL:build_id_status unexpected build_id_status: actual 1 != expected 2
> > > > > > #42/1 build_id/nofault-paged-out:FAIL
> > > > > > #42/2 build_id/nofault-paged-in:OK
> > > > > > #42/3 build_id/sleepable:OK
> > > > > > #42 build_id:FAIL
> > > > > >
> > > > > > I could never reproduce again.. but I wonder if the page could sneak
> > > > > > in before the bpf program is hit and the buildid will get parsed?
> > > > > >
> > > > >
> > > > > Yes, and I just realized that I forgot to mark this test as serial. If
> > > > > there is a parallel test that also runs uprobe_multi and that causes
> > > > > the build_id page to be paged into the page cache, then this might succeed.
> > > > > So I need to mark the test itself serial.
> > > > >
> > > > > Another issue which I was debugging (and fixed) yesterday was that if
> > > > > the memory passed for MADV_PAGEOUT is not yet memory mapped into the
> > > > > current process, then it won't be really removed from the page cache.
> > > > > I avoid that by first paging it in, and then MADV_PAGEOUT.
> > > >
> > > > ok, I triggered that in serial run, so I probably hit this one
> > > >
> > >
> > > you did it with v2 of the patch set? I had this bug in v1, but v2
> > > should be fine, as far as I understand (due to unconditional
> > > madvise(addr, page_sz, MADV_POPULATE_READ); before madvise(addr,
> > > page_sz, MADV_PAGEOUT)). At least I haven't been able to reproduce
> > > that anymore and BPF CI is now happy as well.
> >
> > yes, it's with v2 and I can still see that.. but only for the first run of
> > the test after reboot.. so far I have no clue.. I can see the successful
> > page-out madvise (still not sure how much that tells about the page
> > being paged out), and then the build id code sees the page just fine
> >
> > attaching my .config in case
> >
>
> I wasn't able to repro this, sorry. It works very reliably for me with
> your or my config. Given it also seems to work reliably in BPF CI, I'm
> still inclined to add these tests, I think it's good to have that
> coverage.
>
> I'll monitor, and if it becomes flaky, we'll need to reassess this, of course.

np, I'll try to spend some more time on it

jirka
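
For readers following the thread, the populate-then-page-out sequence discussed above can be sketched roughly as below. This is a minimal illustration and not the selftest's actual code: the helper names page_floor(), force_page_out() and page_is_resident() are hypothetical, and the fallback #define values are assumptions taken from the Linux UAPI headers for toolchains whose libc headers lack them. Since a successful MADV_PAGEOUT call does not by itself prove that the page was evicted, mincore() is shown as one possible way to check residency afterwards.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks for older libc headers; values assumed from Linux UAPI. */
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif
#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22
#endif

/* Round addr down to the start of its page (page_sz must be a power of 2). */
static uintptr_t page_floor(uintptr_t addr, uintptr_t page_sz)
{
	return addr & ~(page_sz - 1);
}

/*
 * Hypothetical helper: map the page into this process first, then ask the
 * kernel to reclaim it. Returns 0 on success or -errno on failure.
 */
static int force_page_out(const void *addr)
{
	uintptr_t page_sz = (uintptr_t)sysconf(_SC_PAGESIZE);
	void *page = (void *)page_floor((uintptr_t)addr, page_sz);

	/* Without this step MADV_PAGEOUT may not touch an unmapped page. */
	if (madvise(page, page_sz, MADV_POPULATE_READ))
		return -errno;
	if (madvise(page, page_sz, MADV_PAGEOUT))
		return -errno;
	return 0;
}

/*
 * Hypothetical helper: returns 1 if the page holding addr is resident,
 * 0 if it is not, or -errno if mincore() fails.
 */
static int page_is_resident(const void *addr)
{
	uintptr_t page_sz = (uintptr_t)sysconf(_SC_PAGESIZE);
	void *page = (void *)page_floor((uintptr_t)addr, page_sz);
	unsigned char vec = 0;

	if (mincore(page, page_sz, &vec))
		return -errno;
	return vec & 1;
}
```

A caller would pass the address of the data it wants evicted to force_page_out() and could then consult page_is_resident() right before triggering the uprobe, which might help tell a failed eviction apart from a page that was brought back in by something else.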