On Tue, Jan 25, 2022 at 3:54 PM Hao Luo <haoluo@xxxxxxxxxx> wrote:
>
> Thanks Song for your suggestion.
>
> On Mon, Jan 24, 2022 at 11:08 PM Song Liu <song@xxxxxxxxxx> wrote:
> >
> > On Mon, Jan 24, 2022 at 2:43 PM Hao Luo <haoluo@xxxxxxxxxx> wrote:
> > >
> > > Dear BPF experts,
> > >
> > > I'm working on collecting some kernel performance data using BPF
> > > tracing prog. Our performance profiling team wants to associate the
> > > data with user stack information. One of the requirements is to
> > > reliably get BuildIDs from bpf_get_stackid() and other similar helpers
> > > [1].
> > >
> > > As part of an early investigation, we found that there are a couple
> > > issues that make bpf_get_stackid() much less reliable than we'd like
> > > for our use:
> > >
> > > 1. The first page of many binaries (which contains the ELF headers and
> > > thus the BuildID that we need) is often not in memory. The failure of
> > > find_get_page() (called from build_id_parse()) is higher than we would
> > > want.
> >
> > Our top use case of bpf_get_stack() is called from NMI, so there isn't
> > much we can do. Maybe it is possible to improve it by changing the
> > layout of the binary and the libraries? Specifically, if the text is
> > also in the first page, it is likely to stay in memory?
> >
>
> We are seeing 30-40% of stack frames not able to get build ids due to
> this. This is a place where we could improve the reliability of build
> id.
>
> There were a few proposals coming up when we found this issue. One of
> them is to have userspace mlock the first page. This would be the
> easiest fix, if it works. Another proposal from Ian Rogers (cc'ed) is
> to embed build id in vma. This is an idea similar to [1], but it's
> unclear (at least to me) where to store the string. I'm wondering if
> we can introduce a sleepable version of bpf_get_stack() if it helps.
> When a page is not present, sleepable bpf_get_stack() can bring in the
> page.

I guess it is possible to have different flavors of bpf_get_stack().
However, I am not sure whether the actual use case could use sleepable
BPF programs. Our user of bpf_get_stack() is a profiler. The BPF program
is triggered by a perf_event from NMI, where we really cannot sleep. If
we have a target use case that could sleep, sleepable bpf_get_stack()
sounds reasonable to me.

>
> [1] https://lwn.net/Articles/867818/
>
> > > 2. When anonymous huge pages are used to hold some regions of process
> > > text, build_id_parse() also fails to get a BuildID because
> > > vma->vm_file is NULL.
> >
> > How did the text get in anonymous memory? I guess it is NOT from JIT?
> > We had a hack to use transparent huge page for application text. The
> > hack looks like:
> >
> > "At run time, the application creates an 8MB temporary buffer and the
> > hot section of the executable memory is copied to it. The 8MB region in
> > the executable memory is then converted to a huge page (by way of an
> > mmap() to anonymous pages and an madvise() to create a huge page), the
> > data is copied back to it, and it is made executable again using
> > mprotect()."
> >
> > If your case is the same (or similar), it can probably be fixed with
> > CONFIG_READ_ONLY_THP_FOR_FS, and modified user space.
> >
>
> In our use cases, we have text mapped to huge pages that are not
> backed by files. vma->vm_file could be null or points some fake file.
> This causes challenges for us on getting build id for these code text.

So, what is the ideal output in these cases? If there isn't a backing
file, we don't really have a good build-id for it, right?

Thanks,
Song
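
For readers following issue 1 in the thread, here is a minimal user-space sketch in Python of why the first page matters. It looks for the GNU build ID note using only the first page of an ELF file, and gives up when the note lies outside that page. This is an illustration and not the kernel's build_id_parse(). The function names, the 4 KiB page size, and the restriction to little-endian ELF64 are all assumptions made for the example.

```python
import struct

PAGE_SIZE = 4096        # assumption: 4 KiB pages
PT_NOTE = 4             # program header type for note segments
NT_GNU_BUILD_ID = 3     # note type carrying the GNU build ID


def _align4(n):
    """Round n up to a multiple of 4 (note fields are 4-byte aligned)."""
    return (n + 3) & ~3


def _scan_notes(page, start, size):
    """Walk the notes in page[start:start+size]; return build ID hex or None."""
    end = start + size
    if end > len(page):
        return None  # the note segment is not fully inside the first page
    pos = start
    while pos + 12 <= end:
        namesz, descsz, ntype = struct.unpack_from("<III", page, pos)
        name_off = pos + 12
        desc_off = name_off + _align4(namesz)
        if desc_off + descsz > end:
            return None  # malformed or truncated note
        name = page[name_off:name_off + namesz]
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00" and descsz > 0:
            return page[desc_off:desc_off + descsz].hex()
        pos = desc_off + _align4(descsz)
    return None


def build_id_from_first_page(data):
    """Return the GNU build ID (hex) found in the first page of `data`.

    Hypothetical helper: handles little-endian ELF64 only and returns None
    when the headers or the note are not within the first page.
    """
    page = data[:PAGE_SIZE]
    if len(page) < 64 or page[:4] != b"\x7fELF":
        return None
    if page[4] != 2 or page[5] != 1:  # ELFCLASS64, ELFDATA2LSB
        return None
    e_phoff = struct.unpack_from("<Q", page, 0x20)[0]
    e_phentsize, e_phnum = struct.unpack_from("<HH", page, 0x36)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        if off + 56 > len(page):
            break  # program header itself is beyond the first page
        if struct.unpack_from("<I", page, off)[0] != PT_NOTE:
            continue
        p_offset = struct.unpack_from("<Q", page, off + 8)[0]
        p_filesz = struct.unpack_from("<Q", page, off + 32)[0]
        found = _scan_notes(page, p_offset, p_filesz)
        if found is not None:
            return found
    return None
```

A caller could pass the first 4096 bytes read from a binary on disk. The point of the sketch is that everything needed sits in that one page, so if the page is not resident and the caller cannot sleep to fault it in, there is nothing to parse.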