On Mon, Nov 28, 2022 at 10:20 PM Hao Luo <haoluo@xxxxxxxxxx> wrote:
>
> On Mon, Nov 28, 2022 at 5:15 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Mon, Nov 28, 2022 at 5:29 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> > >
> > > Adding bpf_vma_build_id_parse function to retrieve build id from
> > > passed vma object and making it available as bpf kfunc.
> >
> > As a completely different way of solving this problem of retrieving
> > build_id for tracing needs, can we teach the kernel itself to parse and
> > store build_id (probably gated behind a Kconfig option) in struct file
> > (ideally)? On exec() and when mmap()'ing with executable permissions,
> > the Linux kernel would try to fetch build_id from the ELF file and, if
> > successful, store it in struct file. Given build_id can be up to 20
> > bytes (currently) and not every struct file points to an executable, we
> > might want to only add a pointer field to `struct file` itself, which,
> > if build_id is present, will point to
> >
> >   struct build_id {
> >           char sz;
> >           char data[];
> >   };
> >
> > This way we don't increase `struct file` by much.
> >
> > And then any BPF program would be able to easily probe_read_kernel
> > such build_id from vm_area_struct->vm_file->build_id?
> >
> > I'm sure I'm oversimplifying, but this seems like a good solution for
> > all kinds of profiling BPF programs without the need to maintain all
> > these allowlists and add new helpers/kfuncs?
> >
> > I know Hao was looking at the problem of getting build_id; I'm curious
> > if something like this would work for their use cases as well?
> >
>
> This helps a little. We would like to get build_id reliably. There are
> two problems we encountered.
>
> First, sometimes we need to get build_id from an atomic context. We
> fail to get build_id if the page that contains the build_id has been
> evicted from pagecache. Storing the build_id in `struct file` or
> `struct inode` is a good and natural solution.
> But this problem can also be solved by using mlock to pin the page in
> memory. We are using mlock, and it seems to be working well right now.

This is hardly a generic solution, as it requires instrumenting every
application to do this, right? So what I'm proposing is exactly to avoid
having each individual application do something special just to allow
profiling tools to capture build_id.

>
> The other problem we encountered may be very specific to our own use
> case. Sometimes we execute code that is mapped in an anonymous page
> (not backed by a file). In that case, we can't get build_id either. What
> we are doing right now is writing the build_id into the
> vm_area_struct->anon_name field and teaching build_id_parse to try
> parsing from there when it sees an anonymous page. I can send this
> upstream if there are other users who have the same problem.
>

Is this due to remapping some binary onto huge pages? But regardless,
your custom BPF applications can fetch this build_id from
vm_area_struct->anon_name in pure BPF code, can't they? Why do you need
to modify the in-kernel build_id_parse implementation?

> >
> > >
> > > We can't use build_id_parse directly as kfunc, because we would
> > > not have control over the build id buffer size provided by user.
> > >
> > > Instead we are adding new bpf_vma_build_id_parse function with
> > > 'build_id__sz' argument that instructs verifier to check for the
> > > available space in build_id buffer.
> > >
> > > This way we check that there's always available memory space
> > > behind build_id pointer. We also check that the build_id__sz is
> > > at least BUILD_ID_SIZE_MAX so we can place any buildid in.
> > >
> > > The bpf_vma_build_id_parse kfunc is marked as KF_TRUSTED_ARGS,
> > > so it can be only called with trusted vma objects. These are
> > > currently provided only by find_vma callback function and
> > > task_vma iterator program.
> > >
> > > Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
> > > ---
> > >  include/linux/bpf.h      |  4 ++++
> > >  kernel/trace/bpf_trace.c | 31 +++++++++++++++++++++++++++++++
> > >  2 files changed, 35 insertions(+)
> > >
> >
> > [...]
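The struct-file proposal quoted in this thread, combined with the buffer-size rule from the kfunc description, can be modeled in plain userspace C. This is only a sketch under stated assumptions, not kernel code and not the patch's API: `struct file_stub`, `attach_build_id()` and `read_build_id()` are hypothetical names; `unsigned char` replaces the proposal's `char` to avoid sign issues; and BUILD_ID_SIZE_MAX is taken as 20, matching the "up to 20 bytes" figure above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed to match the kernel's BUILD_ID_SIZE_MAX (20 bytes). */
#define BUILD_ID_SIZE_MAX 20

/* Layout proposed in the thread: a size byte plus a flexible array. */
struct build_id {
	unsigned char sz;
	unsigned char data[];
};

/* Hypothetical stand-in for struct file: only one pointer is added, and it
 * stays NULL for files that have no parsed build ID. */
struct file_stub {
	const char *path;
	struct build_id *build_id;
};

/* Hypothetical "parse at exec()/mmap() time" step: allocate exactly sz bytes
 * of payload, so files without a build ID pay only for the pointer. */
int attach_build_id(struct file_stub *f, const unsigned char *id, size_t sz)
{
	struct build_id *b;

	if (sz == 0 || sz > BUILD_ID_SIZE_MAX)
		return -1;
	b = malloc(sizeof(*b) + sz);
	if (!b)
		return -1;
	b->sz = (unsigned char)sz;
	memcpy(b->data, id, sz);
	f->build_id = b;
	return 0;
}

/* Hypothetical reader: like the build_id__sz check described for the kfunc,
 * refuse any buffer smaller than BUILD_ID_SIZE_MAX so every build ID fits.
 * Returns the build ID length, -1 for a too-small buffer, -2 if none stored. */
int read_build_id(const struct file_stub *f, unsigned char *buf, size_t buf_sz)
{
	if (buf_sz < BUILD_ID_SIZE_MAX)
		return -1;
	if (!f->build_id)
		return -2;
	memcpy(buf, f->build_id->data, f->build_id->sz);
	return f->build_id->sz;
}
```

With this layout a non-executable file costs only the extra pointer, and a reader handed an 8-byte buffer gets -1 even when a build ID is stored, which is the guarantee the build_id__sz check is meant to give the kfunc at verification time rather than at run time.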