Re: Question: CO-RE-enabled PT_REGS macros give strange results

Alan Maguire <alan.maguire@xxxxxxxxxx> · Tue, 25 Jul 2023 15:04:48 +0100

On 25/07/2023 00:00, Alan Maguire wrote:
> On 24/07/2023 16:04, Timofei Pushkin wrote:
>> On Mon, Jul 24, 2023 at 3:36 PM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>>>
>>> On 24/07/2023 11:32, Timofei Pushkin wrote:
>>>> Dear BPF community,
>>>>
>>>> I'm developing a perf_event BPF program which reads some register
>>>> values (frame and instruction pointers in particular) from the context
>>>> provided to it. I found that CO-RE-enabled PT_REGS macros give results
>>>> different from the results of the usual PT_REGS macros. I run the
>>>> program on the same system I compiled it on, and so I cannot
>>>> understand why the results differ and which ones should I use?
>>>>
>>>> From my tests, the results of the usual macros are the correct ones
>>>> (e.g. I can symbolize the instruction pointers I get this way), but
>>>> since I try to follow the CO-RE principle, it seems like I should be
>>>> using the CO-RE-enabled variants instead.
>>>>
>>>> I did some experiments and found out that it is the
>>>> bpf_probe_read_kernel part of the CO-RE-enabled PT_REGS macros that
>>>> change the results and not __builtin_preserve_access_index. But I
>>>> still don't get why exactly it changes the results.
>>>>
>>>
>>> Can you provide the exact usage of the BPF CO-RE macros that isn't
>>> working, and the equivalent non-CO-RE version that is? Also if you
>>
>> As a minimal example, I wrote the following little BPF program which
>> prints instruction pointers obtained with non-CO-RE and CO-RE macros:
>>
>> volatile const pid_t target_pid;
>>
>> SEC("perf_event")
>> int do_test(struct bpf_perf_event_data *ctx) {
>>     pid_t pid = bpf_get_current_pid_tgid();
>>     if (pid != target_pid) return 0;
>>
>>     unsigned long p = PT_REGS_IP(&ctx->regs);
>>     unsigned long p_core = PT_REGS_IP_CORE(&ctx->regs);
>>     bpf_printk("non-CO-RE: %lx, CO-RE: %lx", p, p_core);
>>
>>     return 0;
>> }
>>
>> From user space, I set the target PID and attach the program to CPU
>> clock perf events (error checking and cleanup omitted for brevity):
>>
>> int main(int argc, char *argv[]) {
>>     // Load the program also setting the target PID
>>     struct test_program_bpf *skel = test_program_bpf__open();
>>     skel->rodata->target_pid = (pid_t) strtol(argv[1], NULL, 10);
>>     test_program_bpf__load(skel);
>>
>>     // Attach to perf events
>>     struct perf_event_attr attr = {
>>         .type = PERF_TYPE_SOFTWARE,
>>         .size = sizeof(struct perf_event_attr),
>>         .config = PERF_COUNT_SW_CPU_CLOCK,
>>         .sample_freq = 1,
>>         .freq = true
>>     };
>>     for (int cpu_i = 0; cpu_i < libbpf_num_possible_cpus(); cpu_i++) {
>>         int perf_fd = syscall(SYS_perf_event_open, &attr, -1, cpu_i, -1, 0);
>>         bpf_program__attach_perf_event(skel->progs.do_test, perf_fd);
>>     }
>>
>>     // Wait for Ctrl-C
>>     pause();
>>     return 0;
>> }
>>
>> As an experiment, I launched a simple C program with an endless loop
>> in main and started the BPF program above with its target PID set to
>> the PID of this simple C program. Then by checking the virtual memory
>> mapped for the C program (with "cat /proc/<PID>/maps"), I found out
>> that its .text section got mapped into 55ca2577b000-55ca2577c000
>> address space. When I checked the output of the BPF program, I got
>> "non-CO-RE: 55ca2577b131, CO-RE: ffffa58810527e48". As you can see,
>> the non-CO-RE result maps into the .text section of the launched C
>> program (as it should since this is the value of the instruction
>> pointer), while the CO-RE result does not.
>>
>> Alternatively, if I replace PT_REGS_IP and PT_REGS_IP_CORE with the
>> equivalents for the stack pointer (PT_REGS_SP and PT_REGS_SP_CORE), I
>> get results that correspond to the stack address space from the
>> non-CO-RE macro, but I always get 0 from the CO-RE macro.
>>
>>> can provide details on the platform you're running on that will
>>> help narrow down the issue. Thanks!
>>
>> Sure. I'm running Ubuntu 22.04.1, kernel version 5.19.0-46-generic,
>> the architecture is x86_64, clang 14.0.0 is used to compile BPF
>> programs with flags -g -O2 -D__TARGET_ARCH_x86.
>>
> 
> Thanks for the additional details! I've reproduced this on
> bpf-next with LLVM 15; I'm seeing the same issues with the CO-RE
> macros, and with BPF_CORE_READ(). However with extra libbpf debugging
> I do see that we pick up the right type id/index for the ip field in
> pt_regs:
> 
> libbpf: prog 'do_test': relo #4: matching candidate #0 <byte_off> [216]
> struct pt_regs.ip (0:16 @ offset 128)
> 
> One thing I noticed - perhaps this will ring some bells for someone -
> if I use __builtin_preserve_access_index() I get the same (correct)
> value for ip as is retrieved with PT_REGS_IP():
> 
>     __builtin_preserve_access_index(({
>         p_core = ctx->regs.ip;
>     }));
> 
> I'll check with latest LLVM to see if the issue persists there.
> 

The problem occurs with latest bpf-next + latest LLVM too. For perf
event programs, the verifier fixes up accesses to the
"struct bpf_perf_event_data *" context, so accessing ctx->regs in your
program becomes an access through the regs field of
"struct bpf_perf_event_data_kern *", which is a pointer to
struct pt_regs. So I _think_ that's why the

    __builtin_preserve_access_index(({
        p_core = ctx->regs.ip;
    }));

...works; ctx->regs is fixed up to point at the right place, then
CO-RE does its thing with the results. Contrast this with

    bpf_probe_read_kernel(&ip, sizeof(ip), &ctx->regs.ip);

In the latter case, the fixups don't seem to happen and we get a
bogus address which appears to be consistently 218 bytes after the ctx
pointer. I've confirmed that a basic bpf_probe_read_kernel()
exposes the issue (and gives the same wrong address as a CO-RE-wrapped
bpf_probe_read_kernel()).

I tried some permutations like defining

    struct pt_regs *regs = &ctx->regs;

...to see if that helps, but I think in that case the accesses aren't
caught by the verifier because we use the & operator on ctx->regs.

Not sure how smart the verifier can be about context accesses like this;
can someone who understands that code better than me take a look at this?

In the meantime the workaround described above (reading ctx->regs.ip
directly inside __builtin_preserve_access_index()) should do the trick.

Thanks!

Alan