Thanks Jann and Alexei, On 02-Apr 07:15, Jann Horn wrote: > On Thu, Apr 2, 2020 at 6:30 AM Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > On Thu, Apr 02, 2020 at 06:03:57AM +0200, KP Singh wrote: > > > On 01-Apr 17:09, Alexei Starovoitov wrote: > > > > > + unsigned long reqprot, unsigned long prot, int ret) > > > > > +{ [...] > > > > and see exactly the same values as bpf side (at least it was nice to see > > that all CO-RE logic is working as expected :)) > > > > [ 24.787442] start 523000 39b9000 > > > > I think it has something to do with the way test_progs is linked. > > But the problem is in condition itself. > > I suspect you copy-pasted it from selinux_file_mprotect() ? > > I think it's incorrect there as well. > > Did we just discover a way to side step selinux protection? > > Try objdump -h test_progs|grep bss > > the number I see in vma->vm_start is the beginning of .bss rounded to page boundary. > > I wonder where your 55dc6e8df000 is coming from. > > I suspect that you're using different versions of libc, or something's > different in the memory layout, or something like that. The brk region > is used for memory allocations using brk(), but memory allocations > using mmap() land outside it. At least some versions of libc try to > allocate memory for malloc() with brk(), then fall back to mmap() if > that fails because there's something else behind the current end of > the brk region; but I think there might also be versions of libc that > directly use mmap() and don't even try to use brk(). Yeah missed this that heap can also be allocated using mmap: Quoting the manual: """ Normally, malloc() allocates memory from the heap, and adjusts the size of the heap as required, using sbrk(2). When allocating blocks of memory larger than MMAP_THRESHOLD bytes, the glibc malloc() implementation allocates the memory as a private anonymous mapping using mmap(2). MMAP_THRESHOLD is 128 kB by default, but is adjustable using mallopt(3). Prior to Linux 4.7 allocations performed using mmap(2) were unaffected by the RLIMIT_DATA resource limit; since Linux 4.7, this limit is also enforced for allocations performed using mmap(2). """ So it seems like we might have separate MMAP_THRESHOLD or resource limits. I updated my test case to check for mmaps on the stack instead: diff --git a/tools/testing/selftests/bpf/prog_tests/test_lsm.c b/tools/testing/selftests/bpf/prog_tests/test_lsm.c index 1e4c258de09d..64c13c850611 100644 --- a/tools/testing/selftests/bpf/prog_tests/test_lsm.c +++ b/tools/testing/selftests/bpf/prog_tests/test_lsm.c @@ -15,8 +15,13 @@ char *CMD_ARGS[] = {"true", NULL}; -int heap_mprotect(void) +#define PAGE_SIZE 4096 +#define GET_PAGE_ADDR(ADDR) \ + (char *)(((unsigned long) ADDR) & ~(PAGE_SIZE-1)) + +int stack_mprotect(void) { + void *buf; long sz; int ret; @@ -25,12 +30,9 @@ int heap_mprotect(void) if (sz < 0) return sz; - buf = memalign(sz, 2 * sz); - if (buf == NULL) - return -ENOMEM; - - ret = mprotect(buf, sz, PROT_READ | PROT_WRITE | PROT_EXEC); - free(buf); + buf = alloca(PAGE_SIZE * 3); + ret = mprotect(GET_PAGE_ADDR(buf + PAGE_SIZE), PAGE_SIZE, + PROT_READ | PROT_WRITE | PROT_EXEC); return ret; } @@ -73,8 +75,8 @@ void test_test_lsm(void) skel->bss->monitored_pid = getpid(); - err = heap_mprotect(); - if (CHECK(errno != EPERM, "heap_mprotect", "want errno=EPERM, got %d\n", + err = stack_mprotect(); + if (CHECK(errno != EPERM, "stack_mprotect", "want err=EPERM, got %d\n", errno)) goto close_prog; diff --git a/tools/testing/selftests/bpf/progs/lsm.c b/tools/testing/selftests/bpf/progs/lsm.c index a4e3c223028d..b4598d4bc4f7 100644 --- a/tools/testing/selftests/bpf/progs/lsm.c +++ b/tools/testing/selftests/bpf/progs/lsm.c @@ -23,12 +23,12 @@ int BPF_PROG(test_int_hook, struct vm_area_struct *vma, return ret; __u32 pid = bpf_get_current_pid_tgid() >> 32; - int is_heap = 0; + int is_stack = 0; - is_heap = (vma->vm_start >= vma->vm_mm->start_brk && - vma->vm_end <= vma->vm_mm->brk); + is_stack = (vma->vm_start <= vma->vm_mm->start_stack && + vma->vm_end >= vma->vm_mm->start_stack); - if (is_heap && monitored_pid == pid) { + if (is_stack && monitored_pid == pid) { mprotect_count++; ret = -EPERM; } and the the logic seems to work for me. Do you think we could use this instead? - KP > > So yeah, security checks based on the brk region aren't exactly > useful; but e.g. in SELinux, both cases have appropriate checks. The > brk region gets SECCLASS_PROCESS:PROCESS__EXECHEAP, anonymous mmap > allocations get SECCLASS_PROCESS:PROCESS__EXECMEM in > file_map_prot_check() instead. (This makes *some* amount of sense - > although not a lot - because for the brk region you know that it comes > from something like malloc(), while an anonymous mmap() allocation > might reasonably be used for JIT executable memory.) > > In other words, you may want to pick something different as test case, > since the behavior here depends on libc.