On 4/19/23 11:09 AM, Matthew Wilcox wrote:
> On Wed, Apr 19, 2023 at 11:07:04AM -0400, Waiman Long wrote:
>> On 4/18/23 23:46, Matthew Wilcox wrote:
>>> On Tue, Apr 18, 2023 at 09:16:37PM -0400, Waiman Long wrote:
>>>> 1) App runs creating lots of threads.
>>>> 2) It mmap's 256K pages of anonymous memory.
>>>> 3) It writes executable code to that memory.
>>>> 4) It calls mprotect() with PROT_EXEC on that memory so
>>>>    it can subsequently execute the code.
>>>>
>>>> The above mprotect() will fail if the mmap'd region's VMA gets merged with
>>>> the VMA for one of the thread stacks. That's because the default RHEL
>>>> SELinux policy is to not allow executable stacks.
>>> By the way, this is a daft policy. The policy you really want is
>>> EXEC|WRITE is not allowed. A non-writable stack is useless, so it's
>>> actually a superset of your current policy. Forbidding _simultaneous_
>>> write and executable is just good programming. This way, you don't need
>>> to care about the underlying VMA's current permissions, you just need
>>> to do:
>>>
>>> 	if ((prot & (PROT_EXEC|PROT_WRITE)) == (PROT_EXEC|PROT_WRITE))
>>> 		return -EACCESS;
>>
>> I am not totally sure if the application changes the VMA to read-only first.
>> Even if it does that, it highlights another possible issue when an anonymous
>> VMA is merged with a stack VMA. Either the mprotect() to write-protect the
>> VMA will fail or the application will segfault if it writes stuff to the
>> stack. This particular issue is not related to SELinux. It provides another
>> good idea why we should avoid merging stack VMA to anonymous VMA.
>
> mprotect will split the VMA into two VMAs, one that is
> PROT_READ|PROT_WRITE and one the is PROT_READ|PROT_EXEC.
>

But in this case, the latter still has PROT_WRITE. This was reported by
a large data analytics customer. They started getting infrequent random
crashes in code they haven't touched in 10 years.
One of the threads in their program mmaps a large region using
PROT_READ|PROT_WRITE, and that region just happens to be merged with the
thread's stack. Then they copy a small snippet of code to a location
somewhere within that mapped region. For the one page that contains that
code, they mprotect it to PROT_READ|PROT_WRITE|PROT_EXEC. I recall
they're still reading and writing data elsewhere on that page.

Joe
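
To make the mismatch concrete, here is a minimal userspace sketch of the
EXEC|WRITE check quoted above, applied to the protections this customer
asks for. The helper name wx_violation() is hypothetical and exists only
for illustration; it is not a kernel or SELinux function.

```c
#include <stdbool.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, for illustration only: returns true when a
 * protection request asks for write and execute at the same time.
 * It mirrors the condition in the quoted check and nothing more.
 */
bool wx_violation(int prot)
{
	return (prot & (PROT_EXEC | PROT_WRITE)) == (PROT_EXEC | PROT_WRITE);
}
```

With that condition, PROT_READ|PROT_EXEC passes, but the customer's
PROT_READ|PROT_WRITE|PROT_EXEC request on the code page would be refused
regardless of which VMA the page ended up in.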