On Sat, Mar 25, 2023 at 7:27 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote: > > So for that very reason, we still have to handle the bad case where the > source PMD was not deleted right? Well, so our rules are that if nothing is mapped in a particular page table directory (any level), then it must be empty. And that "must" is actually a hard requirement, because our exit path won't even spend time tearing down page tables that don't have any mappings in them. So if you were to have non-empty pmd entries that don't have a vma associated with them, there would be a memory leak, and we really would want to warn about that case. End result: it should be sufficient to do something like "if you don't have a mapping below you within this PMD, you can expand the movement down to a full PMD". And same with the above case. Of course, the more I think about this, the more I wonder "is this even worth it". Because we have (a) mremap() that can't trigger the problematic case currently (because not overlapping), and *probably* almost never would trigger the optimization of widening the move in practice. (b) setup_arg_pages() will probably almost never triggers the problematic case in practice, since you'd have to shift the pages by *just* the right amount so in the end, maybe the "real fix" is to just say "none of this matters, let's just remove the warning". An alternative "real fix" might even be to just say "just don't shift the stack by exactly a PMD". It's unlikely to happen anyway, it's not worth optimizing for, so just make sure it doesn't happen. IOW, another alternative could be something like this: --- a/fs/exec.c +++ b/fs/exec.c @@ -783,7 +783,14 @@ int setup_arg_pages(struct linux_binprm *bprm, unlikely(vma->vm_end - vma->vm_start >= stack_top - mmap_min_addr)) return -ENOMEM; + /* + * Shift the stack up, but avoid shifting by + * exactly a PMD size, which causes issues + * when mixing page-sized and pmd-sized moves. + */ stack_shift = vma->vm_end - stack_top; + if (stack_shift && !(stack_shift & ~PMD_MASK)) + stack_shift -= PAGE_SIZE; bprm->p -= stack_shift; mm->arg_start = bprm->p; which is *really* quite ugly, and only handles the stack-grows-down case, and I'm not proud of it, and the above is also entirely untested. I will delete that patch from my system after sending out this email, and disavow any knowledge of that horrendously ugly hack. But if somebody else takes ownership of it and I won't be blamed for it, I would probably accept it as a solution. Shudder. That's nasty. Linus