On Fri, Mar 24, 2023 at 6:43 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote: > > Wouldn't it be better to instead fix it from the caller side? Like > making it non-overlapping. I wonder if we could just do something like this in mremap() instead - if old/new are mutually PMD_ALIGNED - *and* there is no vma below new within the same PMD - then just expand the mremap to be PMD-aligned downwards IOW, the problem with the exec stack moving case isn't really that it's overlapping: that part is fine. We're moving downwards, and we start from the bottom, so the moving part works fine. No, the problem is that we *start* by moving individual pages, and then by the time we've a few pages down by a whole PMD, we finish the source PMD (and we've cleared all the contents of it), but it still exists. And at *that* point, when we go and start copying the next page, we're suddenly fully PMD-aligned, and now we try to copy a whole PMD, and then that code is unhappy about the fact that the old (empty) PMD is there in the target. And for all of this to happen, we need to move things by an exact multiple of PMD size, because otherwise we'd never get to that aligned situation at all, and we'd always do all the movement in individual pages, and everything would be just fine. And more importantly, if we had just *started* with moving a whole PMD, this also wouldn't have happened. But we didn't. We started moving individual pages. So you could see the warning not as a "this range overlaps" warning (it's fine, and happens all the time, and we do individual pages that way quite happily), but really as a "hey, this was very inefficient - you shouldn't have done those individual pages as several small independent invidual pages in the first place" warning. So some kind of /* Is the movement mutually PMD-aligned? */ if ((old_addr ^ new_addr) & ~PMD_MASK == 0) { .. try to extend the move_vma() down to the *aligned* PMD case .. } logic in move_page_tables() would get rid of the warning, and would make the move more efficient since you'd skip the "move individual pages and allocate a new PMD" case entirely. This is all fairly com,plicated, and the "try to extend the move range" would also have to depend on CONFIG_HAVE_MOVE_PMD etc, so I'm not saying it's trivial. But it would seem to be a really nice optimization, in addition to getting rid of the warning. It could even help real world cases outside of this odd stack remapping case if users ever end up moving vma's by multiples of PMD_SIZE, and there aren't other vma's around the source/target that disable the optimization. Hmm? Anybody want to look into that? It looks hairy enough that I think that "you could test this with mutually aligned mremap() source/targets in some test program" would be a good thing. Because the pure execve() case is rare enough that using *that* as a test-case seems like a fool's errand. (To make things very clear: the important part is that the source and targets aren't *actually* PMD-aligned, just mutually aligned so that you *can* do the mremap() by just moving whole PMD's around) Linus