Hi Linus, On Wed, Jul 15, 2020 at 11:36:59AM -0700, Linus Torvalds wrote: > On Wed, Jul 15, 2020 at 6:50 AM Kirill A. Shutemov > <kirill.shutemov@xxxxxxxxxxxxxxx> wrote: > > > > mremap(2) does not allow source and destination regions to overlap, but > > shift_arg_pages() calls move_page_tables() directly and in this case the > > source and destination overlap often. It > > Actually, before we do this patch (which I think is a good idea), I'd > like Naresh to test the attached patch. > > And Kirill, Joel, mind looking it over too. > > IT IS ENTIRELY UNTESTED. Seems a clever idea and was something I wanted as well. Thanks. Some comments below: > But I hope the concept (and the code) is fairly obvious: it makes > move_page_tables() try to align the range to be moved, if possible. > > That *should* actually remove the warning that Naresh sees, for the > simple reason that now the PMD moving case will *always* trigger when > the stack movement is PMD-aligned, and you never see the "first do a > few small pages, then do the PMD optimization" case. > > And that should mean that we don't ever hit the "oh, we already have a > target pmd" thing, because the "move the whole pmd" case will clear > the ones it has already taken care of, and you never have that "oh, > there's an empty pmd where the pages were moved away one by one". > > And that means that for this case, it's _ok_ to overlap (as long as we > copy downwards). > > What do you think? I might not have fully understand the code but I get the basic idea it is aiming for. Basically you want to align the cases where the address space is free from both sides so that you can trigger the PMD-moving optimization. Regarding the ADDR_AFTER_NEXT checks, shouldn't you check for: if (ADDR_AFTER_NEXT(ALIGN(*old_addr + *len, PMD_SIZE), old)) return; and for the len calculation, I did not follow what you did, but I think you meant something like this? Does the following reduce to what you did? At least this is a bit more readable I think: *len += (ALIGN(*new_addr + *len, PMD_SIZE) - (*new_addr + *len)); which reduces to: *len = ALIGN(*new_addr + *len, PMD_SIZE) - *new_addr; Also you did "len +=", it should be "*len +=" in this function. In try_to_align_start(), everything looks correct to me. > Ok, so I could easily have screwed up the patch, even if it's > conceptually fairly simple. The details are small, but they needed > some care. The thing _looks_ fine to me, both on a source level and > when looking at the generated code, and I made sure to try to use a > lot of small helper functions and couple of helper macros to make each > individual piece look clear and obvious. > > But obviously a single "Oh, that test is the wrong way around", or a > missing mask inversion, or whatever, could completely break the code. > So no guarantees, but it looks fairly straightforward to me. > > Naresh - this patch does *NOT* make sense to test together with > Kirill's (or Joel's) patch that says "don't do the PMD optimization at > all for overlapping cases". Those patches will hide the issue, and > make the new code only work for mremap(), not for the initial stack > setup that caused the original warning. > > Joel - I think this patch makes sense _regardless_ of the special > execve() usage case, in that it extends your "let's just copy things > at a whole PMD level" logic to even the case where the beginning > wasn't PMD-aligned (or the length wasn't sufficient). > > Obviously it only triggers when the source and destination are > mutually aligned, and if there are no other vma's around those > first/last PMD entries. Which might mean that it almost never triggers > in practice, but looking at the generated code, it sure looks like > it's cheap enough to test. Oh ok, we had some requirements in my old job about moving large address spaces which were aligned so it triggered a lot in those. Also folks in the ChromeOS team tell me it is helping them. > Didn't you have some test-cases for your pmd optimized movement cases, > at least for timing? Yes, I had a simple mremap() test which was moving a 1GB map and measuring performance. Once I get a chance I can try it out with this optimization and trigger it with unaligned addresses as well. thanks, - Joel