On 2/22/19 2:01 PM, Kirill A. Shutemov wrote: > On Thu, Feb 21, 2019 at 09:54:06AM +0100, Oscar Salvador wrote: >> When using mremap() syscall in addition to MREMAP_FIXED flag, >> mremap() calls mremap_to() which does the following: >> >> 1) unmaps the destination region where we are going to move the map >> 2) If the new region is going to be smaller, we unmap the last part >> of the old region >> >> Then, we will eventually call move_vma() to do the actual move. >> >> move_vma() checks whether we are at least 4 maps below max_map_count >> before going further, otherwise it bails out with -ENOMEM. >> The problem is that we might have already unmapped the vma's in steps >> 1) and 2), so it is not possible for userspace to figure out the state >> of the vma's after it gets -ENOMEM, and it gets tricky for userspace >> to clean up properly on error path. >> >> While it is true that we can return -ENOMEM for more reasons >> (e.g: see may_expand_vm() or move_page_tables()), I think that we can >> avoid this scenario in concret if we check early in mremap_to() if the >> operation has high chances to succeed map-wise. >> >> Should not be that the case, we can bail out before we even try to unmap >> anything, so we make sure the vma's are left untouched in case we are likely >> to be short of maps. >> >> The thumb-rule now is to rely on the worst-scenario case we can have. >> That is when both vma's (old region and new region) are going to be split >> in 3, so we get two more maps to the ones we already hold (one per each). >> If current map count + 2 maps still leads us to 4 maps below the threshold, >> we are going to pass the check in move_vma(). >> >> Of course, this is not free, as it might generate false positives when it is >> true that we are tight map-wise, but the unmap operation can release several >> vma's leading us to a good state. >> >> Because of that I am sending this as a RFC. >> Another approach was also investigated [1], but it may be too much hassle >> for what it brings. > > I believe we don't need the check in move_vma() with this patch. Or do we? move_vma() can be also called directly from SYSCALL_DEFINE5(mremap) for the non-MMAP_FIXED case. So unless there's further refactoring, the check is still needed. >> >> [1] https://lore.kernel.org/lkml/20190219155320.tkfkwvqk53tfdojt@xxxxxxxxxxxx/ >> >> Signed-off-by: Oscar Salvador <osalvador@xxxxxxx> Acked-by: Vlastimil Babka <vbabka@xxxxxxx>