On Mon, Feb 25, 2019 at 12:46:46PM +0100, Vlastimil Babka wrote: > On 2/22/19 2:01 PM, Kirill A. Shutemov wrote: > > On Thu, Feb 21, 2019 at 09:54:06AM +0100, Oscar Salvador wrote: > >> When using mremap() syscall in addition to MREMAP_FIXED flag, > >> mremap() calls mremap_to() which does the following: > >> > >> 1) unmaps the destination region where we are going to move the map > >> 2) If the new region is going to be smaller, we unmap the last part > >> of the old region > >> > >> Then, we will eventually call move_vma() to do the actual move. > >> > >> move_vma() checks whether we are at least 4 maps below max_map_count > >> before going further, otherwise it bails out with -ENOMEM. > >> The problem is that we might have already unmapped the vma's in steps > >> 1) and 2), so it is not possible for userspace to figure out the state > >> of the vma's after it gets -ENOMEM, and it gets tricky for userspace > >> to clean up properly on error path. > >> > >> While it is true that we can return -ENOMEM for more reasons > >> (e.g: see may_expand_vm() or move_page_tables()), I think that we can > >> avoid this scenario in concret if we check early in mremap_to() if the > >> operation has high chances to succeed map-wise. > >> > >> Should not be that the case, we can bail out before we even try to unmap > >> anything, so we make sure the vma's are left untouched in case we are likely > >> to be short of maps. > >> > >> The thumb-rule now is to rely on the worst-scenario case we can have. > >> That is when both vma's (old region and new region) are going to be split > >> in 3, so we get two more maps to the ones we already hold (one per each). > >> If current map count + 2 maps still leads us to 4 maps below the threshold, > >> we are going to pass the check in move_vma(). > >> > >> Of course, this is not free, as it might generate false positives when it is > >> true that we are tight map-wise, but the unmap operation can release several > >> vma's leading us to a good state. > >> > >> Because of that I am sending this as a RFC. > >> Another approach was also investigated [1], but it may be too much hassle > >> for what it brings. > > > > I believe we don't need the check in move_vma() with this patch. Or do we? > > move_vma() can be also called directly from SYSCALL_DEFINE5(mremap) for > the non-MMAP_FIXED case. So unless there's further refactoring, the > check is still needed. Okay, makes sense. > >> > >> [1] https://lore.kernel.org/lkml/20190219155320.tkfkwvqk53tfdojt@xxxxxxxxxxxx/ > >> > >> Signed-off-by: Oscar Salvador <osalvador@xxxxxxx> > > Acked-by: Vlastimil Babka <vbabka@xxxxxxx> Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> -- Kirill A. Shutemov