On Fri, Aug 05, 2011 at 12:50:47PM +0200, Johannes Weiner wrote: > On Thu, Jul 28, 2011 at 04:26:31PM +0200, Andrea Arcangeli wrote: > > Hello, > > > > this is the latest version of the mremap THP native implementation > > plus optimizations. > > > > So first question is: am I right, the "- 1" that I am removing below > > was buggy? It's harmless because these old_end/next are page aligned, > > but if PAGE_SIZE would be 1, it'd break, right? It's really confusing > > to read even if it happens to work. Please let me know because that "- > > 1" ironically it's the thing I'm less comfortable about in this patch. > > > @@ -134,14 +126,17 @@ unsigned long move_page_tables(struct vm > > { > > unsigned long extent, next, old_end; > > pmd_t *old_pmd, *new_pmd; > > + bool need_flush = false; > > > > old_end = old_addr + len; > > flush_cache_range(vma, old_addr, old_end); > > > > + mmu_notifier_invalidate_range_start(vma->vm_mm, old_addr, old_end); > > + > > for (; old_addr < old_end; old_addr += extent, new_addr += extent) { > > cond_resched(); > > next = (old_addr + PMD_SIZE) & PMD_MASK; > > - if (next - 1 > old_end) > > + if (next > old_end) > > next = old_end; > > If old_addr + PMD_SIZE overflows, next will be zero, thus smaller than > old_end and not fixed up. This results in a bogus extent length here: So basically this -1 is to prevent an overflow and it's relaying on PAGE_SIZE > 1 to be safe, which is safe assumption. > > extent = next - old_addr; > > which I think can overrun old_addr + len if the extent should have > actually been smaller than the distance between new_addr and the next > PMD as well as that LATENCY_LIMIT to which extent is capped a few > lines below. I haven't checked all the possibilities, though. > > It could probably be > > if (next > old_end || next - 1 > old_end) > next = old_end > > to catch the theoretical next == old_end + 1 case, but PAGE_SIZE > 1 > looks like a sensible assumption to me. However if old_end == 0, I doubt it could still work safe because next would be again zero leading to an extreme high extent. It looks like the last page must not be allowed to be mapped for this to work safe. sparc is using 0xf0000000 as TASK_SIZE. But being safe on the PMD is better in case it spans more than 32-4 bits. The pmd_shift on sparc32 seems at most 23, so it doesn't look problematic. Maybe some other arch would lead to trouble, but it's possible it never happens to be problematic and it was just an off by one error as I thought? For this to break the userland must end less than a PMD_SIZE before the end of the address space and it's possible no arch is like that. x86 has pgd_size of 4m on 32bit nopae, and 2m pmd_size for pae 32/64bit and address space 32bit ends at 3g... leaving 1g vs 4m or >=1g vs 2m. But hey I prefer to be safe against overflow even if no arch is actually going to be in trouble and if TASK_SIZE is always set at least one PMD away from the overflowing point like it seems to happen on sparc32 too. So rather than doing -1 which looks more like an off by one error, it's cleaner to do an explicit overflow check. This first calculates the next pmd start which may be 0 if it overflows. Then does 0 - old_addr and stores it in extent. Then comares old_end-old_addr with extent (where extent is from old_addr to the next pmd start). If old_end-old_addr is smaller than extent it stores that into extent to stop at old_end instead of at the next pmd start. So the -1 goes away. diff --git a/mm/mremap.c b/mm/mremap.c --- a/mm/mremap.c +++ b/mm/mremap.c @@ -136,9 +136,10 @@ unsigned long move_page_tables(struct vm for (; old_addr < old_end; old_addr += extent, new_addr += extent) { cond_resched(); next = (old_addr + PMD_SIZE) & PMD_MASK; - if (next > old_end) - next = old_end; + /* even if next overflowed, extent below will be ok */ extent = next - old_addr; + if (extent > old_end - old_addr) + extent = old_end - old_addr; old_pmd = get_old_pmd(vma->vm_mm, old_addr); if (!old_pmd) continue; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>