Hi Hugh, Hugh Dickins <hugh@xxxxxxxxxxx> writes: > On Sun, 17 Aug 2008, Johannes Weiner wrote: >> zap_pte_range() overruns the page tables if the distance between the >> start and end is not a multiple of the pagesize. Because then, >> `start' will never be equal to `end' and we will keep looping. >> >> To fix this, round the boundary addresses to exclude partial pages from >> the range completely, we must not unmap them anyway. > > You've a good idea here, but no. > >> >> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxxx> >> --- >> >> Hugh Dickins <hugh@xxxxxxxxxxx> writes: >> >> > On Sat, 16 Aug 2008, Rafael J. Wysocki wrote: >> >> >> >> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11335 >> >> Subject : 2.6.27-rc2-git5 BUG: unable to handle kernel paging request >> >> Submitter : Randy Dunlap <randy.dunlap@xxxxxxxxxx> >> >> Date : 2008-08-12 4:18 (5 days old) >> >> References : http://marc.info/?l=linux-kernel&m=121851477201960&w=4 >> >> Handled-By : Hugh Dickins <hugh@xxxxxxxxxxx> >> > >> > This should still be listed for now, it's interesting, >> > but I doubt we'll make any progress unless it can be reproduced. >> >> I think this patch fixes it. exit_mmap() even calls unmap_vmas() with >> an ending address of -1UL which is not page-aligned in my book and on my >> architecture :) > > You need to take into consideration that gazillions of calls to > exit_mmap(), unmap_vmas() and zap_pte_range() have been succeeding > since we reworked those loops three years ago. exit_mmap() calls > unmap_vmas() with a start_addr of 0 (so your patch won't help that), > and the (unsigned long) end_addr of -1 is simply an upper bound on > on how far the vma loop goes, it doesn't need the alignment your > patch enforces. Now that you say it, yes, I don't see any way how the upper bound of -1UL could break it as vm_end is most probably lower than that :) However: start = max(vma->vm_start, start_addr); end = min(vma->vm_end, end_addr); The overrun *is* possible if the given ending address is lower than the vm_end. The same goes for a broken start if it is higher than vm_start. > That's a great idea that overrunning a pagetable may account for > Randy's apparent pagetable corruption: I (and please, you too) need > to go back over the info he's given with that hypothesis in mind, > it certainly fits well the fact that 6 out of 7 entries were found > bad at the _start_ of a pagetable before collapsing - though OTOH > I don't think it does fit with the two processes seeing similar > but different corruption, or the general protection faults. > But definitely worth pursuing, it hadn't crossed my mind. Frankly, I didn't look too much at what Randy reported. I ran off a bit quick when I saw that the fault came on an empty PMD within this code as this overrun issue was still in the back of my head and I knew there were similar loops involved. I will try and help debugging this further. > But if a pagetable is being overrun in that way, doesn't that mean > that a vma->vm_start (or vma->vm_end?) has got corrupted, and then > we'll need to work that out. vm_start and vm_end (unless corrupted) > are always page aligned, and there's lots of code which assumes that: > or have you noticed somewhere that's not so? No, I have not. But an overrun condition also does not require broken VMA bounds. Although that could be a possibility, too, of course. >> It is a similar problem to what we had with gup some weeks ago. > > You're right that those pgd_addr_end() etc. loops have an implicit > and fragile dependence on the page alignment of addr and end. They > were written that way to maximize efficiency and be homogeneous > across the levels, while handling the wrapped end 0 case. But both > fast gup and pagewalk have stumbled on those assumptions recently. Yeah, especially since they could cause silent page table corruption :( In this respect, I still think that my patch has a point. Because yes, the looping depends on page aligned boundaries, but we don't check for this required dependency and values leading to overruns are able to pass through, as explained above. > Hugh Hannes -- To unsubscribe from this list: send the line "unsubscribe kernel-testers" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html