On Wed, May 20, 2009 at 04:12:32PM +1000, Greg Ungerer wrote: > I have a system lockup problem that I have been looking at on a custom > Cavium/Octeon 5010 based design. I am running on linux-2.6.29 with > David Daney's latest round of PCI and ethernet patches (posted here > on this list). > > I have tracked the problem back to local_flush_tlb_kernel_range() in > arch/mips/mm/tlb-r4k.c. At the top of this function is: > > void local_flush_tlb_kernel_range(unsigned long start, unsigned long > end) > { > unsigned long flags; > int size; > > ENTER_CRITICAL(flags); > size = (end - start + (PAGE_SIZE - 1)) >> PAGE_SHIFT; > size = (size + 1) >> 1; > if (size <= current_cpu_data.tlbsize / 2) { > > The problem is that typical example values I see passed in for start > and end are: > > start = c000000000006000 > end = ffffffffc01d8000 > > Now the vmalloc area starts at 0xc000000000000000 and the kernel code > and data is all at 0xffffffff80000000 and above. I don't know if the > start and end are reasonable values, but I can see some logic as to > where they come from. The code path that leads to this is via > __vunmap() and __purge_vmap_area_lazy(). So it is not too difficult > to see how we end up with values like this. Either start or end address is sensible but not the combination - both addresses should be in the same segment. Start is in XKSEG, end in CKSEG2 and in between there are vast wastelands of unused address space exabytes in size. > But the size calculation above with these types of values will result > in still a large number. Larger than the 32bit "int" that is "size". > I see large negative values fall out as size, and so the following > tlbsize check becomes true, and the code spins inside the loop inside > that if statement for a _very_ long time trying to flush tlb entries. > > This is of course easily fixed, by making that size "unsigned long". > The patch below trivially does this. > > But is this analysis correct? Yes - but I think we have two issues here. The one is the calculation overflowing int for the arguments you're seeing. The other being that the arguments simply are looking wrong. There are a few more instances of the same overflow issue which the patch below is fixing. Ralf arch/mips/mm/tlb-r3k.c | 6 ++---- arch/mips/mm/tlb-r4k.c | 6 ++---- arch/mips/mm/tlb-r8k.c | 3 +-- 3 files changed, 5 insertions(+), 10 deletions(-) diff --git a/arch/mips/mm/tlb-r3k.c b/arch/mips/mm/tlb-r3k.c index f0cf46a..1c0048a 100644 --- a/arch/mips/mm/tlb-r3k.c +++ b/arch/mips/mm/tlb-r3k.c @@ -82,8 +82,7 @@ void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start, int cpu = smp_processor_id(); if (cpu_context(cpu, mm) != 0) { - unsigned long flags; - int size; + unsigned long size, flags; #ifdef DEBUG_TLB printk("[tlbrange<%lu,0x%08lx,0x%08lx>]", @@ -121,8 +120,7 @@ void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start, void local_flush_tlb_kernel_range(unsigned long start, unsigned long end) { - unsigned long flags; - int size; + unsigned long size, flags; #ifdef DEBUG_TLB printk("[tlbrange<%lu,0x%08lx,0x%08lx>]", start, end); diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c index 9619f66..892be42 100644 --- a/arch/mips/mm/tlb-r4k.c +++ b/arch/mips/mm/tlb-r4k.c @@ -117,8 +117,7 @@ void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start, int cpu = smp_processor_id(); if (cpu_context(cpu, mm) != 0) { - unsigned long flags; - int size; + unsigned long size, flags; ENTER_CRITICAL(flags); size = (end - start + (PAGE_SIZE - 1)) >> PAGE_SHIFT; @@ -160,8 +159,7 @@ void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start, void local_flush_tlb_kernel_range(unsigned long start, unsigned long end) { - unsigned long flags; - int size; + unsigned long size, flags; ENTER_CRITICAL(flags); size = (end - start + (PAGE_SIZE - 1)) >> PAGE_SHIFT; diff --git a/arch/mips/mm/tlb-r8k.c b/arch/mips/mm/tlb-r8k.c index 4f01a3b..4ec95cc 100644 --- a/arch/mips/mm/tlb-r8k.c +++ b/arch/mips/mm/tlb-r8k.c @@ -111,8 +111,7 @@ out_restore: /* Usable for KV1 addresses only! */ void local_flush_tlb_kernel_range(unsigned long start, unsigned long end) { - unsigned long flags; - int size; + unsigned long size, flags; size = (end - start + (PAGE_SIZE - 1)) >> PAGE_SHIFT; size = (size + 1) >> 1;