Re: system lockup with 2.6.29 on Cavium/Octeon

Ralf Baechle <ralf@xxxxxxxxxxxxxx> · Fri, 22 May 2009 10:23:34 +0100

On Fri, May 22, 2009 at 11:19:00AM +1000, Greg Ungerer wrote:

> Atsushi Nemoto wrote:
>> On Wed, 20 May 2009 15:26:04 +0100, Ralf Baechle <ralf@xxxxxxxxxxxxxx> wrote:
>>>> Now the vmalloc area starts at 0xc000000000000000 and the kernel code
>>>> and data is all at 0xffffffff80000000 and above. I don't know if the
>>>> start and end are reasonable values, but I can see some logic as to
>>>> where they come from. The code path that leads to this is via
>>>> __vunmap() and __purge_vmap_area_lazy(). So it is not too difficult
>>>> to see how we end up with values like this.
>>> Either start or end address is sensible but not the combination - both
>>> addresses should be in the same segment.  Start is in XKSEG, end in CKSEG2
>>> and in between there are vast wastelands of unused address space exabytes
>>> in size.
>>>
>>>> But the size calculation above with these types of values will result
>>>> in still a large number. Larger than the 32bit "int" that is "size".
>>>> I see large negative values fall out as size, and so the following
>>>> tlbsize check becomes true, and the code spins inside the loop inside
>>>> that if statement for a _very_ long time trying to flush tlb entries.
>>>>
>>>> This is of course easily fixed, by making that size "unsigned long".
>>>> The patch below trivially does this.
>>>>
>>>> But is this analysis correct?
>>> Yes - but I think we have two issues here.  The one is the calculation
>>> overflowing int for the arguments you're seeing.  The other being that
>>> the arguments simply are looking wrong.
>>
>> The wrong combination comes from lazy vunmapping which was introduced
>> in 2.6.28 cycle.  Maybe we can add new API (non-lazy version of
>> vfree()) to vmalloc.c to implement module_free(), but I suppose
>> fallbacking to local_flush_tlb_all() in local_flush_tlb_kernel_range()
>> is enough().
>
> Is there any performance impact on falling back to that?
>
> The flushing due to lazy vunmapping didn't seem to happen
> often in the tests I was running.

It would depend on the workload.  Some depend heavily on the performance
of vmalloc & co.  What I'm wondering now is if we no tend to always flush
the entire TLB instead of just a few entries.  The real cost of a TLB
flush is often not the flushing but the eventual reload of the entries.
That's factors that are hard to predict so benchmarking would be
interesting.

  Ralf