Re: [PATCH] parisc: Fix TLB related boot crash on SMP machines

Helge Deller <deller@xxxxxx> · Thu, 8 Dec 2016 22:15:09 +0100

On 08.12.2016 21:49, John David Anglin wrote:
> On 2016-12-08 3:00 PM, Helge Deller wrote:
>> On the technical side, this seems to happen:
>> The TLB measurement code uses flush_tlb_kernel_range() to flush specific TLB
>> entries with a page size of 4k (pdtlb 0(sr1,addr)). On UP systems this purge
>> instruction seems to work without problems even if the pages were mapped as
>> huge pages.  But on SMP systems the TLB purge instruction is broadcast to
>> other CPUs. Those CPUs then crash the machine because the page size is not as
>> expected.  C8000 machines with PA8800/PA8900 CPUs were not affected by this
>> problem, because the required cache coherency prohibits the use of huge
>> pages at all.  Sadly I didn't find any documentation about this behaviour,
>> so this finding is purely based on testing with physical SMP machines
>> (A500-44 and J5000, both were 2-way boxes).

> I doubt the problem is the 4k iteration using pdtlb 0(sr1,addr). I
> think the issue is the huge page size for the kernel. Each pdtlb
> instruction knocks out the same tlb entry including the entry used
> for tlb interruptions. This likely leads to stack overflow. 

Yes, likely.
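
To put a rough number on that, here is a minimal sketch in plain userspace C
(not kernel code) that counts how often a 4k-stride purge loop lands in the
same huge-page mapping. The 1 MB huge page size and both helper names are
assumptions chosen for illustration only; they are not taken from the patch.

```c
#include <assert.h>

/* Assumed sizes for illustration only. */
#define BASE_PAGE_SIZE (4UL * 1024)         /* stride of the purge loop */
#define HUGE_PAGE_SIZE (1UL * 1024 * 1024)  /* hypothetical huge page size */

/* Hypothetical helper: number of purges at the given stride that fall
 * into a single mapping of map_size bytes. */
static unsigned long purges_per_entry(unsigned long map_size,
                                      unsigned long stride)
{
    assert(stride != 0);
    return map_size / stride;
}

/* Hypothetical helper: number of distinct mappings of map_size bytes
 * that overlap the address range [start, end). */
static unsigned long entries_hit(unsigned long start, unsigned long end,
                                 unsigned long map_size)
{
    assert(map_size != 0);
    if (end <= start)
        return 0;
    return (end - 1) / map_size - start / map_size + 1;
}
```

Under these assumed sizes, purges_per_entry(HUGE_PAGE_SIZE, BASE_PAGE_SIZE)
gives 256, so every huge-page entry in the range would be purged 256 times
in a row instead of once.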

> In any
> case, it probably doesn't provide accurate timing because each pdtlb
> knocks out the entry for the interruption handler on systems with
> combined tlb.

True.

So, how to continue?
I see two options:
a) skip the TLB measuring code, as my patch does.
b) kmalloc() another region and do the measurement there.

I'd like to submit a fix for 4.9; otherwise the affected machines won't boot 4.9.
That's why I'd prefer option a).
Opinions?

Helge