Re: [parisc] A500 boot crash with 44786880df196a4200c178945c4d41675faf9fb7

Helge Deller <deller@xxxxxx> · Thu, 25 Oct 2018 08:28:50 +0200

* John David Anglin <dave.anglin@xxxxxxxx>:
> On 2018-10-24 3:34 PM, Helge Deller wrote:
> > On 24.10.2018 19:29, John David Anglin wrote:
> > > On 2018-10-24 11:45 AM, Helge Deller wrote:
> > > > On 24.10.2018 17:03, John David Anglin wrote:
> > > > > On 2018-10-24 7:59 AM, John David Anglin wrote:
> > > > > > > The fault occurred executing this instruction "stw r31,0(r25)". Register r31 contains the following
> > > > > > instruction "pdtlb,l r0(sr1,r3)".  This indicates the fault occurred during alternative patching.
> > > > > > 
> > > > > > I suspect all kernel TLB entries need to be flushed prior to alternative patching to ensure that kernel
> > > > > > pages are writeable.
> > > > > Looks like this is a problem with set_kernel_text_rw().  Maybe this causes problems:
> > > > > 
> > > > > int __flush_tlb_range(unsigned long sid, unsigned long start,
> > > > >                         unsigned long end)
> > > > > {
> > > > >           unsigned long flags;
> > > > > 
> > > > >           if ((!IS_ENABLED(CONFIG_SMP) || !arch_irqs_disabled()) &&
> > > > >               end - start >= parisc_tlb_flush_threshold) {
> > > > >                   flush_tlb_all();
> > > > >                   return 1;
> > > > >           }
> > > > > 
> > > > > I believe that we need to disable this optimization until the parisc_tlb_flush_threshold is
> > > > > calculated.  I think this crash is related to the occasional crash in parisc_setup_cache_timing().
> > > > > 
> > > > > Maybe change in cache.c the initial define for parisc_tlb_flush_threshold:
> > > > > static unsigned long parisc_tlb_flush_threshold __read_mostly = ~0UL;
> > > > If it would run into flush_tlb_all(), then I'd expect that all TLBs have been flushed and
> > > > we wouldn't see an issue.
> > > > Maybe the info in the cache_info struct, which is used in the assembly of flush_tlb_all_local(),
> > > > > hasn't been initialized yet, so the whole cache hasn't been flushed?
> > > Since the fault occurred before the write bit is removed, it seems to
> > > me that the only way this can happen is that the TLB entry is left
> > > over from a previous instantiation of the OS.
> > Agreed.
> > 
> > > parisc_kernel_start() doesn't seem to whack TLB.  This suggests that
> > > __flush_tlb_range() call in set_kernel_text_rw() didn't work as
> > > expected.
> > Yes, seems so.
> > This system has only one CPU, so one flush_tlb_all_local() should have been sufficient.
> Yes, I believe only one CPU is up at this point.

On an SMP system, more CPUs may be up, but they sit in a waiting loop...

> Maybe this should be tried instead of the range flush.  Should be
> faster.

Here are some ideas:

I really think we need to include the __init text section too.  Built-in
device drivers may use asm_io_fdc() or asm_io_sync() in their init code.

Should we flush the d-cache before mapping r/o again?

As discussed above, maybe we should simply use flush_tlb_all_local()?
It runs on the current CPU, which will do the patching and needs those new
TLB entries.
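
For reference, here is a rough user-space model of the branch in the
__flush_tlb_range() excerpt quoted above.  It is only a sketch, not kernel
code: takes_flush_all_path() and its parameters are made-up names that
stand in for IS_ENABLED(CONFIG_SMP), arch_irqs_disabled() and
parisc_tlb_flush_threshold.  It only shows when the range flush falls
back to flush_tlb_all() and when it walks the range itself.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the condition quoted from __flush_tlb_range().
 * Returns true when the quoted code would call flush_tlb_all() and
 * return early, false when it would go on to flush the range itself.
 *
 * smp           stands in for IS_ENABLED(CONFIG_SMP)
 * irqs_disabled stands in for arch_irqs_disabled()
 * threshold     stands in for parisc_tlb_flush_threshold
 */
static bool takes_flush_all_path(bool smp, bool irqs_disabled,
				 unsigned long start, unsigned long end,
				 unsigned long threshold)
{
	return (!smp || !irqs_disabled) && (end - start >= threshold);
}
```

With a made-up 8 MB range and a small threshold, a UP kernel takes the
flush_tlb_all() path.  With the threshold preset to ~0UL, as Dave
suggested, it never does for any realistic range.  An SMP kernel with
interrupts disabled always does the range walk.  Calling
flush_tlb_all_local() directly avoids depending on any of this.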

None of the above explains why Meelis had a crash at boot, though...

Signed-off-by: Helge Deller <deller@xxxxxx>

diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
index e7e626bcd0be..ddf5c6231fb0 100644
--- a/arch/parisc/mm/init.c
+++ b/arch/parisc/mm/init.c
@@ -513,14 +513,18 @@ static void __init map_pages(unsigned long start_vaddr,
 
 void __init set_kernel_text_rw(int enable_read_write)
 {
-	unsigned long start = (unsigned long)_stext;
+	unsigned long start = (unsigned long)__init_begin;
 	unsigned long end   = (unsigned long)_etext;
 
+	/* flush modified instructions before mapping ro */
+	if (!enable_read_write)
+		flush_kernel_dcache_range_asm(start, end);
+
 	map_pages(start, __pa(start), end-start,
 		PAGE_KERNEL_RWX, enable_read_write ? 1:0);
 
 	/* force the kernel to see the new TLB entries */
-	__flush_tlb_range(0, start, end);
+	flush_tlb_all_local(NULL);
 
 	/* dump old cached instructions */
 	flush_icache_range(start, end);