* John David Anglin <dave.anglin@xxxxxxxx>: > On 2018-10-24 3:34 PM, Helge Deller wrote: > > On 24.10.2018 19:29, John David Anglin wrote: > > > On 2018-10-24 11:45 AM, Helge Deller wrote: > > > > On 24.10.2018 17:03, John David Anglin wrote: > > > > > On 2018-10-24 7:59 AM, John David Anglin wrote: > > > > > > The fault occured executing this instruction "stw r31,0(r25)". Register r31 contains the following > > > > > > instruction "pdtlb,l r0(sr1,r3)". This indicates the fault occurred during alternative patching. > > > > > > > > > > > > I suspect all kernel TLB entries need to be flushed prior to alternative patching to ensure that kernel > > > > > > pages are writeable. > > > > > Looks like this is a problem with set_kernel_text_rw(). Maybe this causes problems: > > > > > > > > > > int __flush_tlb_range(unsigned long sid, unsigned long start, > > > > > unsigned long end) > > > > > { > > > > > unsigned long flags; > > > > > > > > > > if ((!IS_ENABLED(CONFIG_SMP) || !arch_irqs_disabled()) && > > > > > end - start >= parisc_tlb_flush_threshold) { > > > > > flush_tlb_all(); > > > > > return 1; > > > > > } > > > > > > > > > > I believe that we need to disable this optimization until the parisc_tlb_flush_threshold is > > > > > calculated. I think this crash is related to the occasional crash in parisc_setup_cache_timing(). > > > > > > > > > > Maybe change in cache.c the initial define for parisc_tlb_flush_threshold: > > > > > static unsigned long parisc_tlb_flush_threshold __read_mostly = ~0UL; > > > > If it would run into flush_tlb_all(), then I'd expect that all TLBs have been flushed and > > > > we wouldn't see an issue. > > > > Maybe the info in the cache_info struct, which is used in the assembly of flush_tlb_all_local(), > > > > hasn't been initialized yet and such the whole cache hasn't been flushed? > > > Since the fault occurred before the write bit is removed, it seems to > > > me that the only way this can happen is that the TLB entry is left > > > over from a previous instantiation of the OS. > > Agreed. > > > > > parisc_kernel_start() doesn't seem to whack TLB. This suggests that > > > __flush_tlb_range() call in set_kernel_text_rw() didn't work as > > > expected. > > Yes, seems so. > > This system has only one CPU, so one flush_tlb_all_local() should have been sufficient. > Yes, I believe only one CPU is up at this point. On a SMP system, more may be up, but in a waiting loop... > Maybe this should be tried instead of the range flush. Should be >faster. Here are some ideas: I really think we need to include the __init text section too. Built-in device divers may use asm_io_fdc() or asm_io_sync() in their init code. Should we flush the d-cache before mapping r/o again? As discussed abobe, maybe we should simply use flush_tlb_all_local()? It's on the current CPU which will do the patching and needs those new tlb entries. All of the above doesn't explain why Meelis had a crash at boot though... Signed-off-by: Helge Deller <deller@xxxxxx> diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c index e7e626bcd0be..ddf5c6231fb0 100644 --- a/arch/parisc/mm/init.c +++ b/arch/parisc/mm/init.c @@ -513,14 +513,18 @@ static void __init map_pages(unsigned long start_vaddr, void __init set_kernel_text_rw(int enable_read_write) { - unsigned long start = (unsigned long)_stext; + unsigned long start = (unsigned long)__init_begin; unsigned long end = (unsigned long)_etext; + /* flush modified instructions before mapping ro */ + if (!enable_read_write) + flush_kernel_dcache_range_asm(start, end); + map_pages(start, __pa(start), end-start, PAGE_KERNEL_RWX, enable_read_write ? 1:0); /* force the kernel to see the new TLB entries */ - __flush_tlb_range(0, start, end); + flush_tlb_all_local(NULL); /* dump old cached instructions */ flush_icache_range(start, end);