On 24.10.2018 19:29, John David Anglin wrote:
> On 2018-10-24 11:45 AM, Helge Deller wrote:
>> On 24.10.2018 17:03, John David Anglin wrote:
>>> On 2018-10-24 7:59 AM, John David Anglin wrote:
>>>> The fault occurred executing this instruction "stw r31,0(r25)". Register r31 contains the following
>>>> instruction "pdtlb,l r0(sr1,r3)". This indicates the fault occurred during alternative patching.
>>>>
>>>> I suspect all kernel TLB entries need to be flushed prior to alternative patching to ensure that kernel
>>>> pages are writeable.
>>> Looks like this is a problem with set_kernel_text_rw(). Maybe this causes problems:
>>>
>>> int __flush_tlb_range(unsigned long sid, unsigned long start,
>>>                       unsigned long end)
>>> {
>>>         unsigned long flags;
>>>
>>>         if ((!IS_ENABLED(CONFIG_SMP) || !arch_irqs_disabled()) &&
>>>             end - start >= parisc_tlb_flush_threshold) {
>>>                 flush_tlb_all();
>>>                 return 1;
>>>         }
>>>
>>> I believe that we need to disable this optimization until the parisc_tlb_flush_threshold is
>>> calculated. I think this crash is related to the occasional crash in parisc_setup_cache_timing().
>>>
>>> Maybe change in cache.c the initial define for parisc_tlb_flush_threshold:
>>> static unsigned long parisc_tlb_flush_threshold __read_mostly = ~0UL;
>> If it would run into flush_tlb_all(), then I'd expect that all TLBs have been flushed and
>> we wouldn't see an issue.
>> Maybe the info in the cache_info struct, which is used in the assembly of flush_tlb_all_local(),
>> hasn't been initialized yet and so the whole cache hasn't been flushed?
> Since the fault occurred before the write bit is removed, it seems to
> me that the only way this can happen is that the TLB entry is left
> over from a previous instantiation of the OS.

Agreed.

> parisc_kernel_start() doesn't seem to whack TLB. This suggests that
> __flush_tlb_range() call in set_kernel_text_rw() didn't work as
> expected.

Yes, seems so.
This system has only one CPU, so one flush_tlb_all_local() should have been sufficient.

> Maybe start or end are wrong (same function pointer issue as os_hpmc)?

I don't think _start and _end are wrong. If they were, the issue would probably be reproducible.
Meelis, do you still have the original System.map file (or the vmlinux) so that
we could check?

Helge
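
For reference, the threshold check quoted earlier in the thread can be modelled in plain userspace C. This is only a sketch under stated assumptions: takes_flush_all_path() is a hypothetical helper, not a kernel function, and its smp and irqs_disabled parameters stand in for IS_ENABLED(CONFIG_SMP) and arch_irqs_disabled(). It is meant to illustrate why an initial value of ~0UL would keep __flush_tlb_range() off the flush_tlb_all() path until the threshold has been calculated.

```c
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel variable of the same name.
 * ~0UL is the initial value proposed in the thread (an assumption here,
 * not necessarily what any given kernel tree uses).
 */
static unsigned long parisc_tlb_flush_threshold = ~0UL;

/*
 * Hypothetical helper: mirrors only the condition quoted from
 * __flush_tlb_range(). Returns true when the range flush would take
 * the flush_tlb_all() shortcut instead of continuing with the range.
 */
static bool takes_flush_all_path(bool smp, bool irqs_disabled,
                                 unsigned long start, unsigned long end)
{
    return (!smp || !irqs_disabled) &&
           end - start >= parisc_tlb_flush_threshold;
}
```

With the threshold at ~0UL the condition can only hold when end - start equals ULONG_MAX, so any realistic range falls through to the rest of the function. Once a smaller threshold is stored, large ranges take the shortcut again.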