Re: [parisc] A500 boot crash with 44786880df196a4200c178945c4d41675faf9fb7

On 2018-10-24 3:34 PM, Helge Deller wrote:
On 24.10.2018 19:29, John David Anglin wrote:
On 2018-10-24 11:45 AM, Helge Deller wrote:
On 24.10.2018 17:03, John David Anglin wrote:
On 2018-10-24 7:59 AM, John David Anglin wrote:
The fault occurred while executing the instruction "stw r31,0(r25)". Register r31 contains the
instruction "pdtlb,l r0(sr1,r3)". This indicates the fault occurred during alternative patching.

I suspect all kernel TLB entries need to be flushed prior to alternative patching to ensure that kernel
pages are writeable.
Looks like this is a problem with set_kernel_text_rw().  Maybe this causes problems:

int __flush_tlb_range(unsigned long sid, unsigned long start,
                      unsigned long end)
{
        unsigned long flags;

        if ((!IS_ENABLED(CONFIG_SMP) || !arch_irqs_disabled()) &&
            end - start >= parisc_tlb_flush_threshold) {
                flush_tlb_all();
                return 1;
        }

I believe that we need to disable this optimization until the parisc_tlb_flush_threshold is
calculated.  I think this crash is related to the occasional crash in parisc_setup_cache_timing().

Maybe change the initial definition of parisc_tlb_flush_threshold in cache.c:
static unsigned long parisc_tlb_flush_threshold __read_mostly = ~0UL;
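
To illustrate why an initial value of ~0UL would disable the shortcut, here is a minimal user-space sketch of the size test quoted above. It is not the kernel code: model_uses_flush_all() and its parameters are hypothetical names, and the 16 KiB threshold used in the usage note is an arbitrary illustrative value.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the condition quoted from __flush_tlb_range().
 * "smp" stands in for IS_ENABLED(CONFIG_SMP) and "irqs_disabled" for
 * arch_irqs_disabled(); both are plain parameters here (assumption).
 * Returns true when the model would take the flush_tlb_all() shortcut.
 */
static bool model_uses_flush_all(bool smp, bool irqs_disabled,
                                 unsigned long start, unsigned long end,
                                 unsigned long threshold)
{
        return (!smp || !irqs_disabled) && end - start >= threshold;
}
```

With an illustrative threshold of 16 KiB, a 32 KiB range takes the shortcut. With a threshold of ~0UL, the comparison can only succeed if end - start equals ULONG_MAX, so any realistic range falls through to the per-page flush.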
If it did run into flush_tlb_all(), I'd expect all TLB entries to have been flushed, and
we wouldn't see an issue.
Maybe the info in the cache_info struct, which is used in the assembly of flush_tlb_all_local(),
hasn't been initialized yet, and thus the whole cache hasn't been flushed?
Since the fault occurred before the write bit is removed, it seems to
me that the only way this can happen is that the TLB entry is left
over from a previous instantiation of the OS.
Agreed.
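
The stale-entry hypothesis can be illustrated with a toy model. This is only a sketch under stated assumptions: the toy_* names, the direct-mapped layout, and the four-slot size are all hypothetical and have nothing to do with the real PA-RISC TLB. It shows a store faulting through a leftover read-only entry even though the page table already says the page is writable, and succeeding once the entry is flushed and refilled.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 4     /* toy size; valid vpn values are 0..TOY_SLOTS-1 */

struct toy_tlb_entry {
        unsigned long vpn;
        bool valid;
        bool writable;
};

/* Toy page table: one writable bit per virtual page number. */
static bool toy_page_writable[TOY_SLOTS];
static struct toy_tlb_entry toy_tlb[TOY_SLOTS];

/* Invalidate every toy TLB entry. */
static void toy_tlb_flush_all(void)
{
        for (size_t i = 0; i < TOY_SLOTS; i++)
                toy_tlb[i].valid = false;
}

/*
 * Model a store to page "vpn". On a TLB miss the entry is refilled
 * from the toy page table; on a hit the cached permission is used
 * as-is, even if the page table has changed since.
 * Returns true if the store is allowed, false for a protection fault.
 */
static bool toy_store(unsigned long vpn)
{
        struct toy_tlb_entry *e = &toy_tlb[vpn % TOY_SLOTS];

        if (!e->valid || e->vpn != vpn) {
                e->vpn = vpn;
                e->valid = true;
                e->writable = toy_page_writable[vpn % TOY_SLOTS];
        }
        return e->writable;
}
```

In this model, seeding toy_tlb[1] with a valid read-only entry and then setting toy_page_writable[1] makes toy_store(1) fail until toy_tlb_flush_all() is called.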

parisc_kernel_start() doesn't seem to whack the TLB.  This suggests that
the __flush_tlb_range() call in set_kernel_text_rw() didn't work as
expected.
Yes, seems so.
This system has only one CPU, so one flush_tlb_all_local() should have been sufficient.
Yes, I believe only one CPU is up at this point.  Maybe this should be tried instead of the range flush.
Should be faster.

I need to fix the initial value of parisc_tlb_flush_threshold so that the optimization is disabled when we
compute the timing for the range flush.
  > Maybe start or end are wrong (same function pointer issue as os_hpmc)?
I don't think _start and _end are wrong. If they were, the issue would probably be reproducible.
Meelis, do you still have the original System.map file (or the vmlinux) so that we could check?

Dave

--
John David Anglin  dave.anglin@xxxxxxxx



