Re: Recurring INEQUIVALENT ALIASES issues and userland corruption/crashes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2022-03-22 1:52 p.m., Sam James wrote:
In Gentoo, we've just got our hands on an RP3440 (PA8800) which seems to quite easily hit inequivalent aliasing issues.

We've found that under some workloads, the machine copes fine, none of that appears in dmesg, and all is well - even for
over a week. But as soon as we start other workloads (the problematic one is building "stages" -- release media for Gentoo),
within 30m or so, the machine is in a broken state, with these messages flooding dmesg:
```
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x42994000 and 0x426e1000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x426e1000 and 0x41b56000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41b56000 and 0x41aae000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41aae000 and 0x42774000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x42774000 and 0x41202000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41202000 and 0x428dd000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41e2c000 and 0x418f6000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x418f6000 and 0x42980000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x42980000 and 0x426cd000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x426cd000 and 0x41b42000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41b42000 and 0x41a9a000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41a9a000 and 0x42760000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x42760000 and 0x411ee000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x411ee000 and 0x428c9000 in file bash
```
It seems all these messages result from a single call to flush_dcache_page.  Note the sequential behavior of old_addr
and addr, and message times.

Possibly, the VMA interval tree is corrupt, so the loop doesn't terminate properly.

        vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
                offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
                addr = mpnt->vm_start + offset;

                /* The TLB is the engine of coherence on parisc: The
                 * CPU is entitled to speculate any page with a TLB
                 * mapping, so here we kill the mapping then flush the
                 * page along a special flush only alias mapping.
                 * This guarantees that the page is no-longer in the
                 * cache for any process and nor may it be
                 * speculatively read in (until the user or kernel
                 * specifically accesses it, of course) */

                flush_tlb_page(mpnt, addr);
                if (old_addr == 0 || (old_addr & (SHM_COLOUR - 1))
                                      != (addr & (SHM_COLOUR - 1))) {
                        __flush_cache_page(mpnt, addr, page_to_phys(page));
                        if (parisc_requires_coherency() && old_addr)
                                printk(KERN_ERR "INEQUIVALENT ALIASES 0x%lx and 0x%lx in file %pD\n", old_addr, addr, mpnt->vm_file);
                        old_addr = addr;
                }
        }

I see arm skips some VMAs:

        vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
                unsigned long offset;

                /*
                 * If this VMA is not in our MM, we can ignore it.
                 */
                if (mpnt->vm_mm != mm)
                        continue;
                if (!(mpnt->vm_flags & VM_MAYSHARE))
                        continue;
                offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
                flush_cache_page(mpnt, mpnt->vm_start + offset, page_to_pfn(page));
        }

Dave

--
John David Anglin  dave.anglin@xxxxxxxx




[Index of Archives]     [Linux SoC]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux