Re: support for DEBUG_VM_PGTABLE

On 2022-03-09 9:56 a.m., Rolf Eike Beer wrote:
> Some recent patches made me aware of DEBUG_VM_PGTABLE. Has anyone tried to get
> this working on hppa? Given the constant problems with caches and memory it
> may help find some subtle bugs in the code.

I haven't tried DEBUG_VM_PGTABLE, but I think our cache problems stem from this code in entry.S:

        /*
         * Non access misses can be caused by fdc,fic,pdc,lpa,probe and
         * probei instructions. We don't want to fault for these
         * instructions (not only does it not make sense, it can cause
         * deadlocks, since some flushes are done with the mmap
         * semaphore held). If the translation doesn't exist, we can't
         * insert a translation, so have to emulate the side effects
         * of the instruction. Since we don't insert a translation
         * we can get a lot of faults during a flush loop, so it makes
         * sense to try to do it here with minimum overhead. We only
         * emulate fdc,fic,pdc,probew,prober instructions whose base
         * and index registers are not shadowed. We defer everything
         * else to the "slow" path.
         */

        mfctl           %cr19,%r9 /* Get iir */

        /* PA 2.0 Arch Ref. Book pg 382 has a good description of the insn bits.
           Checks for fdc,fdce,pdc,"fic,4f",prober,probeir,probew, probeiw */

        /* Checks for fdc,fdce,pdc,"fic,4f" only */
        ldi             0x280,%r16
        and             %r9,%r16,%r17
        cmpb,<>,n       %r16,%r17,nadtlb_probe_check
        bb,>=,n         %r9,26,nadtlb_nullify  /* m bit not set, just nullify */
        ...

This code nullifies cache flush/purge instructions when we take a non-access data TLB fault.
When that happens, the cache line is not invalidated.

We get these faults when the _PAGE_PRESENT_BIT is not set in the PTE.  For example, the bit won't be
set for text that hasn't been loaded from disk.

This occurs quite frequently.  For example, with nullification disabled:

Kernel Fault: Code=17 (Non-access DTLB miss fault) at addr 00000000001d7000
CPU: 0 PID: 1 Comm: init Not tainted 5.16.12+ #3
Hardware name: 9000/800/rp3440

     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001000000010000001111 Not tainted
r00-03  000000000804040f 0000000040cd3000 0000000040208974 000000004b615060
r04-07  0000000040b5e860 00000000001d7000 00000000001dd000 000000004bcd7f78
r08-11  000000004bcd7f88 000000004b614d80 00000000001dd000 000000004bcd7ae8
r12-15  0000000000000000 000000004b614ba0 000000004df79a38 000000004bcd7a40
r16-19  00000000001d7000 000000004bcd7f38 000000004b614bd8 0000000000094c00
r20-23  0000000000000000 0000000000000800 00000000001d8000 0000000000000080
r24-27  00000000001d7000 00000000001dd000 00000000001d7000 0000000040b5e860
r28-31  0000000040b27500 000000004b6150d0 000000004b615100 0000000000094c00
sr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000094c00
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004020074c 0000000040200750
 IIR: 0757d2a0    ISR: 0000000000094c00  IOR: 00000000001d7000
 CPU:        0   CR30: 000000004b598af0 CR31: ffffffffffffffff
 ORIG_R28: 000000004b843678
 IAOQ[0]: flush_user_dcache_range_asm+0x20/0x78
 IAOQ[1]: flush_user_dcache_range_asm+0x24/0x78
 RP(r2): flush_user_cache_tlb.isra.0+0x5c/0xe0
Backtrace:
 [<0000000040208974>] flush_user_cache_tlb.isra.0+0x5c/0xe0
 [<0000000040209288>] flush_cache_range+0x128/0x148
 [<000000004041c880>] unmap_page_range+0xb8/0xc08
 [<000000004041d438>] unmap_single_vma+0x68/0x130
 [<000000004041d940>] unmap_vmas+0x70/0xb0
 [<0000000040427e20>] unmap_region+0x108/0x1b0
 [<000000004042ab9c>] __do_munmap+0x264/0x5e8
 [<000000004042afd0>] __vm_munmap+0xb0/0x138
 [<000000004042b084>] vm_munmap+0x2c/0x40
 [<0000000040552410>] elf_map+0xd8/0x198
 [<0000000040554c48>] load_elf_binary+0xb40/0x14c0
 [<00000000404a499c>] exec_binprm+0x23c/0x630
 [<00000000404a4fdc>] bprm_execve+0x24c/0x360
 [<00000000404a7468>] kernel_execve+0x1f0/0x2b8
 [<0000000040a990e4>] run_init_process+0x164/0x198
 [<0000000040ab4664>] kernel_init+0x184/0x340
 [<0000000040202020>] ret_from_kernel_thread+0x20/0x28
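
As a rough cross-check of the dump above, the two tests from the entry.S excerpt can be mirrored
in C and applied to the reported IIR (0757d2a0).  This is only a sketch: the helper names are
hypothetical and are not kernel symbols, and the base/index field positions assume the usual
PA-RISC layout for these instructions (base register in bits 6-10, index register in bits 11-15,
counting from the most significant bit as bit 0).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; none of these names
 * exist in the kernel.
 */

/* Mirrors: ldi 0x280,%r16 ; and %r9,%r16,%r17 ; cmpb,<>,n %r16,%r17,... */
static inline bool iir_is_flush_or_purge(uint32_t iir)
{
	return (iir & 0x280u) == 0x280u;
}

/*
 * Mirrors: bb,>=,n %r9,26,nadtlb_nullify
 * PA-RISC numbers bits from the most significant end, so bit 26 of a
 * 32-bit word is 1 << (31 - 26).
 */
static inline bool iir_m_bit_set(uint32_t iir)
{
	return ((iir >> (31 - 26)) & 1u) != 0;
}

/* Assumed field layout: base register in bits 6-10. */
static inline unsigned int iir_base_reg(uint32_t iir)
{
	return (iir >> 21) & 0x1fu;
}

/* Assumed field layout: index register in bits 11-15. */
static inline unsigned int iir_index_reg(uint32_t iir)
{
	return (iir >> 16) & 0x1fu;
}
```

For IIR 0757d2a0, the mask test matches, the m bit is set, and the base and index fields decode to
r26 and r23.  That is consistent with the register dump, where r26 equals the IOR (00000000001d7000),
and with the faulting address being inside flush_user_dcache_range_asm.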

flush_user_cache_tlb() uses flush_user_dcache_range_asm() and flush_user_icache_range_asm().
These flush routines rely on the normal PTEs that are set up to control user access.

I believe the easiest fix is to use the tmp alias flush routines (flush_cache_pages).  They set up a
special PTE for the flush.  As long as we have a PTE, it corresponds to a physical page.  The cache
lines can be invalidated even when data hasn't been loaded from storage.

Dave

--
John David Anglin  dave.anglin@xxxxxxxx



