On Thu, Apr 29, 2021 at 08:04:17PM +0200, Thomas Bogendoerfer wrote: > On Mon, Apr 26, 2021 at 09:11:47PM +0800, Huang Pei wrote: > > [ 40.873779] Cause : 4080800c (ExcCode 03) > > [ 40.877775] BadVA : 77e23000 > > [ 40.880645] PrId : 00d00100 (Ingenic XBurst) > > [ 40.884989] Modules linked in: > > [ 40.888034] Process Xsession (pid: 1370, threadinfo=ca5ce8d6, > > task=61c8f3ad, tls=77e28690) > > > > !!! This is my first time debug MIPS32 kernel, I think both threadinfo > > and task should be at KSEG0, instead of KSEG2 or USEG > > don't print pointers with %p they will be garbled for security reasons. > > see Documentation/core-api/printk-formats.rst > > "A raw pointer value may be printed with %p which will hash the address > before printing." > > > [ 41.233877] Index: 27 pgmask=4kb va=77e22000 asid=5c > > [ 41.233877] [pa=06a17000 c=3 d=0 v=0 g=0] [pa=2017a000 c=0 d=0 v=0 > > g=0] > > > > !!! TLB entry is loading a SWAP entry(C=0, pa=swap) at BADV, the pte_buddy point to a valid > > PFN(C=3, pa seem ok), but it is impossible, since line 116 must flush the tlb and > > replaced swap entry with new page > > My mistake, line 116(aka set_pte_at) flush ONLY cache (__update_cache) instead of tlb, so it is ok that the we saw it since this is how page swap in began with. the CP0 EPC indicate a "cache 0x15 0x77e23000" trigger it, with my patch applyed, the PTE is set young, > > Am I missing something? > > not sure, if I'm on the right track, but with the _PAGE_VALID bit set in > the new pte local_r4k_flush_cache_page() will do > > if ((mm == current->active_mm) && (pte_val(*ptep) & _PAGE_VALID)) > vaddr = NULL; > without my patch, go through the "else" branch with kmap* ------------- else { /* * Use kmap_coherent or kmap_atomic to do flushes for * another ASID than the current one. */ map_coherent = (cpu_has_dc_aliases && page_mapcount(page) && !Page_dcache_dirty(page)); if (map_coherent) vaddr = kmap_coherent(page, addr); else vaddr = kmap_atomic(page); addr = (unsigned long)vaddr; } -------------- Obviously, flush_cache_page with kmaped address would not cause TLB Invalid Exception, while fulsh_cache_page with original address would cause TLB invalid exception since the new PTE with V|Y set does not take effect in TLB In short, the TLB is stale with SWAP entry and _PAGE_VALID clear. +. with my patch, then flush_cache_page believe the _PAGE_VALID set in TLB because it detect the PTE is young, so trigger the TLB Store Invalid exception. +. without my patch, the flush_cache_page knew the PTE is old, so use the kmapped address to flush cache, which is in accord with the page copy before. my fix to this is flush tlb, any idea? > > and then > > vaddr ? r4k_blast_dcache_page(addr) : > r4k_blast_dcache_user_page(addr); > > flush the address directly instead with r4k_blast_dcache_user_page(). > But the TLB will be updated after the cache flush (which is correct IMHO). > So setting the VALID bit too early destroys the logic of flushing > cache aliases. Do you agree ? > > Thomas. > > -- > Crap can work. Given enough thrust pigs will fly, but it's not necessarily a > good idea. [ RFC1925, 2.3 ]