* John David Anglin <dave.anglin@xxxxxxxx>: > On 2021-02-11 4:51 p.m., Guenter Roeck wrote: > > On Thu, Feb 11, 2021 at 09:38:25AM -0500, John David Anglin wrote: > >> On 2021-02-10 8:20 p.m., Guenter Roeck wrote: > >>> On Wed, Feb 10, 2021 at 01:57:42PM -0500, John David Anglin wrote: > >>>> On 2021-02-10 12:23 p.m., Helge Deller wrote: > >>>>> On 2/10/21 3:52 PM, Guenter Roeck wrote: > >>>>>> On Wed, Jan 27, 2021 at 10:18:51PM +0100, Helge Deller wrote: > >>>>>>> On parisc a spinlock is stored in the next page behind the pgd which > >>>>>>> protects against parallel accesses to the pgd. That's why one additional > >>>>>>> page (PGD_ALLOC_ORDER) is allocated for the pgd. > >>>>>>> > >>>>>>> Matthew Wilcox suggested that we instead should use a pointer in the > >>>>>>> struct page table for this spinlock and noted, that the comments for the > >>>>>>> PGD_ORDER and PMD_ORDER defines were wrong. > >>>>>>> > >>>>>>> Both suggestions are addressed in this patch. The pgd spinlock > >>>>>>> (parisc_pgd_lock) is stored in the struct page table. In > >>>>>>> switch_mm_irqs_off() the physical address of this lock is loaded into > >>>>>>> cr28 (tr4) and the pgd into cr25, so that the fault handlers can > >>>>>>> directly access the lock. > >>>>>>> > >>>>>>> The currently implemened Hybrid L2/L3 page table scheme (where the pmd > >>>>>>> is adjacent to the pgd) is dropped now too. > >>>>>>> > >>>>>>> Suggested-by: Matthew Wilcox <willy@xxxxxxxxxxxxx> > >>>>>>> Fixes: b37d1c1898b2 ("parisc: Use per-pagetable spinlock") > >>>>>>> Signed-off-by: Helge Deller <deller@xxxxxx> > >>>>>>> Signed-off-by: John David Anglin <dave.anglin@xxxxxxxx> > >>>>>> This patch results in: > >>>>>> > >>>>>> BUG: spinlock recursion on CPU#0, swapper/0/1 > >>>>>> lock: 0x12226d14, .magic: dead4ead, .owner: swapper/0/1, .owner_cpu: 0 > >>>>>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc7-next-20210209-32bit #1 > >>>>>> Hardware name: 9000/778/B160L > >>>>>> Backtrace: > >>>>>> [<1019f9bc>] show_stack+0x34/0x48 > >>>>>> [<10a65278>] dump_stack+0x94/0x114 > >>>>>> [<10219f4c>] spin_dump+0x8c/0xb8 > >>>>>> [<1021a0b4>] do_raw_spin_lock+0xdc/0x108 > >>>>>> [<10a7367c>] _raw_spin_lock_irqsave+0x30/0x48 > >>>>>> [<102bf41c>] handle_mm_fault+0x5e8/0xdb0 > >>>>>> [<102b813c>] __get_user_pages.part.0+0x1b0/0x3d4 > >>>>>> [<102b8900>] __get_user_pages_remote+0x134/0x34c > >>>>>> [<102b8b80>] get_user_pages_remote+0x68/0x90 > >>>>>> [<102fccb0>] get_arg_page+0x94/0xd8 > >>>>>> [<102fdd84>] copy_string_kernel+0xc4/0x234 > >>>>>> [<102fe70c>] kernel_execve+0xcc/0x1a4 > >>>>>> [<10a58d94>] run_init_process+0xbc/0xe0 > >>>>>> [<10a70d50>] kernel_init+0x98/0x13c > >>>>>> [<1019a01c>] ret_from_kernel_thread+0x1c/0x24 > >>>>>> > >>>>>> when trying to boot parisc/hppa images in qemu. Reverting this patch fixes > >>>>>> the problem. > >>>>> True, I can reproduce the problem. > >>>>> With CONFIG_DEBUG_SPINLOCK=y you get the backtrace above. > >>>>> With CONFIG_DEBUG_SPINLOCK=n it just hangs. > >>>>> Happenes with 32-bit kernel with SMP kernel, even if only one virtual CPU is started. > >>>> Which is quite puzzling since most spin locks are optimized in this case case. Strikes me we > >>>> have a lock that's not initialized. > >>>> > >>>> It's not obvious how this relates to the patch. get_arg_page() calls get_user_pages_remote() with > >>> The fact that reverting it fixes the problem is a good indication > >>> that the problem does relate to this patch. > >>> > >>> As for how - no idea. That is not my area of expertise. > >> I built Helge's for-next tree both with CONFIG_DEBUG_SPINLOCK=y and CONFIG_DEBUG_SPINLOCK=n. Both > >> builds work fine on c8000. > >> > >> The only thing that might have changed in the patch is the alignment of the lock used for page table updates. > >> Qemu only support PA 1.x instructions. The ldcw instruction needs 16-byte alignment on real hardware and > >> there is code to dynamically align lock pointers to 16-byte alignment. The c8000 supports PA 2.0 instructions > >> and the ldcw,co instruction only needs 4-byte alignment. Perhaps there is an issue with the dynamic alignment > >> of the lock pointer or the lock initialization in the PA 1.x build for qemu. > >> > > The first lock is acquired in mm/memory.c from line 3565: > > > > vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, > > &vmf->ptl); > > > > The second (recursive) lock is acquired from line 3587 in the same > > function: > > > > set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry); > > > > Source code lines are from next-20210211. I confirmed with debug code > > that the lock address passed to do_raw_spin_lock() is the same in both > > calls. > Thanks Guenter. I assume this is with v15 of the patch? Yes, happens with latest version too. > It looks as if we might not need the ptl lock in set_pte_at() and probably elsewhere. Just removing the locks from set_pte_at() didn't solved it. After removing the others too, it works. Patch is attached below. I added a memory-barrier to set_pte() too. > But I'm still puzzled as to why this doesn't happen when different locks are used as in your > report with the earlier patch. I think it happened earlier too. Would need to test though. Helge diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h index 2e1873cd4877..2c68010d896a 100644 --- a/arch/parisc/include/asm/pgtable.h +++ b/arch/parisc/include/asm/pgtable.h @@ -82,17 +82,14 @@ static inline void purge_tlb_entries(struct mm_struct *mm, unsigned long addr) */ #define set_pte(pteptr, pteval) \ do{ \ - *(pteptr) = (pteval); \ + *(pteptr) = (pteval); \ + mb(); \ } while(0) #define set_pte_at(mm, addr, ptep, pteval) \ do { \ - unsigned long flags; \ - spinlock_t *pgd_lock = &(mm)->page_table_lock; \ - spin_lock_irqsave(pgd_lock, flags); \ set_pte(ptep, pteval); \ purge_tlb_entries(mm, addr); \ - spin_unlock_irqrestore(pgd_lock, flags); \ } while (0) #endif /* !__ASSEMBLY__ */ @@ -431,22 +428,15 @@ extern void update_mmu_cache(struct vm_area_struct *, unsigned long, pte_t *); static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { pte_t pte; - unsigned long flags; - spinlock_t *pgd_lock; if (!pte_young(*ptep)) return 0; - pgd_lock = &vma->vm_mm->page_table_lock; - spin_lock_irqsave(pgd_lock, flags); pte = *ptep; if (!pte_young(pte)) { - spin_unlock_irqrestore(pgd_lock, flags); return 0; } - set_pte(ptep, pte_mkold(pte)); - purge_tlb_entries(vma->vm_mm, addr); - spin_unlock_irqrestore(pgd_lock, flags); + set_pte_at(vma->vm_mm, addr, ptep, pte_mkold(pte)); return 1; } @@ -454,29 +444,16 @@ struct mm_struct; static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { pte_t old_pte; - unsigned long flags; - spinlock_t *pgd_lock; - pgd_lock = &mm->page_table_lock; - spin_lock_irqsave(pgd_lock, flags); old_pte = *ptep; - set_pte(ptep, __pte(0)); - purge_tlb_entries(mm, addr); - spin_unlock_irqrestore(pgd_lock, flags); + set_pte_at(mm, addr, ptep, __pte(0)); return old_pte; } static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - unsigned long flags; - spinlock_t *pgd_lock; - - pgd_lock = &mm->page_table_lock; - spin_lock_irqsave(pgd_lock, flags); - set_pte(ptep, pte_wrprotect(*ptep)); - purge_tlb_entries(mm, addr); - spin_unlock_irqrestore(pgd_lock, flags); + set_pte_at(mm, addr, ptep, pte_wrprotect(*ptep)); } #define pte_same(A,B) (pte_val(A) == pte_val(B))