Re: [PATCH] parisc: Optimize per-pagetable spinlocks (v11)

On 2/12/21 12:14 AM, Helge Deller wrote:
* John David Anglin <dave.anglin@xxxxxxxx>:
On 2021-02-11 4:51 p.m., Guenter Roeck wrote:
On Thu, Feb 11, 2021 at 09:38:25AM -0500, John David Anglin wrote:
On 2021-02-10 8:20 p.m., Guenter Roeck wrote:
On Wed, Feb 10, 2021 at 01:57:42PM -0500, John David Anglin wrote:
On 2021-02-10 12:23 p.m., Helge Deller wrote:
On 2/10/21 3:52 PM, Guenter Roeck wrote:
On Wed, Jan 27, 2021 at 10:18:51PM +0100, Helge Deller wrote:
On parisc a spinlock is stored in the next page behind the pgd which
protects against parallel accesses to the pgd. That's why one additional
page (PGD_ALLOC_ORDER) is allocated for the pgd.

Matthew Wilcox suggested that we should instead use a pointer in the
struct page for this spinlock, and noted that the comments for the
PGD_ORDER and PMD_ORDER defines were wrong.

Both suggestions are addressed in this patch. The pgd spinlock
(parisc_pgd_lock) is stored in the struct page of the pgd. In
switch_mm_irqs_off() the physical address of this lock is loaded into
cr28 (tr4) and the pgd into cr25, so that the fault handlers can
access the lock directly.

The currently implemented hybrid L2/L3 page table scheme (where the pmd
is adjacent to the pgd) is dropped now as well.

Suggested-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Fixes: b37d1c1898b2 ("parisc: Use per-pagetable spinlock")
Signed-off-by: Helge Deller <deller@xxxxxx>
Signed-off-by: John David Anglin <dave.anglin@xxxxxxxx>
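
(A minimal sketch of the struct-page half of this idea, purely for illustration:
the helper names, the kmalloc'ed lock and the page->private stash below are
assumptions and not the actual patch, which additionally exposes the lock's
physical address to the fault handlers via cr28.)

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Illustrative: fetch the lock stashed in the pgd's struct page. */
static inline spinlock_t *example_pgd_spinlock(pgd_t *pgd)
{
	return (spinlock_t *)page_private(virt_to_page(pgd));
}

/* Illustrative: one page for the pgd, a separate spinlock kept with it,
 * instead of allocating an extra page behind the pgd for the lock. */
static pgd_t *example_pgd_alloc(struct mm_struct *mm)
{
	pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
	spinlock_t *lock;

	if (!pgd)
		return NULL;

	lock = kmalloc(sizeof(*lock), GFP_KERNEL);
	if (!lock) {
		free_page((unsigned long)pgd);
		return NULL;
	}
	spin_lock_init(lock);
	set_page_private(virt_to_page(pgd), (unsigned long)lock);
	return pgd;
}
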
This patch results in:

BUG: spinlock recursion on CPU#0, swapper/0/1
   lock: 0x12226d14, .magic: dead4ead, .owner: swapper/0/1, .owner_cpu: 0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc7-next-20210209-32bit #1
Hardware name: 9000/778/B160L
Backtrace:
   [<1019f9bc>] show_stack+0x34/0x48
   [<10a65278>] dump_stack+0x94/0x114
   [<10219f4c>] spin_dump+0x8c/0xb8
   [<1021a0b4>] do_raw_spin_lock+0xdc/0x108
   [<10a7367c>] _raw_spin_lock_irqsave+0x30/0x48
   [<102bf41c>] handle_mm_fault+0x5e8/0xdb0
   [<102b813c>] __get_user_pages.part.0+0x1b0/0x3d4
   [<102b8900>] __get_user_pages_remote+0x134/0x34c
   [<102b8b80>] get_user_pages_remote+0x68/0x90
   [<102fccb0>] get_arg_page+0x94/0xd8
   [<102fdd84>] copy_string_kernel+0xc4/0x234
   [<102fe70c>] kernel_execve+0xcc/0x1a4
   [<10a58d94>] run_init_process+0xbc/0xe0
   [<10a70d50>] kernel_init+0x98/0x13c
   [<1019a01c>] ret_from_kernel_thread+0x1c/0x24

when trying to boot parisc/hppa images in qemu. Reverting this patch fixes
the problem.
True, I can reproduce the problem.
With CONFIG_DEBUG_SPINLOCK=y you get the backtrace above.
With CONFIG_DEBUG_SPINLOCK=n it just hangs.
It happens with a 32-bit SMP kernel, even if only one virtual CPU is started.
Which is quite puzzling, since most spinlocks are optimized in this case.  Strikes me we
have a lock that's not initialized.

It's not obvious how this relates to the patch.  get_arg_page() calls get_user_pages_remote() with
The fact that reverting it fixes the problem is a good indication
that the problem does relate to this patch.

As for how - no idea. That is not my area of expertise.
I built Helge's for-next tree both with CONFIG_DEBUG_SPINLOCK=y and CONFIG_DEBUG_SPINLOCK=n.  Both
builds work fine on c8000.

The only thing that might have changed in the patch is the alignment of the lock used for page table updates.
QEMU only supports PA 1.x instructions.  The ldcw instruction needs 16-byte alignment on real hardware, and
there is code to dynamically align lock pointers to a 16-byte boundary.  The c8000 supports PA 2.0 instructions,
and the ldcw,co instruction only needs 4-byte alignment.  Perhaps there is an issue with the dynamic alignment
of the lock pointer, or with the lock initialization, in the PA 1.x build for QEMU.
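
(As a sketch of that dynamic alignment, modeled on the kernel's __ldcw_align()
approach; the struct layout and names here are simplified assumptions:)

#include <linux/types.h>

#define EXAMPLE_LDCW_ALIGNMENT	16	/* PA 1.x ldcw needs a 16-byte aligned word */

/* Reserve enough lock words that one of them is always 16-byte aligned. */
struct example_ldcw_lock {
	volatile unsigned int lock[4];
};

static inline volatile unsigned int *
example_ldcw_align(struct example_ldcw_lock *l)
{
	unsigned long addr = (unsigned long)&l->lock[0];

	/* Round up to the next 16-byte boundary before handing it to ldcw. */
	addr = (addr + EXAMPLE_LDCW_ALIGNMENT - 1) & ~(EXAMPLE_LDCW_ALIGNMENT - 1UL);
	return (volatile unsigned int *)addr;
}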

The first lock is acquired in mm/memory.c from line 3565:

         vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
                         &vmf->ptl);

The second (recursive) lock is acquired from line 3587 in the same
function:

         set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);

Source code lines are from next-20210211. I confirmed with debug code
that the lock address passed to do_raw_spin_lock() is the same in both
calls.
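
(Condensed as a sketch, for illustration only; it simply assumes, as the
identical lock address suggests, that both call sites resolve to the same
per-pagetable lock:)

#include <linux/mm.h>

static vm_fault_t example_fault_path(struct vm_fault *vmf, pte_t entry)
{
	struct vm_area_struct *vma = vmf->vma;

	/* First acquisition: pte_offset_map_lock() takes the page table lock. */
	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
				       &vmf->ptl);

	/*
	 * Second acquisition: with the patch applied, set_pte_at() takes the
	 * same lock again -> spinlock recursion with CONFIG_DEBUG_SPINLOCK,
	 * a silent hang without it.
	 */
	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);

	pte_unmap_unlock(vmf->pte, vmf->ptl);
	return 0;
}
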
Thanks Guenter.  I assume this is with v15 of the patch?

Yes, happens with latest version too.

It looks as if we might not need the ptl lock in set_pte_at() and probably elsewhere.

Just removing the locks from set_pte_at() didn't solve it.
After removing the others too, it works.
The patch is attached below.
I added a memory barrier to set_pte() too.

But I'm still puzzled as to why this doesn't happen when different locks are used, as in your
report with the earlier patch.

I think it happened earlier too. Would need to test though.

Helge

diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index 2e1873cd4877..2c68010d896a 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -82,17 +82,14 @@ static inline void purge_tlb_entries(struct mm_struct *mm, unsigned long addr)
   */
  #define set_pte(pteptr, pteval)                                 \
          do{                                                     \
-                *(pteptr) = (pteval);                           \
+		*(pteptr) = (pteval);				\
+		mb();						\

Actually, this should be barrier() instead of mb().
mb() is overkill on SMP.

Helge



          } while(0)

  #define set_pte_at(mm, addr, ptep, pteval)			\
  	do {							\
-		unsigned long flags;				\
-		spinlock_t *pgd_lock = &(mm)->page_table_lock;	\
-		spin_lock_irqsave(pgd_lock, flags);		\
  		set_pte(ptep, pteval);				\
  		purge_tlb_entries(mm, addr);			\
-		spin_unlock_irqrestore(pgd_lock, flags);	\
  	} while (0)

  #endif /* !__ASSEMBLY__ */
@@ -431,22 +428,15 @@ extern void update_mmu_cache(struct vm_area_struct *, unsigned long, pte_t *);
  static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
  {
  	pte_t pte;
-	unsigned long flags;
-	spinlock_t *pgd_lock;

  	if (!pte_young(*ptep))
  		return 0;

-	pgd_lock = &vma->vm_mm->page_table_lock;
-	spin_lock_irqsave(pgd_lock, flags);
  	pte = *ptep;
  	if (!pte_young(pte)) {
-		spin_unlock_irqrestore(pgd_lock, flags);
  		return 0;
  	}
-	set_pte(ptep, pte_mkold(pte));
-	purge_tlb_entries(vma->vm_mm, addr);
-	spin_unlock_irqrestore(pgd_lock, flags);
+	set_pte_at(vma->vm_mm, addr, ptep, pte_mkold(pte));
  	return 1;
  }

@@ -454,29 +444,16 @@ struct mm_struct;
  static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
  {
  	pte_t old_pte;
-	unsigned long flags;
-	spinlock_t *pgd_lock;

-	pgd_lock = &mm->page_table_lock;
-	spin_lock_irqsave(pgd_lock, flags);
  	old_pte = *ptep;
-	set_pte(ptep, __pte(0));
-	purge_tlb_entries(mm, addr);
-	spin_unlock_irqrestore(pgd_lock, flags);
+	set_pte_at(mm, addr, ptep, __pte(0));

  	return old_pte;
  }

  static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
  {
-	unsigned long flags;
-	spinlock_t *pgd_lock;
-
-	pgd_lock = &mm->page_table_lock;
-	spin_lock_irqsave(pgd_lock, flags);
-	set_pte(ptep, pte_wrprotect(*ptep));
-	purge_tlb_entries(mm, addr);
-	spin_unlock_irqrestore(pgd_lock, flags);
+	set_pte_at(mm, addr, ptep, pte_wrprotect(*ptep));
  }

  #define pte_same(A,B)	(pte_val(A) == pte_val(B))





