Hi Will,
On 2024/10/24 21:21, Will Deacon wrote:
On Thu, Oct 17, 2024 at 08:43:43PM +0200, Jann Horn wrote:
+arm64 maintainers in case they have opinions on the break-before-make aspects
Thanks, Jann.
On Thu, Oct 17, 2024 at 11:48 AM Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx> wrote:
+void try_to_free_pte(struct mm_struct *mm, pmd_t *pmd, unsigned long addr,
+ struct mmu_gather *tlb)
+{
+ pmd_t pmdval;
+ spinlock_t *pml, *ptl;
+ pte_t *start_pte, *pte;
+ int i;
+
+ start_pte = pte_offset_map_rw_nolock(mm, pmd, addr, &pmdval, &ptl);
+ if (!start_pte)
+ return;
+
+ pml = pmd_lock(mm, pmd);
+ if (ptl != pml)
+ spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
+
+ if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd))))
+ goto out_ptl;
+
+ /* Check if it is empty PTE page */
+ for (i = 0, pte = start_pte; i < PTRS_PER_PTE; i++, pte++) {
+ if (!pte_none(ptep_get(pte)))
+ goto out_ptl;
+ }
+ pte_unmap(start_pte);
+
+ pmd_clear(pmd);
+
+ if (ptl != pml)
+ spin_unlock(ptl);
+ spin_unlock(pml);
At this point, you have cleared the PMD and dropped the locks
protecting against concurrency, but have not yet done a TLB flush. If
another thread then concurrently repopulates the PMD, can we get
incoherent TLB state in a way that violates the arm64
break-before-make rule?
Sounds like it, yes, unless there's something that constrains the new
PMD value to be some function of what it was in the first place?
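The window here is between pmd_clear() and any later TLB flush: once the
locks are dropped, a concurrent fault can install a new PTE page while
stale translations for the old one may still be cached, i.e. a
valid -> valid change without an intervening invalidation, which is what
break-before-make forbids. Purely for illustration (this is not what the
patch does, and flush_tlb_mm() is far coarser than a targeted flush of
the PMD range), closing the window on the reclaim side would look
roughly like:

	pte_unmap(start_pte);

	pmd_clear(pmd);

	/*
	 * Illustrative sketch only: flush while still holding pml/ptl, so
	 * that no other thread can observe pmd_none() and install a new
	 * PTE page before the old translations are gone.
	 */
	flush_tlb_mm(mm);

	if (ptl != pml)
		spin_unlock(ptl);
	spin_unlock(pml);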
Thank you for taking a look at this! I have tried to detect this case
in the page fault path and flush the TLB there; a rough sketch of the
idea is below the links. For details, please refer to this RFC patch:
https://lore.kernel.org/lkml/20240815120715.14516-1-zhengqi.arch@xxxxxxxxxxxxx/
And more context here:
https://lore.kernel.org/lkml/6f38cb19-9847-4f70-bbe7-06881bb016be@xxxxxxxxxxxxx/
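To make that a bit more concrete, the fault-side handling is conceptually
something like the sketch below. The helper name, its arguments and the
exact hook point in the fault path are invented for illustration and are
not taken from the RFC patch:

/*
 * Conceptual sketch only: called in the page fault path before a new
 * PTE page is installed, with orig_pmd being the PMD value the fault
 * started out with.
 */
static void flush_tlb_if_pte_page_reclaimed(struct vm_area_struct *vma,
					    unsigned long addr,
					    pmd_t *pmd, pmd_t orig_pmd)
{
	unsigned long start = addr & PMD_MASK;

	/* PMD unchanged: the PTE page was not reclaimed under us. */
	if (pmd_same(orig_pmd, pmdp_get_lockless(pmd)))
		return;

	/*
	 * The PTE page we started the fault with has been reclaimed (the
	 * PMD was cleared, or possibly already repopulated); flush the old
	 * translations before new PTEs for this range become visible.
	 */
	flush_tlb_range(vma, start, start + PMD_SIZE);
}

The idea is that the fault path, rather than the reclaim path, pays for
the invalidation when the race actually happens.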
If necessary, I can rebase the RFC patch and resend it.
Thanks!