On Wed, Oct 26, 2022 at 02:06:00PM +0530, Anshuman Khandual wrote: > On 10/26/22 12:31, Muchun Song wrote: > >> On 10/25/22 12:06, Muchun Song wrote: > >>>> On Oct 25, 2022, at 09:42, Wupeng Ma <mawupeng1@xxxxxxxxxx> wrote: > >>>> From: Ma Wupeng <mawupeng1@xxxxxxxxxx> > >>>> > >>>> Commit f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with > >>>> each HugeTLB page") add vmemmap_remap_pte to remap the tail pages as > >>>> read-only to catch illegal write operation to the tail page. > >>>> > >>>> However this will lead to WARN_ON in arm64 in __check_racy_pte_update() > >>> > >>> Thanks for your finding this issue. > >>> > >>>> since this may lead to dirty state cleaned. This check is introduced by > >>>> commit 2f4b829c625e ("arm64: Add support for hardware updates of the > >>>> access and dirty pte bits") and the initial check is as follow: > >>>> > >>>> BUG_ON(pte_write(*ptep) && !pte_dirty(pte)); > >>>> > >>>> Since we do need to mark this pte as read-only to catch illegal write > >>>> operation to the tail pages, use set_pte to replace set_pte_at to bypass > >>>> this check. > >>> > >>> In theory, the waring does not affect anything since the tail vmemmap > >>> pages are supposed to be read-only. So, skipping this check for vmemmap > >> > >> Tails vmemmap pages are supposed to be read-only, in practice but their > >> backing pages do have pte_write() enabled. Otherwise the VM_WARN_ONCE() > >> warning would not have triggered. > > > > Right. > > > >> > >> VM_WARN_ONCE(pte_write(old_pte) && !pte_dirty(pte), > >> "%s: racy dirty state clearing: 0x%016llx -> 0x%016llx", > >> __func__, pte_val(old_pte), pte_val(pte)); > >> > >> Also, is not it true that the pte being remapped into a different page > >> as read only, than what it had originally (which will be freed up) i.e > >> the PFN in 'old_pte' and 'pte' will be different. Hence is there still > > > > Right. > > > >> a possibility for a race condition even when the PFN changes ? > > > > Sorry, I didn't get this question. Did you mean the PTE is changed from > > new (pte) to the old one (old_pte) by the hardware because of the update > > of dirty bit when a concurrent write operation to the tail vmemmap page? > > No, but is not vmemmap_remap_pte() reuses walk->reuse_page for all remaining > tails pages ? Is not there a PFN change, along with access permission change > involved in this remapping process ? For the record, as we discussed offline, changing the output address (pfn) of a pte is not safe without break-before-make if at least one of the mappings was writeable. The caller (vmemmap_remap_pte()) would need to be fixed to first invalidate the pte and then write the new pte. I assume no other CPU accesses this part of the vmemmap while the pte is being remapped. -- Catalin