> On Jun 11, 2019, at 5:24 AM, Thomas Hellström (VMware) <thellstrom@xxxxxxxxxxxxxxxxx> wrote:
>
> From: Thomas Hellstrom <thellstrom@xxxxxxxxxx>
>
[ snip ]
> +/**
> + * apply_pt_wrprotect - Leaf pte callback to write-protect a pte
> + * @pte: Pointer to the pte
> + * @token: Page table token, see apply_to_pfn_range()
> + * @addr: The virtual page address
> + * @closure: Pointer to a struct pfn_range_apply embedded in a
> + * struct apply_as
> + *
> + * The function write-protects a pte and records the range in
> + * virtual address space of touched ptes for efficient range TLB flushes.
> + *
> + * Return: Always zero.
> + */
> +static int apply_pt_wrprotect(pte_t *pte, pgtable_t token,
> +			      unsigned long addr,
> +			      struct pfn_range_apply *closure)
> +{
> +	struct apply_as *aas = container_of(closure, typeof(*aas), base);
> +	pte_t ptent = *pte;
> +
> +	if (pte_write(ptent)) {
> +		pte_t old_pte = ptep_modify_prot_start(aas->vma, addr, pte);
> +
> +		ptent = pte_wrprotect(old_pte);
> +		ptep_modify_prot_commit(aas->vma, addr, pte, old_pte, ptent);
> +		aas->total++;
> +		aas->start = min(aas->start, addr);
> +		aas->end = max(aas->end, addr + PAGE_SIZE);
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * struct apply_as_clean - Closure structure for apply_as_clean
> + * @base: struct apply_as we derive from
> + * @bitmap_pgoff: Address_space Page offset of the first bit in @bitmap
> + * @bitmap: Bitmap with one bit for each page offset in the address_space range
> + * covered.
> + * @start: Address_space page offset of first modified pte relative
> + * to @bitmap_pgoff
> + * @end: Address_space page offset of last modified pte relative
> + * to @bitmap_pgoff
> + */
> +struct apply_as_clean {
> +	struct apply_as base;
> +	pgoff_t bitmap_pgoff;
> +	unsigned long *bitmap;
> +	pgoff_t start;
> +	pgoff_t end;
> +};
> +
> +/**
> + * apply_pt_clean - Leaf pte callback to clean a pte
> + * @pte: Pointer to the pte
> + * @token: Page table token, see apply_to_pfn_range()
> + * @addr: The virtual page address
> + * @closure: Pointer to a struct pfn_range_apply embedded in a
> + * struct apply_as_clean
> + *
> + * The function cleans a pte and records the range in
> + * virtual address space of touched ptes for efficient TLB flushes.
> + * It also records dirty ptes in a bitmap representing page offsets
> + * in the address_space, as well as the first and last of the bits
> + * touched.
> + *
> + * Return: Always zero.
> + */
> +static int apply_pt_clean(pte_t *pte, pgtable_t token,
> +			  unsigned long addr,
> +			  struct pfn_range_apply *closure)
> +{
> +	struct apply_as *aas = container_of(closure, typeof(*aas), base);
> +	struct apply_as_clean *clean = container_of(aas, typeof(*clean), base);
> +	pte_t ptent = *pte;
> +
> +	if (pte_dirty(ptent)) {
> +		pgoff_t pgoff = ((addr - aas->vma->vm_start) >> PAGE_SHIFT) +
> +			aas->vma->vm_pgoff - clean->bitmap_pgoff;
> +		pte_t old_pte = ptep_modify_prot_start(aas->vma, addr, pte);
> +
> +		ptent = pte_mkclean(old_pte);
> +		ptep_modify_prot_commit(aas->vma, addr, pte, old_pte, ptent);
> +
> +		aas->total++;
> +		aas->start = min(aas->start, addr);
> +		aas->end = max(aas->end, addr + PAGE_SIZE);
> +
> +		__set_bit(pgoff, clean->bitmap);
> +		clean->start = min(clean->start, pgoff);
> +		clean->end = max(clean->end, pgoff + 1);
> +	}
> +
> +	return 0;

Usually, when a PTE is write-protected, or when a dirty-bit is cleared, the TLB flush must be done while the page-table lock for that specific table is taken (i.e., within apply_pt_clean() and apply_pt_wrprotect() in this case).
Otherwise, in the case of apply_pt_clean() for example, another core might, shortly afterwards (before the TLB flush), write to the same page whose PTE was just changed. In that case the dirty bit might not be set, and the change could get lost. Does this function target a certain use-case in which deferring the TLB flushes is fine? If so, assertions and documentation of the related assumption would be useful.
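
To illustrate what I mean, a rough sketch only (not a tested change, and assuming apply_to_pfn_range() invokes the leaf callback with the page-table lock held) of flushing per pte inside the callback, e.g. with flush_tlb_page():

static int apply_pt_wrprotect(pte_t *pte, pgtable_t token,
			      unsigned long addr,
			      struct pfn_range_apply *closure)
{
	struct apply_as *aas = container_of(closure, typeof(*aas), base);
	pte_t ptent = *pte;

	if (pte_write(ptent)) {
		pte_t old_pte = ptep_modify_prot_start(aas->vma, addr, pte);

		ptent = pte_wrprotect(old_pte);
		ptep_modify_prot_commit(aas->vma, addr, pte, old_pte, ptent);
		/*
		 * Flush while the page-table lock is still held, so another
		 * core cannot keep writing through a stale writable TLB
		 * entry after the pte has been write-protected.
		 */
		flush_tlb_page(aas->vma, addr);
		aas->total++;
	}

	return 0;
}

Of course, a per-page flush like this is more expensive than the deferred range flush the patch is going for, which is exactly why the conditions that make the deferral safe should be asserted and documented.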