On 3/1/23 8:19 PM, Peter Xu wrote:
> On Wed, Mar 01, 2023 at 12:55:51PM +0500, Muhammad Usama Anjum wrote:
>> Hi Peter,
>
> Hi, Muhammad,
>
>> While using WP_UNPOPULATED, we get stuck if newly allocated memory is read
>> without initialization. This can be reproduced by either of the following
>> statements:
>>     printf("%c", buffer[0]);
>>     buffer[0]++;
>>
>> This bug started to appear with this patch. How are you handling reads of
>> newly allocated memory when WP_UNPOPULATED is defined?
>
> Yes, it's a bug; thanks for the reproducer. You're right, I missed a trivial
> but important detail. Could you try applying the change below on top?
>
> ---8<---
> diff --git a/mm/memory.c b/mm/memory.c
> index 46934133bd0b..2f4b3892948b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4062,7 +4062,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>                                                 vma->vm_page_prot));
>                 vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
>                                 vmf->address, &vmf->ptl);
> -               if (!pte_none(*vmf->pte)) {
> +               if (vmf_pte_changed(vmf)) {
>                         update_mmu_tlb(vma, vmf->address, vmf->pte);
>                         goto unlock;
>                 }
> ---8<---

This patch works. Thank you so much!

>
> I can send a new version once you confirm it at least works on your
> side. I'll also add more tests to cover this in the next version.
>
> The smoke test in this patch is really light; I'm relying on you for the
> testing side of this patch, and thanks for that.
>
>> Running my pagemap_ioctl selftest as a benchmark in a VM:
>> without zeropage / wp_unpopulated (deciding from pte_none() whether a page
>> is dirty; a buggy, wrong implementation, just for reference):
>>     26.608 seconds
>> with zeropage:
>>     39.203 seconds
>> with wp_unpopulated:
>>     62.907 seconds
>>
>> 136% worse performance overall
>> 60% worse performance of unpopulated than zeropage
>
> Yes, this is unfortunate, because with WP_ZEROPAGE / WP_UNPOPULATED we're
> protecting more things than before, but that's what they are for (when we
> want to make sure the holes are tracked accurately).
>
> I haven't looked closely at your whole test suite yet, but my pure
> protection test above should mean that it's still much better for such a
> use case than either (1) pre-reading or (2) MADV_POPULATE_READ.

Ohh... I should stop comparing UNPOPULATED with the buggy implementation and
compare it with pre-reading instead; I was comparing apples with oranges.
I'll do a better benchmark for comparison's sake and let you know if
performance becomes an issue. Overall, we need pagemap_ioctl + UFFD to
correctly emulate the Windows syscall. We also need good performance (the
more the better).

>
> Again, I hope the performance result is not a concern to you. If it is,
> please let us know.
>
> Thanks,
>

-- 
BR,
Muhammad Usama Anjum