On Tue, Jul 14, 2020 at 10:05:41PM -0700, Ram Pai wrote:
> On Mon, Jul 13, 2020 at 03:15:06PM +0530, Bharata B Rao wrote:
> > On Sat, Jul 11, 2020 at 02:13:45AM -0700, Ram Pai wrote:
> > > The Ultravisor is expected to explicitly call H_SVM_PAGE_IN for all the pages
> > >
> > > 	if (!(*mig.src & MIGRATE_PFN_MIGRATE)) {
> > > -		ret = -1;
> > > +		ret = -2;
> >
> > migrate_vma_setup() has marked that this pfn can't be migrated. What
> > transient errors are you observing which will disappear within 10
> > retries?
> >
> > Also till now when UV used to pull in all the pages, we never seemed to
> > have hit these transient errors. But now when HV is pushing the same
> > pages, we see these errors which are disappearing after 10 retries.
> > Can you explain this more please? What sort of pages are these?
>
> We did see them even before this patch. The retry alleviates the
> problem, but does not entirely eliminate it. If the chance of seeing
> the issue without the patch is 1%, the chance of seeing this issue
> with this patch becomes 0.25%.

Okay, but maybe we should investigate the problem a bit more to
understand why the page migrations are failing before taking this
route?

> >
> > > 		goto out_finalize;
> > > 	}
> > > +	bool retry = 0;
> ...snip...
> > > +
> > > +	*ret = 0;
> > > +	while (kvmppc_next_nontransitioned_gfn(memslot, kvm, &gfn)) {
> > > +
> > > +		down_write(&kvm->mm->mmap_sem);
> >
> > Acquiring and releasing mmap_sem in a loop? Any reason?
> >
> > Now that you have moved ksm_madvise() calls to init time, any specific
> > reason to take write mmap_sem here?
>
> The semaphore protects the vma. right?

We took the write lock just for ksm_madvise() and then downgraded to
read. Now that you are moving that to init time, a read lock is
sufficient here.

Regards,
Bharata.