On Fri, Jul 31, 2020 at 01:37:00AM -0700, Ram Pai wrote:
> On Fri, Jul 31, 2020 at 09:59:40AM +0530, Bharata B Rao wrote:
> > On Thu, Jul 30, 2020 at 04:25:26PM -0700, Ram Pai wrote:
> > In our case, device pages that are in use are always associated with a valid
> > pvt member. See kvmppc_uvmem_get_page() which returns failure if it
> > runs out of device pfns and that will result in proper failure of
> > page-in calls.
>
> looked at the code, and yes that code path looks correct. So my
> reasoning behind the root cause of this bug is incorrect. However the
> bug is surfacing and there must be a reason.
>
> >
> > For the case where we run out of device pfns, migrate_vma_finalize() will
> > restore the original PTE and will not replace the PTE with device private PTE.
> >
> > Also kvmppc_uvmem_page_free() (=dev_pagemap_ops.page_free()) is never
> > called for non-device-private pages.
>
> Yes. it should not be called. But as seen above in the stack trace, it is called.
>
> What would cause the HMM to call ->page_free() on a page that is not
> associated with that device's pfn?

I believe it is being called for a device private page, you can verify
it when you hit it next time?

>
> >
> > This could be a use-after-free case possibly arising out of the new state
> > changes in HV. If so, this fix will only mask the bug and not address the
> > original problem.
>
> I can verify by rerunning the tests, without the new state changes. But
> I do not see how those changes can cause this fault?
>
> This could also be caused by a duplicate ->page_free() call due to some
> bug in the migrate_page path? Could there be a race between
> migrate_page() and a page_fault ?
>
>
> Regardless, kvmppc_uvmem_page_free() needs to be fixed. It should not
> access contents of pvt, without verifing pvt is valid.

We don't expect pvt to be NULL here. Checking for NULL and returning
isn't the right fix, I think.

Regards,
Bharata.