On 8/18/22 01:57, Dmitry Osipenko wrote: > On 8/15/22 18:54, Dmitry Osipenko wrote: >> On 8/15/22 17:57, Dmitry Osipenko wrote: >>> On 8/15/22 16:53, Christian König wrote: >>>> Am 15.08.22 um 15:45 schrieb Dmitry Osipenko: >>>>> [SNIP] >>>>>> Well that comment sounds like KVM is doing the right thing, so I'm >>>>>> wondering what exactly is going on here. >>>>> KVM actually doesn't hold the page reference, it takes the temporal >>>>> reference during page fault and then drops the reference once page is >>>>> mapped, IIUC. Is it still illegal for TTM? Or there is a possibility for >>>>> a race condition here? >>>>> >>>> >>>> Well the question is why does KVM grab the page reference in the first >>>> place? >>>> >>>> If that is to prevent the mapping from changing then yes that's illegal >>>> and won't work. It can always happen that you grab the address, solve >>>> the fault and then immediately fault again because the address you just >>>> grabbed is invalidated. >>>> >>>> If it's for some other reason than we should probably investigate if we >>>> shouldn't stop doing this. >>> >>> CC: +Paolo Bonzini who introduced this code >>> >>> commit add6a0cd1c5ba51b201e1361b05a5df817083618 >>> Author: Paolo Bonzini <pbonzini@xxxxxxxxxx> >>> Date: Tue Jun 7 17:51:18 2016 +0200 >>> >>> KVM: MMU: try to fix up page faults before giving up >>> >>> The vGPU folks would like to trap the first access to a BAR by setting >>> vm_ops on the VMAs produced by mmap-ing a VFIO device. The fault >>> handler >>> then can use remap_pfn_range to place some non-reserved pages in the >>> VMA. >>> >>> This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn >>> and fixup_user_fault together help supporting it. The patch also >>> supports >>> VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to >>> reference counting. >>> >>> @Paolo, >>> https://lore.kernel.org/dri-devel/73e5ed8d-0d25-7d44-8fa2-e1d61b1f5a04@xxxxxxx/T/#m7647ce5f8c4749599d2c6bc15a2b45f8d8cf8154 >>> >> >> If we need to bump the refcount only for VM_MIXEDMAP and not for >> VM_PFNMAP, then perhaps we could add a flag for that to the kvm_main >> code that will denote to kvm_release_page_clean whether it needs to put >> the page? > > The other variant that kind of works is to mark TTM pages reserved using > SetPageReserved/ClearPageReserved, telling KVM not to mess with the page > struct. But the potential consequences of doing this are unclear to me. > > Christian, do you think we can do it? Although, no. It also doesn't work with KVM without additional changes to KVM. -- Best regards, Dmitry