Re: [PATCH v4 1/2] kvm/arm64: Remove the creation time's mapping of MMIO regions

Keqian Zhu <zhukeqian1@xxxxxxxxxx> · Fri, 23 Apr 2021 09:36:05 +0800

Hi Gavin,

On 2021/4/23 9:35, Gavin Shan wrote:
> Hi Keqian,
> 
> On 4/22/21 5:41 PM, Keqian Zhu wrote:
>> On 2021/4/22 10:12, Gavin Shan wrote:
>>> On 4/21/21 4:28 PM, Keqian Zhu wrote:
>>>> On 2021/4/21 14:38, Gavin Shan wrote:
>>>>> On 4/16/21 12:03 AM, Keqian Zhu wrote:
> 
> [...]
> 
>>>
>>> Yeah, sorry that I missed that part. It's related to Santosh's patch:
>>> the flag may not exist until a page fault has happened on the vma.
>>> In that case, the check may not work properly.
>>>
>>>    [PATCH] KVM: arm64: Correctly handle the mmio faulting
>> Yeah, you are right.
>>
>> If that happens, we won't try to use block mapping for a memslot with VM_PFNMAP.
>> But that keeps the same logic as the old code.
>>
>> 1. Without dirty logging, we won't try block mapping for it, and we'll
>> finally know that it's a device, so we won't try to adjust for THP
>> (Transparent Huge Page) for it.
>> 2. If userspace wrongly enables dirty logging for this memslot, we'll force_pte for it.
>>
> 
> This isn't about the patch itself; I just want more discussion to get more details.
> The patch itself looks good to me. I have two questions:
> 
> (1) The memslot fails to be added if it's backed by an MMIO region and dirty logging is
> enabled in kvm_arch_prepare_memory_region(). As Santosh reported, the corresponding
> vma could be associated with an MMIO region while VM_PFNMAP is still missing. In this case,
> kvm_arch_prepare_memory_region() doesn't return an error, meaning the memslot can be
> added successfully and block mapping isn't used, as you mentioned. The problem is that
> the memslot is added, but the expected result would be failure.
Sure. I think we could try to populate the final vma flags in kvm_arch_prepare_memory_region(),
maybe through GUP (get_user_pages) or a better method. It would be nice if you could try to solve this. :)
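
To make the problem in (1) concrete, here is a minimal standalone sketch of the
check, not the kernel code. The names model_vma, MODEL_VM_PFNMAP and
model_prepare_memory_region() are hypothetical and only for illustration, and
the -EINVAL return value is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for VM_PFNMAP; the value is illustrative only. */
#define MODEL_VM_PFNMAP 0x1UL

/* Hypothetical, reduced vma: only the flags seen at memslot creation time. */
struct model_vma {
	unsigned long vm_flags;
};

/*
 * Model of the check discussed above: reject a memslot backed by a
 * VM_PFNMAP (MMIO) vma when dirty logging is requested.
 * Returning -EINVAL is an assumption made for this sketch.
 */
static int model_prepare_memory_region(const struct model_vma *vma,
				       bool log_dirty_pages)
{
	if ((vma->vm_flags & MODEL_VM_PFNMAP) && log_dirty_pages)
		return -EINVAL;

	return 0;
}
```

With the flag set and dirty logging requested, the model returns an error. If
the flag has not been set on the vma yet, the same request returns 0, which is
the case you describe.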

> 
> (2) If dirty logging is enabled on the MMIO memslot, everything should be fine. If
> dirty logging isn't enabled and VM_PFNMAP isn't set yet in user_mem_abort(),
> block mapping won't be used and PAGE_SIZE is picked, but the failing IPA might
> be a good candidate for block mapping. Does it mean we miss something for block
> mapping?
Right. This issue can also be solved by populating the final vma flags in kvm_arch_prepare_memory_region().

> 
> By the way, do you have any idea why dirty logging can't be enabled on an MMIO memslot?
IIUC, an MMIO region is of device memory type; it is tied to device state and device actions.
Normal memory can be written out of order and repeatedly, but device memory can't.
A write to MMIO triggers a device action based on the current device state, and what we
read from MMIO also depends on the current device state. Thus the dirty logging policy
for normal memory can't be applied to MMIO.

> I guess Marc might know the history. For example, if QEMU uses "/dev/mem" or
> "/dev/kmem" to back the guest's memory, the vma is marked as MMIO, but dirty logging
> and migration aren't supported?
The MMIO region is part of the device state. We need an extra kernel driver to support migration
of a pass-through device, since how to save and restore the device state is closely tied to
the specific type of device. You can refer to VFIO migration for more detail.

Thanks,
Keqian