Re: [PATCH] mm/hugetlb: avoid get wrong ptep caused by race

"Longpeng (Mike)" <longpeng2@xxxxxxxxxx> · Thu, 20 Feb 2020 10:32:21 +0800



On 2020/2/20 0:22, Sean Christopherson wrote:
> On Wed, Feb 19, 2020 at 08:21:26PM +0800, Longpeng (Mike) wrote:
>> On 2020/2/19 9:58, Sean Christopherson wrote:
>>> FWIW, I'd be in favor of going the READ/WRITE_ONCE() route for x86, e.g.
>>> convert everything as a follow-up patch (or patches).  I'm fairly confident
>>> that KVM's usage of lookup_address_in_mm() is safe, but I wouldn't exactly
>>> bet my life on it.  I'd much rather the failing scenario be that KVM uses
>>> a sub-optimal page size as opposed to exploding on a bad pointer.
>>>
>> Um... our test case starts 50 VMs, each with 2 vCPUs and 4G of memory (backed
>> by 1G hugepages), and then does live-upgrade (a private feature that only
>> modifies QEMU and libvirt) and live-migrate in turn for each one. However, our
>> live-upgraded new QEMU won't do touch_all_pages.
>> Suppose we start a VM without touch_all_pages in QEMU; the VM's guest memory is
>> not mapped in the CR3 pagetable at that moment. When the 2 vCPUs are running,
>> they could access pages that belong to the same 1G hugepage. Both of them will
>> vmexit due to ept_violation and then call
>> gup-->follow_hugetlb_page-->hugetlb_fault, so the race may occur, right?
> 
> Yep.  The code I'm referring to is similar but different code that just
> happened to go into KVM for kernel 5.6.  It has no effect on the gup() flow
> that leads to this bug.  I mentioned it above as an example of code outside
> of hugetlb_fault() that would also benefit from moving to READ/WRITE_ONCE().
> 
> 
I understand better now, thanks for your patience. :)
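
For anyone reading along, the READ/WRITE_ONCE() idea discussed above can be
sketched roughly as below. This is only a userspace illustration, not the
actual patch: READ_ONCE_U64(), classify_entry(), the walk_result enum and the
ENTRY_* bit values are all hypothetical names made up for this example. The
point is that the page table slot is loaded exactly once into a local, so a
concurrent fault that populates the entry cannot make the "none" check and
the "huge" check see two different values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's READ_ONCE(): the volatile access
 * forces the compiler to emit a single load and not re-read the slot.
 * (Hypothetical helper, defined here only for this sketch.)
 */
#define READ_ONCE_U64(p) (*(const volatile uint64_t *)(p))

/* Illustrative flag bits, chosen for the example only. */
#define ENTRY_PRESENT 0x01ULL
#define ENTRY_HUGE    0x80ULL

enum walk_result {
	WALK_NONE,	/* nothing mapped at this level */
	WALK_HUGE,	/* entry maps a huge page directly */
	WALK_TABLE	/* entry points to a lower-level table */
};

/*
 * Classify one page table slot using a single snapshot of its value.
 * Every check below tests the local copy, never the shared slot, so
 * another CPU filling the slot in between cannot cause a mismatch
 * between the two checks.
 */
enum walk_result classify_entry(const uint64_t *slot)
{
	uint64_t entry = READ_ONCE_U64(slot);

	if (!(entry & ENTRY_PRESENT))
		return WALK_NONE;
	if (entry & ENTRY_HUGE)
		return WALK_HUGE;
	return WALK_TABLE;
}
```

A caller would pass the address of the slot it is walking, e.g.
classify_entry(&slot), and act only on the returned classification instead
of dereferencing the slot a second time.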

-- 
Regards,
Longpeng(Mike)