Re: [PATCH] KVM: MMU: fix huge page adapted on non-PAE host

Xiao Guangrong <xiaoguangrong@xxxxxxxxxxxxxxxxxx> · Mon, 28 May 2012 21:41:24 +0800

On 05/28/2012 09:14 PM, Avi Kivity wrote:

> On 05/28/2012 03:56 PM, Xiao Guangrong wrote:
>> On 05/28/2012 08:24 PM, Avi Kivity wrote:
>>
>>> On 05/28/2012 02:39 PM, Xiao Guangrong wrote:
>>>> On 05/28/2012 06:57 PM, Avi Kivity wrote:
>>>>
>>>>> On 05/28/2012 09:10 AM, Xiao Guangrong wrote:
>>>>>> The huge page size is 4M on a non-PAE host, but a 2M page size is used in
>>>>>> transparent_hugepage_adjust(), so the page we get after adjusting the
>>>>>> mapping level is not the head page, and the BUG_ON() will be triggered.
>>>>>>
>>>>>>
>>>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>>>> index 72102e0..be3cea4 100644
>>>>>> --- a/arch/x86/kvm/mmu.c
>>>>>> +++ b/arch/x86/kvm/mmu.c
>>>>>> @@ -2595,8 +2595,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
>>>>>>  			*gfnp = gfn;
>>>>>>  			kvm_release_pfn_clean(pfn);
>>>>>>  			pfn &= ~mask;
>>>>>> -			if (!get_page_unless_zero(pfn_to_page(pfn)))
>>>>>> -				BUG();
>>>>>> +			kvm_get_pfn(pfn);
>>>>>>  			*pfnp = pfn;
>>>>>>  		}
>>>>>>  	}
>>>>>
>>>>> Shouldn't we adjust mask instead?
>>>>>
>>>>
>>>>
>>>> Adjusting mask to map the whole 4M huge page to KVM guest?
>>>
>>> The code moves the refcount from the small page to the huge page.  i.e.
>>> from pfn 0x1312 to pfn 0x1200.  But if the huge page frame contains
>>> 0x400 pages, it should move the refcount to pfn 0x1000.
>>>
>>
>>
>> We need not move the refcount to the huge page (the head of the pages); moving
>> the refcount to any small page in the middle is also OK, since get_page() will
>> handle it properly:
>>
>> get_page() -> __get_page_tail():
>>
>> |	struct page *page_head = compound_trans_head(page);
>> |
>> |	if (likely(page != page_head && get_page_unless_zero(page_head))) {
>> |		/*
>> |		 * page_head wasn't a dangling pointer but it
>> |		 * may not be a head page anymore by the time
>> |		 * we obtain the lock. That is ok as long as it
>> |		 * can't be freed from under us.
>> |		 */
>> |		flags = compound_lock_irqsave(page_head);
>> |		/* here __split_huge_page_refcount won't run anymore */
>> |		if (likely(PageTail(page))) {
>> |			__get_page_tail_foll(page, false);
>> |			got = true;
>> |		}
>> |		compound_unlock_irqrestore(page_head, flags);
>> |		if (unlikely(!got))
>> |			put_page(page_head);
>> |	}
>>
>> The refcount of page_head is increased.
>>
> 
> So, the whole thing is unneeded?  Andrea?
> 

I think the reason the current code moves the refcount is that we must take a
reference on the page we will map into the shadow page table, since we always
drop that page's refcount after it is mapped. (That is what this patch does.)
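
For reference, the pfn arithmetic from Avi's example above can be checked in
user space. This is only a minimal sketch, assuming 4K base pages;
huge_frame_head_pfn() and check_thread_example() are made-up names for
illustration, not kernel functions:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the kernel): round a pfn down to the
 * first pfn of the huge page frame that contains it. pages_per_frame is
 * the frame size in small pages and must be a power of two.
 */
static uint64_t huge_frame_head_pfn(uint64_t pfn, uint64_t pages_per_frame)
{
	uint64_t mask = pages_per_frame - 1;

	return pfn & ~mask;
}

/* The numbers used in the example earlier in this thread. */
static void check_thread_example(void)
{
	/* 2M frame = 0x200 small pages: what the current mask assumes */
	assert(huge_frame_head_pfn(0x1312, 0x200) == 0x1200);
	/* 4M frame = 0x400 small pages: the non-PAE case */
	assert(huge_frame_head_pfn(0x1312, 0x400) == 0x1000);
}
```

So with the 2M mask on a non-PAE host, pfn 0x1200 is a tail page of the 4M
compound page whose head is at pfn 0x1000.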
