Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

Xiao Guangrong <xiaoguangrong@xxxxxxxxxxxxxxxxxx> · Tue, 19 Aug 2014 12:41:11 +0800

On 08/19/2014 12:31 PM, David Matlack wrote:
> On Mon, Aug 18, 2014 at 8:50 PM, Xiao Guangrong
> <xiaoguangrong@xxxxxxxxxxxxxxxxxx> wrote:
>> On 08/19/2014 05:15 AM, David Matlack wrote:
>>> On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong
>>> <xiaoguangrong.eric@xxxxxxxxx> wrote:
>>>> @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
>>>>
>>>>  static bool check_mmio_spte(struct kvm *kvm, u64 spte)
>>>>  {
>>>> +       struct kvm_memslots *slots = kvm_memslots(kvm);
>>>>         unsigned int kvm_gen, spte_gen;
>>>>
>>>> -       kvm_gen = kvm_current_mmio_generation(kvm);
>>>> +       if (slots->updated)
>>>> +               return false;
>>>> +
>>>> +       smp_rmb();
>>>> +
>>>> +       kvm_gen = __kvm_current_mmio_generation(slots);
>>>>         spte_gen = get_mmio_spte_generation(spte);
>>>>
>>>
>>> What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless we
>>> block during memslot updates, which I don't think we should :).
>>
>> This exactly fixes case 2, slots->updated just acts as the "low bit"
>> but avoid generation number wrap-around and trick handling of the number.
>> More details please see below.
>>
>>>
>>>>         trace_check_mmio_spte(spte, kvm_gen, spte_gen);
>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>> index 4b6c01b..1d4e78f 100644
>>>> --- a/virt/kvm/kvm_main.c
>>>> +++ b/virt/kvm/kvm_main.c
>>>> @@ -96,7 +96,7 @@ static void hardware_disable_all(void);
>>>>
>>>>  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
>>>>  static void update_memslots(struct kvm_memslots *slots,
>>>> -                           struct kvm_memory_slot *new, u64 last_generation);
>>>> +                           struct kvm_memory_slot *new);
>>>>
>>>>  static void kvm_release_pfn_dirty(pfn_t pfn);
>>>>  static void mark_page_dirty_in_slot(struct kvm *kvm,
>>>> @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
>>>>  }
>>>>
>>>>  static void update_memslots(struct kvm_memslots *slots,
>>>> -                           struct kvm_memory_slot *new,
>>>> -                           u64 last_generation)
>>>> +                           struct kvm_memory_slot *new)
>>>>  {
>>>>         if (new) {
>>>>                 int id = new->id;
>>>> @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
>>>>                 if (new->npages != npages)
>>>>                         sort_memslots(slots);
>>>>         }
>>>> -
>>>> -       slots->generation = last_generation + 1;
>>>>  }
>>>>
>>>>  static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
>>>> @@ -720,10 +717,17 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
>>>>  {
>>>>         struct kvm_memslots *old_memslots = kvm->memslots;
>>>>
>>>> -       update_memslots(slots, new, kvm->memslots->generation);
>>>> +       /* ensure generation number is always increased. */
>>>> +       slots->updated = true;
>>>> +       slots->generation = old_memslots->generation;
>>>> +       update_memslots(slots, new);
>>>>         rcu_assign_pointer(kvm->memslots, slots);
>>>>         synchronize_srcu_expedited(&kvm->srcu);
>>>>
>>>> +       slots->generation++;
>>>> +       smp_wmb();
>>>> +       slots->updated = false;
>>>> +
>>>>         kvm_arch_memslots_updated(kvm);
>>>>
>>>>         return old_memslots;
>>>>
>>>
>>> This is effectively the same as the first approach.
>>>
>>> I just realized how simple Paolo's idea is. I think it can be a one line
>>> patch (without comments):
>>>
>>> [...]
>>>         update_memslots(slots, new, kvm->memslots->generation);
>>>         rcu_assign_pointer(kvm->memslots, slots);
>>>         synchronize_srcu_expedited(&kvm->srcu);
>>> +       slots->generation++;
>>>
>>>         kvm_arch_memslots_updated(kvm);
>>> [...]
>>
>> Really? Unfortunately no. :)
>>
>> See this scenario:
>>
>> CPU 0                                  CPU 1
>> ioctl registering a new memslot which
>> contains GPA:
>>                            page-fault handler:
>>                              see it'a mmio access on GPA;
>>
>>  assign the new memslots with generation number increased
>>                              cache the generation-number into spte;
>>                              fix the access and comeback to guest;
>> SRCU-sync
>>                              page-fault again and check the spte is a valid mmio-spte(*)
>> generation-number++;
>> return to userspace;
>>                              do mmio-emulation and inject mmio-exit;
>>
>> !!! userspace receives a unexpected mmio-exit, that is case 2 i exactly
>> said in the last mail.
>>
>>
>> Note in the step *, my approach detects the invalid generation-number which
>> will invalidate the mmio spte properly .
> 
> Sorry I didn't explain myself very well: Since we can get a single wrong
> mmio exit no matter what, it has to be handled in userspace. So my point
> was, it doesn't really help to fix that one very specific way that it can
> happen, because it can just happen in other ways. (E.g. update memslots
> occurs after is_noslot_pfn() and before mmio exit).
> 
> But it looks like you basically said the same thing earlier, so I think
> we're on the same page.
> 

Yes, that is what i try to explain in previous mails. :(

> The single line patch I suggested was only intended to fix the "forever
> incorrectly exit mmio".

My patch also fixes this case and that does not doubly increase the
number. I think this is the better one.

--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html