On Mon, Aug 18, 2014 at 8:50 PM, Xiao Guangrong
<xiaoguangrong@xxxxxxxxxxxxxxxxxx> wrote:
> On 08/19/2014 05:15 AM, David Matlack wrote:
>> On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong
>> <xiaoguangrong.eric@xxxxxxxxx> wrote:
>>> @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
>>>
>>>  static bool check_mmio_spte(struct kvm *kvm, u64 spte)
>>>  {
>>> +       struct kvm_memslots *slots = kvm_memslots(kvm);
>>>         unsigned int kvm_gen, spte_gen;
>>>
>>> -       kvm_gen = kvm_current_mmio_generation(kvm);
>>> +       if (slots->updated)
>>> +               return false;
>>> +
>>> +       smp_rmb();
>>> +
>>> +       kvm_gen = __kvm_current_mmio_generation(slots);
>>>         spte_gen = get_mmio_spte_generation(spte);
>>>
>>
>> What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless we
>> block during memslot updates, which I don't think we should :).
>
> This exactly fixes case 2; slots->updated just acts as the "low bit"
> but avoids generation-number wrap-around and tricky handling of the number.
> For more details, please see below.
>
>>
>>>         trace_check_mmio_spte(spte, kvm_gen, spte_gen);
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index 4b6c01b..1d4e78f 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -96,7 +96,7 @@ static void hardware_disable_all(void);
>>>
>>>  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
>>>  static void update_memslots(struct kvm_memslots *slots,
>>> -                           struct kvm_memory_slot *new, u64 last_generation);
>>> +                           struct kvm_memory_slot *new);
>>>
>>>  static void kvm_release_pfn_dirty(pfn_t pfn);
>>>  static void mark_page_dirty_in_slot(struct kvm *kvm,
>>> @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
>>>  }
>>>
>>>  static void update_memslots(struct kvm_memslots *slots,
>>> -                           struct kvm_memory_slot *new,
>>> -                           u64 last_generation)
>>> +                           struct kvm_memory_slot *new)
>>>  {
>>>         if (new) {
>>>                 int id = new->id;
>>> @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
>>>                 if (new->npages != npages)
>>>                         sort_memslots(slots);
>>>         }
>>> -
>>> -       slots->generation = last_generation + 1;
>>>  }
>>>
>>>  static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
>>> @@ -720,10 +717,17 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
>>>  {
>>>         struct kvm_memslots *old_memslots = kvm->memslots;
>>>
>>> -       update_memslots(slots, new, kvm->memslots->generation);
>>> +       /* ensure generation number is always increased. */
>>> +       slots->updated = true;
>>> +       slots->generation = old_memslots->generation;
>>> +       update_memslots(slots, new);
>>>         rcu_assign_pointer(kvm->memslots, slots);
>>>         synchronize_srcu_expedited(&kvm->srcu);
>>>
>>> +       slots->generation++;
>>> +       smp_wmb();
>>> +       slots->updated = false;
>>> +
>>>         kvm_arch_memslots_updated(kvm);
>>>
>>>         return old_memslots;
>>>
>>
>> This is effectively the same as the first approach.
>>
>> I just realized how simple Paolo's idea is. I think it can be a one-line
>> patch (without comments):
>>
>> [...]
>>         update_memslots(slots, new, kvm->memslots->generation);
>>         rcu_assign_pointer(kvm->memslots, slots);
>>         synchronize_srcu_expedited(&kvm->srcu);
>> +       slots->generation++;
>>
>>         kvm_arch_memslots_updated(kvm);
>> [...]
>
> Really? Unfortunately no. :)
>
> See this scenario:
>
> CPU 0                                    CPU 1
> ioctl registering a new memslot which
> contains GPA:
>                                          page-fault handler:
>                                            see it's a mmio access on GPA;
>
> assign the new memslots with
> generation number increased
>                                            cache the generation-number into spte;
>                                            fix the access and come back to guest;
> SRCU-sync
>                                            page-fault again and check the spte is a
>                                            valid mmio-spte (*)
> generation-number++;
> return to userspace;
>                                            do mmio-emulation and inject mmio-exit;
>
> !!! userspace receives an unexpected mmio-exit; that is exactly the case 2
> I described in the last mail.
>
> Note that in step (*), my approach detects the invalid generation-number,
> which will invalidate the mmio spte properly.

Sorry, I didn't explain myself very well: since we can get a single wrong mmio
exit no matter what, it has to be handled in userspace. So my point was, it
doesn't really help to fix that one very specific way it can happen, because
it can just happen in other ways (e.g. the memslot update occurs after
is_noslot_pfn() and before the mmio exit).

But it looks like you basically said the same thing earlier, so I think we're
on the same page. The one-line patch I suggested was only intended to fix the
"forever incorrectly exit mmio" case.
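
Since the ordering argument in the patch above is compact, here is a minimal
userspace sketch of it, assuming C11 atomics and fences as stand-ins for
smp_wmb()/smp_rmb(); every identifier in it (slots_updated, slots_generation,
memslot_update, check_mmio_spte_gen) is invented for illustration, and the
RCU publication and SRCU synchronization of the real install_new_memslots()
are only represented by comments.

/*
 * Userspace model (not kernel code) of the ordering in the patch above:
 * the writer bumps the generation and only then clears the "update in
 * progress" flag behind a write barrier, so a reader that still observes
 * the old generation must also observe the flag set and therefore refuses
 * the cached mmio spte.
 *
 * Build: cc -std=c11 -pthread sketch.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_bool slots_updated;      /* models slots->updated    */
static atomic_ulong slots_generation;  /* models slots->generation */

/* Writer side: models install_new_memslots() from the patch. */
static void *memslot_update(void *arg)
{
	(void)arg;

	/* mark the update as in progress before publishing the new slots */
	atomic_store_explicit(&slots_updated, true, memory_order_relaxed);

	/* ... update_memslots(), rcu_assign_pointer(), SRCU sync ... */

	atomic_fetch_add_explicit(&slots_generation, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);   /* models smp_wmb() */
	atomic_store_explicit(&slots_updated, false, memory_order_relaxed);
	return NULL;
}

/* Reader side: models the generation check in check_mmio_spte(). */
static bool check_mmio_spte_gen(unsigned long spte_gen)
{
	/* update in flight: treat the cached spte as invalid */
	if (atomic_load_explicit(&slots_updated, memory_order_relaxed))
		return false;

	atomic_thread_fence(memory_order_acquire);   /* models smp_rmb() */

	return spte_gen ==
	       atomic_load_explicit(&slots_generation, memory_order_relaxed);
}

int main(void)
{
	/* generation cached into an mmio spte before the update started */
	unsigned long cached_gen =
		atomic_load_explicit(&slots_generation, memory_order_relaxed);
	pthread_t t;

	pthread_create(&t, NULL, memslot_update, NULL);
	pthread_join(t, NULL);

	/* after the update, the stale spte must be rejected */
	printf("stale mmio spte still valid: %s\n",
	       check_mmio_spte_gen(cached_gen) ? "yes (bug)" : "no");
	return 0;
}

The point of the release/acquire pairing is that a reader which misses the
slots_updated flag is guaranteed to see the incremented generation, so a
stale spte can at worst produce the single spurious mmio exit discussed
above, never a repeated one.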