Re: [PATCH v19 059/130] KVM: x86/tdp_mmu: Don't zap private pages for unsupported cases

"Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> · Sun, 21 Apr 2024 01:58:07 +0000

On Mon, 2024-02-26 at 00:26 -0800, isaku.yamahata@xxxxxxxxx wrote:
> +/* Used by mmu notifier via kvm_unmap_gfn_range() */
>  bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range
> *range,
>                                  bool flush)
>  {
>         struct kvm_mmu_page *root;
> +       bool zap_private = false;
> +
> +       if (kvm_gfn_shared_mask(kvm)) {
> +               if (!range->only_private && !range->only_shared)
> +                       /* attributes change */
> +                       zap_private = !(range->arg.attributes &
> +                                       KVM_MEMORY_ATTRIBUTE_PRIVATE);
> +               else
> +                       zap_private = range->only_private;
> +       }
>  
>         __for_each_tdp_mmu_root_yield_safe(kvm, root, range->slot->as_id,
> false)
>                 flush = tdp_mmu_zap_leafs(kvm, root, range->start, range->end,
> -                                         range->may_block, flush);
> +                                         range->may_block, flush,
> +                                         zap_private && is_private_sp(root));
>  

I'm trying to apply the feedback:
 - Drop MTRR support
 - Changing only_private/shared to exclude_private/shared
...and update the log accordingly. These changes are all intersect in this
function and I'm having a hard time trying to justify the resulting logic.

It seems the point of passing the the attributes is because:
"However, looking at kvm_mem_attrs_changed() again, I think invoking
kvm_unmap_gfn_range() from generic KVM code is a mistake and shortsighted. 
Zapping in response to *any* attribute change is very private/shared centric. 
E.g. if/when we extend attributes to provide per-page RWX protections, zapping
existing SPTEs in response to granting *more* permissions may not be necessary
or even desirable."
https://lore.kernel.org/all/ZJX0hk+KpQP0KUyB@xxxxxxxxxx/

But I think shoving the logic for how to handle the attribute changes deep into
the zapping code is the opposite extreme. It results in this confusing logic
with the decision on what to zap is spread all around.

Instead we should have kvm_arch_pre_set_memory_attributes() adjust the range so
it can tell kvm_unmap_gfn_range() which ranges to zap (private/shared).

So:
kvm_vm_set_mem_attributes() - passes attributes
kvm_arch_pre_set_memory_attributes() - chooses which private/shared alias to 
                                       zap based on attribute.
kvm_unmap_gfn_range/kvm_tdp_mmu_unmap_gfn_range - zaps the private/shared alias
tdp_mmu_zap_leafs() - doesn't care about the root type, just zaps leafs

This zapping function can then just do the simple thing it's told to do. It ends
up looking like:

bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
				 bool flush)
{
	struct kvm_mmu_page *root;
	bool exclude_private = false;
	bool exclude_shared = false;

	if (kvm_gfn_shared_mask(kvm)) {
		exclude_private = range->exclude_private;
		exclude_shared = range->exclude_shared;
	}

	__for_each_tdp_mmu_root_yield_safe(kvm, root, range->slot->as_id,
false) {
		if (exclude_private && is_private_sp(root))
			continue;
		if (exclude_shared && !is_private_sp(root))
			continue;

		flush = tdp_mmu_zap_leafs(kvm, root, range->start, range->end,
					  range->may_block, flush);
	}

	return flush;
}

The resulting logic should be the same. Separately, we might be able to simplify
it further if we change the behavior a bit (lose the kvm_gfn_shared_mask() check
or the exclude_shared member), but in the meantime this seems a lot easier to
explain and review for what I think is equivalent behavior.