Re: [1/1] KVM: restrict kvm_gfn_to_hva_cache_init() to only accept address ranges within one page

Sean Christopherson <seanjc@xxxxxxxxxx> · Fri, 26 Apr 2024 09:17:30 -0700

On Fri, Apr 26, 2024, Kele Huang wrote:

Please don't top post.  https://people.kernel.org/tglx/notes-about-netiquette

> > On Thu, Apr 25, 2024 at 9:16 PM Chen, Zide <zide.chen@xxxxxxxxx> wrote:
> >> On 4/22/2024 7:49 PM, Kele Huang wrote:
> >> > Function kvm_gfn_to_hva_cache_init() is exported and used to init
> >> > gfn to hva cache at various places, such as called in function
> >> > kvm_pv_enable_async_pf().  However, this function directly tail
> >> > calls function __kvm_gfn_to_hva_cache_init(), which assigns
> >> > ghc->memslot to NULL and returns 0 for cache initialization of
> >> > cross pages cache.  This is unsafe as 0 typically means a successful
> >> > return, but it actually fails to return a valid ghc->memslot.
> >> > The functions call kvm_gfn_to_hva_cache_init(), such as
> >> > kvm_lapic_set_vapicz_addr() do not make future checking on the
> >> > ghc->memslot if kvm_gfn_to_hva_cache_init() returns a 0.  Moreover,
> >> > other developers may try to initialize a cache across pages by
> >> > calling this function but fail with a success return value.
> >> >
> >> > This patch fixes this issue by explicitly restricting function
> >> > kvm_gfn_to_hva_cache_init() to only accept address ranges within
> >> > one page and adding comments to the function accordingly.
> >> >
> >>
> >> For cross page cache, returning a zero is not really a mistake, since it
> >> verifies the entire region against the memslot and does initialize ghc.
> >>
> >> The nullified memslot is to indicate that the read or write should
> >> follow the slow path, and it's the caller's responsibility to check
> >> ghc->memslot if needed.  e.g., kvm_read_guest_offset_cached() and
> >> kvm_write_guest_offset_cached().  Please refer to commit fcfbc617547f
> >> ("KVM: Check for a bad hva before dropping into the ghc slow path")
>
> Thanks for the feedback!  Yes, callers including kvm_read_guest_offset_cached()
> and kvm_write_guest_offset_cached() checked if the read or write should follow
> the slow path for the guest memory read/write by checking if
> ghc->memslot is nullified.
> However, please note that they are calling the static function
> `__kvm_gfn_to_hva_cache_init()` instead of the exported function
> `kvm_gfn_to_hva_cache_init()`.
> 
> Although `__kvm_gfn_to_hva_cache_init()` has detailed comments about using
> a slow path for cross page but actually the "slow path" is to read/write guest
> memory instead of for setting up a cross page cache.  The difference here
> I think is that `kvm_gfn_to_hva_cache_init()` is an exported function and
> its name indicates it is used for gfn to hva cache init, while the
> return value 0
> does not really guarantee the cache is initialized when the address range
> crosses pages.

The exports from kvm.ko are intended for use only by KVM itself, e.g. by
kvm-intel.ko and kvm-amd.ko on x86.  Anyone trying to use KVM's exports in random
drivers is in for a world of hurt, as there many, many subtleties throughout KVM
that bleed all over the exports.

It's gross that KVM has "internal" exports, and we have line of sight to removing
them for most architectures, including x86, but that isn't the easiest of changes.

If there is a real problem with in-tree upstream KVM, then we'll fix it, but
changing the behavior of KVM functions just because they are exported isn't going
to happen.

> For example, function kvm_lapic_set_vapicz_addr()
> called `kvm_gfn_to_hva_cache_init()` and simply assumes the cache is
> successfully initialized by checking the return value.

I don't follow the concern here.  It's userspace's responsibility to provide a
page-aligned address.  KVM needs to not explode on an unaligned address, but
ensuring that KVM can actually use the fast path isn't KVM's problem.

> My thought is there probably should be another function to provide correct
> cross page cache initialization but I am not sure if this is really needed.

If there were a legitimate use case where it was truly valuable, then we could
add that functionality.  But all of the usage in KVM either deals with assets
that are exactly one page in size and must be page aligned, or with assets that
are userspace or guest controlled and not worth optimizing for page splits because
a well-behavior userspace/guest should ensure the asset doesn't split a page.

> Nevertheless, I think we could at least make the existing function
> more accurate?

More accurate with respect to what?