On Mon, 2021-10-25 at 11:39 +0100, David Woodhouse wrote:
> > One possible solution (which I even have unfinished patches for) is to
> > put all the gfn_to_pfn_caches on a list, and refresh them when the MMU
> > notifier receives an invalidation.
>
> For this use case I'm not even sure why I'd *want* to cache the PFN and
> explicitly kmap/memremap it, when surely by *definition* there's a
> perfectly serviceable HVA which already points to it?

That's indeed true for *this* use case, but my *next* use case is
actually implementing the event channel delivery.

What we have in-kernel already is everything we absolutely *need* in
order to host Xen guests, but I really do want to fix the fact that
even IPIs and timers are bouncing up through userspace.

Xen 2-level event channel delivery is a series of test-and-set
operations. For delivering a given port#, we:

 • Test-and-set the corresponding port# bit in the shared info page.
   If it was already set, we're done.

 • Test the corresponding 'masked' bit in the shared info page. If it
   was set, we're done.

 • Test-and-set the bit in the target vcpu_info 'evtchn_pending_sel'
   which corresponds to the *word* in which the port# resides. If it
   was already set, we're done.

 • Set the 'evtchn_upcall_pending' bit in the target vcpu_info to
   trigger the vector delivery.

In João and Ankur's original version¹ this was really simple; it
looked like this:

        if (test_and_set_bit(p, (unsigned long *) shared_info->evtchn_pending))
                return 1;

        if (!test_bit(p, (unsigned long *) shared_info->evtchn_mask) &&
            !test_and_set_bit(p / BITS_PER_EVTCHN_WORD,
                              (unsigned long *) &vcpu_info->evtchn_pending_sel))
                return kvm_xen_evtchn_2l_vcpu_set_pending(vcpu_info);

Yay for permanently pinned pages! :)

So, with a fixed version of kvm_map_gfn() I suppose I could do the
same, but that's *two* maps/unmaps for each interrupt? That's probably
worse than just bouncing out and letting userspace do it!

So even for the event channel delivery use case, if I'm not allowed to
just pin the pages permanently then I stand by the observation that I
*have* a perfectly serviceable HVA for it already.

I can even do the test-and-set in userspace based on the futex
primitives, but the annoying part is that if the page does end up
absent, I need to *store* the pending operation, because there will be
times when we're trying to deliver interrupts but *can't* sleep and
wait for the page.

So that probably means 512 bytes of evtchn bitmap *per vCPU* in order
to store the event channels which are pending for each vCPU, and a way
to replay them from a context which *can* sleep (rough sketch at the
end of this mail).

And if I have *that*, then I might as well use it to solve the problem
of the gpa_to_hva_cache being single-threaded, and let a vCPU do its
own writes to its vcpu_info *every* time. With perhaps a little more
thinking about how I use a gpa_to_hva_cache for the shinfo page (which
you removed in commit 319afe68), but perhaps starting with the
observation that it's only not thread-capable when it's *invalid* and
needs to be refreshed...

¹ https://lore.kernel.org/lkml/20190220201609.28290-12-joao.m.martins@xxxxxxxxxx/
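
P.S. To make that "shadow bitmap plus replay" idea slightly more
concrete, here's a rough, untested sketch of the shape of it. None of
the names below (xen_vcpu_shadow, xen_evtchn_defer, xen_evtchn_replay,
SHADOW_EVTCHN_PORTS) exist anywhere today; they're purely illustrative.

        #include <linux/bitmap.h>
        #include <linux/bitops.h>

        /* 2-level ABI, 64-bit guest: 64 words of 64 bits = 4096 ports,
         * i.e. the 512 bytes of bitmap per vCPU mentioned above. */
        #define SHADOW_EVTCHN_PORTS 4096

        struct xen_vcpu_shadow {
                /* Ports we failed to deliver because the target page was
                 * absent and we couldn't sleep; set_bit/clear_bit are atomic. */
                DECLARE_BITMAP(evtchn_pending, SHADOW_EVTCHN_PORTS);
        };

        /* Fast path: record the port and kick the vCPU so that the
         * replay below gets run from a sleepable context soon. */
        static void xen_evtchn_defer(struct xen_vcpu_shadow *shadow, int port)
        {
                set_bit(port, shadow->evtchn_pending);
        }

        /* Slow path: can sleep, so it can fault the shinfo/vcpu_info
         * pages back in and do the test-and-set dance quoted above. */
        static void xen_evtchn_replay(struct xen_vcpu_shadow *shadow)
        {
                int port;

                for_each_set_bit(port, shadow->evtchn_pending,
                                 SHADOW_EVTCHN_PORTS) {
                        clear_bit(port, shadow->evtchn_pending);
                        /* ...deliver port via shinfo/vcpu_info here... */
                }
        }

Presumably the replay would be driven from somewhere on the vCPU entry
path (or a workqueue), once sleeping is allowed again.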