On Fri, Feb 24, 2017 at 12:34:07PM +0100, Christoffer Dall wrote: > On Wed, Feb 22, 2017 at 04:17:05PM +0100, Radim Krčmář wrote: > > [Oops, the end of this thread got dragged into a mark-as-read spree ...] > > > > 2017-02-17 11:13+0100, David Hildenbrand: > > >>> This is really complicated stuff, and the basic reason for it (if I > > >>> remember correctly) is that s390x does reenable all interrupts when > > >>> entering the sie (see kvm-s390.c:__vcpu_run()). So the fancy smp-based > > >>> kicks don't work (as it is otherwise just racy), and if I remember > > >>> correctly, SMP reschedule signals (s390x external calls) would be > > >>> slower. (Christian, please correct me if I'm wrong) > > >> > > >> No the reason was that there are some requests that need to be handled > > >> outside run SIE. For example one reason was the guest prefix page. > > >> This must be mapped read/write ALL THE TIME when a guest is running, > > >> otherwise the host might crash. So we have to exit SIE and make sure that > > >> it does not reenter, therefore we use the RELOAD_MMU request from a notifier > > >> that is called from page table functions, whenever memory management decides > > >> to unmap/write protect (dirty pages tracking, reference tracking, page migration > > >> or compaction...) > > >> > > >> SMP-based request wills kick out the guest, but for some thing like the > > >> one above it will be too late. > > > > > > While what you said is 100% correct, I had something else in mind that > > > hindered using vcpu_kick() and especially kvm_make_all_cpus_request(). > > > And I remember that being related to how preemption and > > > OUTSIDE_GUEST_MODE is handled. I think this boils down to what would > > > have to be implemented in kvm_arch_vcpu_should_kick(). > > > > > > x86 can track the guest state using vcpu->mode, because they can be sure > > > that the guest can't reschedule while in the critical guest entry/exit > > > section. This is not true for s390x, as preemption is enabled. That's > > > why vcpu->mode cannot be used in its current form to track if a VCPU is > > > in/oustide/exiting guest mode. And kvm_make_all_cpus_request() currently > > > relies on this setting. > > > > > > For now, calling vcpu_kick() on s390x will result in a BUG(). > > > > > > > > > On s390x, there are 3 use cases I see for requests: > > > > > > 1. Remote requests that need a sync > > > > > > Make a request, wait until SIE has been left and make sure the request > > > will be processed before re-entering the SIE. e.g. KVM_REQ_RELOAD_MMU > > > notifier in mmu notifier you mentioned. Also KVM_REQ_DISABLE_IBS is a > > > candidate. > > > > Btw. aren't those requests racy? > > > > void exit_sie(struct kvm_vcpu *vcpu) > > { > > atomic_or(CPUSTAT_STOP_INT, &vcpu->arch.sie_block->cpuflags); > > > > If you get stalled here and the target VCPU handles the request and > > reenters SIE in the meantime, then you'll wait until its next exit. > > (And miss an unbounded amount of exits in the worst case.) > > > > while (vcpu->arch.sie_block->prog0c & PROG_IN_SIE) > > cpu_relax(); > > } > > > > And out of curiosity -- how many cycles does this loop usually take? > > > > > 2. Remote requests that don't need a sync > > > > > > E.g. KVM_REQ_ENABLE_IBS doesn't strictly need it, while > > > KVM_REQ_DISABLE_IBS does. > > > > A usual KVM request would kick the VCPU out of nested virt as well. > > Shouldn't it be done for these as well? > > > > > 3. local requests > > > > > > E.g. KVM_REQ_TLB_FLUSH from kvm_s390_set_prefix() > > > > > > > > > Of course, having a unified interface would be better. > > > > > > /* set the request and kick the CPU out of guest mode */ > > > kvm_set_request(req, vcpu); > > > > > > /* set the request, kick the CPU out of guest mode, wait until guest > > > mode has been left and make sure the request will be handled before > > > reentering guest mode */ > > > kvm_set_sync_request(req, vcpu); > > > > Sounds good, I'll also add > > > > kvm_set_self_request(req, vcpu); > > > > > Same maybe even for multiple VCPUs (as there are then ways to speed it > > > up, e.g. first kick all, then wait for all) > > > > > > This would require arch specific callbacks to > > > 1. pre announce the request (e.g. set PROG_REQUEST on s390x) > > > 2. kick the cpu (e.g. CPUSTAT_STOP_INT and later > > > kvm_s390_vsie_kick(vcpu) on s390x) > > > 3. check if still executing the guest (e.g. PROG_IN_SIE on s390x) > > > > > > This would only make sense if there are other use cases for sync > > > requests. At least I remember that Power also has a faster way for > > > kicking VCPUs, not involving SMP rescheds. I can't judge if this is a > > > s390x only thing and is better be left as is :) > > > > > > At least vcpu_kick() could be quite easily made to work on s390x. > > > > > > Radim, are there also other users that need something like sync requests? > > > > I think that ARM has a similar need when updating vgic, but relies on an > > asumption that VCPUs are going to be out after kicking them with > > kvm_make_all_cpus_request(). > > (vgic_change_active_prepare in virt/kvm/arm/vgic/vgic-mmio.c) > > Yes, we have similar needs. We don't actually use the requests > infrastructure in the moment (although I have plans to move to that > following a long series of optimization patches I have stashed on my > machine), but we reuse the kvm_make_all_cpus_request function to figure > out which CPUs need a kick, and which don't, instead of duplicating this > logic in the ARM tree. I also have patches (that I was going to post a week ago, but then decided to base on Radim's work) that bring vcpu-requests to arm. The patches I have are to plug a few races, most notably ones in PSCI emulation. You may recall an off-list discussion we had about those races that took place roughly one million years ago. Indeed it was your suggestion at the time to bring vcpu-requests to arm. I'll still post those patches shortly. I think Radim plans to post a v2 soon, so I'll rebase on that. > > > > > Having synchronous requests in a common API should probably wait for the > > completion of the request, not just for the kick, which would make race > > handling simpler. > > > > I'm not going to worry about them in this pass, though. > > > > I'll be happy to help working on this or at least reviewing stuff to > move our home-baked "stop all VCPUs and wait for something before > entering the guest again" functionality to common functionality that > uses requests. > Thanks, drew