On Wed, Feb 22, 2017 at 04:17:05PM +0100, Radim Krčmář wrote: > [Oops, the end of this thread got dragged into a mark-as-read spree ...] > > 2017-02-17 11:13+0100, David Hildenbrand: > >>> This is really complicated stuff, and the basic reason for it (if I > >>> remember correctly) is that s390x does reenable all interrupts when > >>> entering the sie (see kvm-s390.c:__vcpu_run()). So the fancy smp-based > >>> kicks don't work (as it is otherwise just racy), and if I remember > >>> correctly, SMP reschedule signals (s390x external calls) would be > >>> slower. (Christian, please correct me if I'm wrong) > >> > >> No the reason was that there are some requests that need to be handled > >> outside run SIE. For example one reason was the guest prefix page. > >> This must be mapped read/write ALL THE TIME when a guest is running, > >> otherwise the host might crash. So we have to exit SIE and make sure that > >> it does not reenter, therefore we use the RELOAD_MMU request from a notifier > >> that is called from page table functions, whenever memory management decides > >> to unmap/write protect (dirty pages tracking, reference tracking, page migration > >> or compaction...) > >> > >> SMP-based request wills kick out the guest, but for some thing like the > >> one above it will be too late. > > > > While what you said is 100% correct, I had something else in mind that > > hindered using vcpu_kick() and especially kvm_make_all_cpus_request(). > > And I remember that being related to how preemption and > > OUTSIDE_GUEST_MODE is handled. I think this boils down to what would > > have to be implemented in kvm_arch_vcpu_should_kick(). > > > > x86 can track the guest state using vcpu->mode, because they can be sure > > that the guest can't reschedule while in the critical guest entry/exit > > section. This is not true for s390x, as preemption is enabled. That's > > why vcpu->mode cannot be used in its current form to track if a VCPU is > > in/oustide/exiting guest mode. And kvm_make_all_cpus_request() currently > > relies on this setting. > > > > For now, calling vcpu_kick() on s390x will result in a BUG(). > > > > > > On s390x, there are 3 use cases I see for requests: > > > > 1. Remote requests that need a sync > > > > Make a request, wait until SIE has been left and make sure the request > > will be processed before re-entering the SIE. e.g. KVM_REQ_RELOAD_MMU > > notifier in mmu notifier you mentioned. Also KVM_REQ_DISABLE_IBS is a > > candidate. > > Btw. aren't those requests racy? > > void exit_sie(struct kvm_vcpu *vcpu) > { > atomic_or(CPUSTAT_STOP_INT, &vcpu->arch.sie_block->cpuflags); > > If you get stalled here and the target VCPU handles the request and > reenters SIE in the meantime, then you'll wait until its next exit. > (And miss an unbounded amount of exits in the worst case.) > > while (vcpu->arch.sie_block->prog0c & PROG_IN_SIE) > cpu_relax(); > } > > And out of curiosity -- how many cycles does this loop usually take? > > > 2. Remote requests that don't need a sync > > > > E.g. KVM_REQ_ENABLE_IBS doesn't strictly need it, while > > KVM_REQ_DISABLE_IBS does. > > A usual KVM request would kick the VCPU out of nested virt as well. > Shouldn't it be done for these as well? > > > 3. local requests > > > > E.g. KVM_REQ_TLB_FLUSH from kvm_s390_set_prefix() > > > > > > Of course, having a unified interface would be better. > > > > /* set the request and kick the CPU out of guest mode */ > > kvm_set_request(req, vcpu); > > > > /* set the request, kick the CPU out of guest mode, wait until guest > > mode has been left and make sure the request will be handled before > > reentering guest mode */ > > kvm_set_sync_request(req, vcpu); > > Sounds good, I'll also add > > kvm_set_self_request(req, vcpu); > > > Same maybe even for multiple VCPUs (as there are then ways to speed it > > up, e.g. first kick all, then wait for all) > > > > This would require arch specific callbacks to > > 1. pre announce the request (e.g. set PROG_REQUEST on s390x) > > 2. kick the cpu (e.g. CPUSTAT_STOP_INT and later > > kvm_s390_vsie_kick(vcpu) on s390x) > > 3. check if still executing the guest (e.g. PROG_IN_SIE on s390x) > > > > This would only make sense if there are other use cases for sync > > requests. At least I remember that Power also has a faster way for > > kicking VCPUs, not involving SMP rescheds. I can't judge if this is a > > s390x only thing and is better be left as is :) > > > > At least vcpu_kick() could be quite easily made to work on s390x. > > > > Radim, are there also other users that need something like sync requests? > > I think that ARM has a similar need when updating vgic, but relies on an > asumption that VCPUs are going to be out after kicking them with > kvm_make_all_cpus_request(). > (vgic_change_active_prepare in virt/kvm/arm/vgic/vgic-mmio.c) Yes, we have similar needs. We don't actually use the requests infrastructure in the moment (although I have plans to move to that following a long series of optimization patches I have stashed on my machine), but we reuse the kvm_make_all_cpus_request function to figure out which CPUs need a kick, and which don't, instead of duplicating this logic in the ARM tree. > > Having synchronous requests in a common API should probably wait for the > completion of the request, not just for the kick, which would make race > handling simpler. > > I'm not going to worry about them in this pass, though. > I'll be happy to help working on this or at least reviewing stuff to move our home-baked "stop all VCPUs and wait for something before entering the guest again" functionality to common functionality that uses requests. Thanks, -Christoffer