On 02/22/2017 04:17 PM, Radim Krčmář wrote: > [Oops, the end of this thread got dragged into a mark-as-read spree ...] > > 2017-02-17 11:13+0100, David Hildenbrand: >>>> This is really complicated stuff, and the basic reason for it (if I >>>> remember correctly) is that s390x does reenable all interrupts when >>>> entering the sie (see kvm-s390.c:__vcpu_run()). So the fancy smp-based >>>> kicks don't work (as it is otherwise just racy), and if I remember >>>> correctly, SMP reschedule signals (s390x external calls) would be >>>> slower. (Christian, please correct me if I'm wrong) >>> >>> No the reason was that there are some requests that need to be handled >>> outside run SIE. For example one reason was the guest prefix page. >>> This must be mapped read/write ALL THE TIME when a guest is running, >>> otherwise the host might crash. So we have to exit SIE and make sure that >>> it does not reenter, therefore we use the RELOAD_MMU request from a notifier >>> that is called from page table functions, whenever memory management decides >>> to unmap/write protect (dirty pages tracking, reference tracking, page migration >>> or compaction...) >>> >>> SMP-based request wills kick out the guest, but for some thing like the >>> one above it will be too late. >> >> While what you said is 100% correct, I had something else in mind that >> hindered using vcpu_kick() and especially kvm_make_all_cpus_request(). >> And I remember that being related to how preemption and >> OUTSIDE_GUEST_MODE is handled. I think this boils down to what would >> have to be implemented in kvm_arch_vcpu_should_kick(). >> >> x86 can track the guest state using vcpu->mode, because they can be sure >> that the guest can't reschedule while in the critical guest entry/exit >> section. This is not true for s390x, as preemption is enabled. That's >> why vcpu->mode cannot be used in its current form to track if a VCPU is >> in/oustide/exiting guest mode. And kvm_make_all_cpus_request() currently >> relies on this setting. >> >> For now, calling vcpu_kick() on s390x will result in a BUG(). >> >> >> On s390x, there are 3 use cases I see for requests: >> >> 1. Remote requests that need a sync >> >> Make a request, wait until SIE has been left and make sure the request >> will be processed before re-entering the SIE. e.g. KVM_REQ_RELOAD_MMU >> notifier in mmu notifier you mentioned. Also KVM_REQ_DISABLE_IBS is a >> candidate. > > Btw. aren't those requests racy? > > void exit_sie(struct kvm_vcpu *vcpu) > { > atomic_or(CPUSTAT_STOP_INT, &vcpu->arch.sie_block->cpuflags); > > If you get stalled here and the target VCPU handles the request and > reenters SIE in the meantime, then you'll wait until its next exit. > (And miss an unbounded amount of exits in the worst case.) > > while (vcpu->arch.sie_block->prog0c & PROG_IN_SIE) > cpu_relax(); > } > Its not racy for the purpose it was originally made for (get the vcpu out of SIE before we unmap a guest prefix page) as the MMU_RELOAD handler will wait for the pte lock which is held by the code that called kvm_s390_sync_request(KVM_REQ_MMU_RELOAD, vcpu). We also have the guarantee that after returning from kvm_s390_sync_request we will have that request be handled before we reenter the guest, which is all we need for DISABLE_IBS. But yes, all non MMU_RELOAD users might wait longer, possibly several guest exits. We never noticed that as requests are really a seldom event. Basically unmapping of the guest prefix page due to paging and migration, switching between 1 and more guest cpus and some other seldom events. > And out of curiosity -- how many cycles does this loop usually take? > >> 2. Remote requests that don't need a sync >> >> E.g. KVM_REQ_ENABLE_IBS doesn't strictly need it, while >> KVM_REQ_DISABLE_IBS does. > > A usual KVM request would kick the VCPU out of nested virt as well. > Shouldn't it be done for these as well? > >> 3. local requests >> >> E.g. KVM_REQ_TLB_FLUSH from kvm_s390_set_prefix() >> >> >> Of course, having a unified interface would be better. >> >> /* set the request and kick the CPU out of guest mode */ >> kvm_set_request(req, vcpu); >> >> /* set the request, kick the CPU out of guest mode, wait until guest >> mode has been left and make sure the request will be handled before >> reentering guest mode */ >> kvm_set_sync_request(req, vcpu); > > Sounds good, I'll also add > > kvm_set_self_request(req, vcpu); > >> Same maybe even for multiple VCPUs (as there are then ways to speed it >> up, e.g. first kick all, then wait for all) >> >> This would require arch specific callbacks to >> 1. pre announce the request (e.g. set PROG_REQUEST on s390x) >> 2. kick the cpu (e.g. CPUSTAT_STOP_INT and later >> kvm_s390_vsie_kick(vcpu) on s390x) >> 3. check if still executing the guest (e.g. PROG_IN_SIE on s390x) >> >> This would only make sense if there are other use cases for sync >> requests. At least I remember that Power also has a faster way for >> kicking VCPUs, not involving SMP rescheds. I can't judge if this is a >> s390x only thing and is better be left as is :) >> >> At least vcpu_kick() could be quite easily made to work on s390x. >> >> Radim, are there also other users that need something like sync requests? > > I think that ARM has a similar need when updating vgic, but relies on an > asumption that VCPUs are going to be out after kicking them with > kvm_make_all_cpus_request(). > (vgic_change_active_prepare in virt/kvm/arm/vgic/vgic-mmio.c) > > Having synchronous requests in a common API should probably wait for the > completion of the request, not just for the kick, which would make race > handling simpler. > > I'm not going to worry about them in this pass, though. > > Thanks. >