On 19/12/17 14:18, Christoffer Dall wrote:
> On Tue, Dec 19, 2017 at 01:55:25PM +0000, Marc Zyngier wrote:
>> On 19/12/17 13:34, Christoffer Dall wrote:
>>> On Wed, Dec 13, 2017 at 08:05:33PM +0000, Marc Zyngier wrote:
>>>> On Wed, 13 Dec 2017 10:46:01 +0000,
>>>> Christoffer Dall wrote:
>>>>>
>>>>> We currently check if the VM has a userspace irqchip on every exit from
>>>>> the VCPU, and if so, we do some work to ensure correct timer behavior.
>>>>> This is unfortunate, as we could avoid doing any work entirely, if we
>>>>> didn't have to support irqchip in userspace.
>>>>>
>>>>> Realizing the userspace irqchip on ARM is mostly a developer or hobby
>>>>> feature, and is unlikely to be used in servers or other scenarios where
>>>>> performance is a priority, we can use a refcounted static key to only
>>>>> check the irqchip configuration when we have at least one VM that uses
>>>>> an irqchip in userspace.
>>>>>
>>>>> Reviewed-by: Eric Auger <eric.auger@xxxxxxxxxx>
>>>>> Signed-off-by: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
>>>>
>>>> On its own, this doesn't seem to be that useful. As far as I can see,
>>>> it saves us a load from the kvm structure before giving up.
>>>
>>> A load and a conditional. But what I really wanted to also avoid was
>>> the function call from the main run loop, which I neglected to mention
>>> as well. I think I can achieve that with a static inline wrapper in the
>>> arch timer header file which first evaluates the static key and then
>>> calls into the arch timer code.
>>>
>>>> I think it is more the cumulative effect of this load that could have
>>>> an impact, but you're only dealing with it at a single location.
>>>>
>>>> How about making this a first-class helper and redefining
>>>> irqchip_in_kernel() as such:
>>>>
>>>> static inline bool irqchip_in_kernel(struct kvm *kvm)
>>>> {
>>>> 	if (static_branch_unlikely(&userspace_irqchip_in_use) &&
>>>> 	    unlikely(!irqchip_in_kernel(kvm)))
>>>> 		return true;
>>>>
>>>> 	return false;
>>>> }
>>>>
>>>> and move that static key to a more central location?
>>>
>>> That's a neat idea. The only problem is that creating a new VM would
>>> then flip the static key, and we'd then have to flip it back when a
>>> vgic is created on that VM, and I don't particularly like the idea of
>>> doing this too often.
>>
>> Fair enough.
>>
>>> What I'd suggest then is to have two versions of the function:
>>> irqchip_in_kernel(), which is what it is today, and
>>> __irqchip_in_kernel(), which can only be called from within the
>>> critical path of the run loop, so that we can increment the static key
>>> in kvm_vcpu_first_run_init() when we don't have a VGIC.
>>>
>>> How does that sound?
>>
>> OK, so you only patch once per non-VGIC VM instead of twice per VGIC VM.
>> But you now create a distinction between what can be used at runtime and
>> what can be used at config time. The distinction is a bit annoying.
>>
>> Also, does this actually show up on the radar?
>
> Honestly, I don't know for this particular version of the patch.
>
> But when I did the VHE optimization work, which was before the userspace
> irqchip support went in, getting rid of calling kvm_timer_sync_hwstate()
> and the load+conditional in there (also prior to the level-mapped
> patches) was measurable, between 50 and 100 cycles.
>
> Of course, that turned out to be buggy when rebooting VMs, so I never
> actually included it in my measurements, but it left me wanting to get
> rid of this.
>
> It's a bit of a delicate balance. On the one hand, it's silly to try to
> over-optimize, but on the other hand it's exactly the cumulative effect
> of optimizing every bit that managed to get us good results on VHE.
>
> How about this: I write up the patch in the complicated version as part
> of the next version, and if you think it's too difficult to maintain, we
> can just drop it and apply the series without it?

Sounds like a good plan.

Thanks,

	M.

-- 
Jazz is not dead. It just smells funny...
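
[Editor's note: for readers following the discussion, here is a minimal sketch of the
static-key-gated check being talked about above. The key name userspace_irqchip_in_use,
the existing irqchip_in_kernel() check, and the kvm_vcpu_first_run_init() hook are taken
from the thread; the helper name userspace_irqchip() and the exact placement are
illustrative assumptions, not the patch that was ultimately posted.]

	#include <linux/jump_label.h>
	#include <linux/kvm_host.h>

	/*
	 * Refcounted static key: incremented in kvm_vcpu_first_run_init()
	 * for a VM that runs without an in-kernel VGIC and decremented when
	 * such a VM is destroyed, so the branch is only ever patched in for
	 * the (rare) userspace-irqchip case.
	 */
	DEFINE_STATIC_KEY_FALSE(userspace_irqchip_in_use);

	/*
	 * Run-loop check (illustrative name): while the key is false, i.e.
	 * every VM has an in-kernel VGIC, the branch is patched out and the
	 * exit path never loads the irqchip configuration from struct kvm.
	 */
	static inline bool userspace_irqchip(struct kvm *kvm)
	{
		return static_branch_unlikely(&userspace_irqchip_in_use) &&
		       unlikely(!irqchip_in_kernel(kvm));
	}

The point of the pattern is that the common case (in-kernel VGIC) costs only a
patched-out branch, while the per-VM refcounting keeps the key from being flipped
back and forth as VMs are created and later gain a VGIC.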