On Wed, 24 Jan 2024 13:06:38 +0000, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Wed, Jan 24, 2024 at 08:26:28AM +0000, Marc Zyngier wrote:
>
> > > Even if you refuse to take STP to mainline it *will* be running in
> > > VMs under ARM hypervisors.
> >
> > A hypervisor can't do anything with it. If you cared to read the
> > architecture, you'd know by now. So your VM will be either dead, or
> > dog slow, depending on your hypervisor. In any case, I'm sure it will
> > reflect positively on your favourite software.
>
> "Dog slow" is fine. Forcing IO emulation on paths that shouldn't have
> it is a VMM problem. KVM & qemu have some issues where this can happen
> infrequently for VFIO MMIO maps. It is just important that it be
> functionally correct if you get unlucky. The performance path is to
> not take a fault in the first place.
>
> > > What exactly do you think should be done about that?
> >
> > Well, you could use KVM_CAP_ARM_NISV_TO_USER in userspace and see
> > everything slow down. Your call.
>
> The issue Mark raised here was that things like STP/etc cannot work in
> VMs, not that they are slow.
>
> The places we are talking about using the STP pattern are all high
> performance HW drivers, that do not have any existing SW emulation to
> worry about. ie the VMM will be using VFIO to back the MMIO the
> accessors target.
>
> So, I'm fine if the answer is that VMMs using VFIO need to use
> KVM_CAP_ARM_NISV_TO_USER and do instruction parsing for emulated IO in
> userspace if they have a design where VFIO MMIO can infrequently
> generate faults. That is all VMM design stuff and has nothing to do
> with the kernel.

Which will work a treat with things like CCA, I'm sure.

> My objection is this notion we should degrade a performance hot path
> in drivers to accommodate an ARM VMM issue that should be solved in
> the VMM.
>
> > Or you can stop whining and try to get better performance out of
> > what we have today.
>
> "better performance"!?!? You are telling me I have to destroy one of
> our important fast paths for HPC workloads to accommodate some
> theoretical ARM KVM problem?

What I'm saying is that there are ways to make it better without
breaking your particular toy workload which, as important as it may be
to *you*, doesn't cover everybody's use case.

Mark did post such an example, one that has the potential to deliver
that improvement. I'd suggest that you give it a go. But your attitude
of "who cares if it breaks as long as it works for me" is not something
I can adhere to.

	M.

--
Without deviation from the norm, progress is not possible.