On Wed, Jan 24, 2024 at 11:52:25AM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 24, 2024 at 01:32:22PM +0000, Marc Zyngier wrote:
> > What I'm saying is that there are way to make it better without
> > breaking your particular toy workload which, as important as it may be
> > to *you*, doesn't cover everybody's use case.
>
> Please, do we need the "toy" stuff? The industry is spending 10's of
> billions of dollars right now to run "my workload". Currently not
> widely on ARM servers, but we are all hoping ARM can succeed here,
> right?
>
> I still don't know what you mean by "better". There are several issues
> now
>
> 1) This series, where WC doesn't trigger on new cores. Maybe 8x STR
>    will fix it, but it is not better performance wise than 4x STP.

It would be good to know. If the performance difference is significant,
we can revisit. I'm not keen on using alternatives here without backing
it up by numbers (do we even have a way to detect whether Linux is
running natively or not? we may have to invent something).

> 2) Userspace does ST4 to MMIO memory, and the VMM can't explode
>    because of this. Replacing the ST4 with 8x STR is NOT better,
>    that would be a big performance downside, especially for the
>    quirky hi-silicon hardware.

I was hoping KVM injects an error into the guest rather than killing it
but at a quick look I couldn't find it. The kvm_handle_guest_abort() ->
io_mem_abort() ends up returning -ENOSYS while handle_trap_exceptions()
only understands handled or not (like 1 or 0). Well, maybe I didn't look
deep enough.

-- 
Catalin