On Wed, 24 Jan 2024 01:27:23 +0000,
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Tue, Jan 23, 2024 at 08:38:55PM +0000, Catalin Marinas wrote:
> > (fixed Marc's email address)
> >
> > On Wed, Jan 17, 2024 at 01:29:06PM +0000, Mark Rutland wrote:
> > > On Wed, Jan 17, 2024 at 08:36:18AM -0400, Jason Gunthorpe wrote:
> > > > On Wed, Jan 17, 2024 at 12:30:00PM +0000, Mark Rutland wrote:
> > > > > On Tue, Jan 16, 2024 at 02:51:21PM -0400, Jason Gunthorpe wrote:
> > > > > > I'm just revising this and I'm wondering if you know why ARM64 has this:
> > > > > >
> > > > > > #define __raw_writeq __raw_writeq
> > > > > > static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
> > > > > > {
> > > > > >     asm volatile("str %x0, [%1]" : : "rZ" (val), "r" (addr));
> > > > > > }
> > > > > >
> > > > > > Instead of
> > > > > >
> > > > > > #define __raw_writeq __raw_writeq
> > > > > > static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
> > > > > > {
> > > > > >     asm volatile("str %x0, %1" : : "rZ" (val), "m" (*(volatile u64 *)addr));
> > > > > > }
> > > > > >
> > > > > > ?? Like x86 has.
> > > > >
> > > > > I believe this is for the same reason as doing so in all of our other IO
> > > > > accessors.
> > > > >
> > > > > We've deliberately ensured that our IO accessors use a single base register
> > > > > with no offset as this is the only form that HW can represent in ESR_ELx.ISS.SRT
> > > > > when reporting a stage-2 abort, which a hypervisor may use for
> > > > > emulating IO.
> > > >
> > > > Wow, harming bare metal performance to accommodate imperfect emulation
> > > > sounds like a horrible reason :(
> > >
> > > Having working functionality everywhere is a very good reason. :)
> > >
> > > > So what happens with this patch where IO is done with STP? Are you
> > > > going to tell me I can't do it because of this?
> > >
> > > I'm not personally going to make that judgement, but it's certainly something
> > > for Catalin and Will to consider (and I've added Marc in case he has any
> > > opinion).
> >
> > Good point, I missed this part. We definitely can't use STP in the I/O
> > accessors, we'd have a big surprise when running the same code in a
> > guest with emulated I/O.
>
> Unfortunately there is no hard distinction in KVM/qemu for "emulated
> IO" and "VFIO MMIO". Even devices using VFIO can get funneled down the
> emulated path for legitimate reasons.
>
> Again, userspace is already widely deployed using complex IO
> accessors. ST4 has been out there for years and at this moment this
> patch with STP is already being deployed in production environments.

Then you will get to keep the pieces. Good luck.

> Even if you refuse to take STP to mainline it *will* be running in VMs
> under ARM hypervisors.

A hypervisor can't do anything with it. If you cared to read the
architecture, you'd know by now. So your VM will be either dead, or
dog slow, depending on your hypervisor. In any case, I'm sure it will
reflect positively on your favourite software.

> What exactly do you think should be done about that?

Well, you could use KVM_CAP_ARM_NISV_TO_USER in userspace and see
everything slow down. Your call.

> I thought the guiding mantra here was that any time KVM does not
> perfectly emulate bare metal it is a bug. "We can't assume all VMs are
> Linux!". Indeed we recently had some long and *very* theoretical
> discussions about possible incompatibilities due to kvm changes in the
> memory attributes thread.
>
> But here it seems to be just shrugging off something so catastrophic
> as performance IO accessors *that are widely deployed already* don't
> work reliably in VMs!?!?
>
> "Oh well, don't use them"!?

Exactly.
You can also take this to the ARM architects and get them to
update the architecture to mandate full syndrome information for all
load/store instructions, and you'll get something useful in 2034. Maybe.

Or you can stop whining and try to get better performance out of what
we have today.

> Damn I hope it crashes the VM and doesn't corrupt the MMIO. I just
> debugged an x86 KVM issue with it corrupting VFIO MMIO and that was a
> total nightmare to find.
>
> > If eight STRs without other operations interleaved give us the
> > write-combining on most CPUs (with Normal NC), we should go with this
> > instead of STP.
>
> __iowrite64_copy() is a performance IO accessor, we should not degrade
> it because buggy hypervisors might exist that have a problem with STP
> or other instructions. :( :(
>
> Anyhow, I know nothing about whatever this issue is - Mark said:
>
> > FWIW, IIUC the immediate-offset forms *without* writeback can still
> > be reported usefully in ESR_ELx,
>
> Which excludes the post/pre increment forms - but does STP and ST4
> also have some kind of problem because the emulation path can't know
> about wider than a 64 bit access?
>
> What is the plan for ST64B? Don't get to use that either?

ST64 has full syndrome information, making it possible to emulate.

In any case, there is no magic there. Everything is documented, and
has been for the past... 15 years?

    M.

--
Without deviation from the norm, progress is not possible.
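
For readers following the ESR_ELx discussion in this thread, here is a minimal sketch of why a hypervisor depends on the syndrome when emulating MMIO. It is not KVM's code: `struct mmio_access`, `decode_mmio_abort()` and the macro names are hypothetical, and the bit positions (EC in [31:26], ISV at bit 24, SAS in [23:22], SRT in [20:16], WnR at bit 6) are my reading of the data abort ISS encoding and should be checked against the Arm ARM. The point it illustrates is the one made above: when ISV is clear, which is what the thread describes for instructions such as STP and ST4, the handler has no register number or access size to work with.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ESR_ELx field layout for a data abort; verify against the Arm ARM. */
#define ESR_EC_SHIFT       26
#define ESR_EC_MASK        0x3fu
#define ESR_EC_DABT_LOW    0x24u               /* data abort from a lower EL */
#define ESR_ISS_ISV        (UINT64_C(1) << 24) /* instruction syndrome valid */
#define ESR_ISS_SAS_SHIFT  22                  /* access size: log2(bytes) */
#define ESR_ISS_SAS_MASK   0x3u
#define ESR_ISS_SRT_SHIFT  16                  /* transfer register number */
#define ESR_ISS_SRT_MASK   0x1fu
#define ESR_ISS_WNR        (UINT64_C(1) << 6)  /* write, not read */

/* Hypothetical description of one emulatable MMIO access. */
struct mmio_access {
    bool is_write;
    unsigned int len;   /* access size in bytes: 1, 2, 4 or 8 */
    unsigned int reg;   /* register index; 31 means the zero register */
};

/*
 * Return true and fill *out if the syndrome alone describes the access.
 * Return false if this is not a lower-EL data abort, or if ISV is clear,
 * in which case the hypervisor cannot tell which register(s) or how many
 * bytes were involved without decoding the instruction itself.
 */
static bool decode_mmio_abort(uint64_t esr, struct mmio_access *out)
{
    if (((esr >> ESR_EC_SHIFT) & ESR_EC_MASK) != ESR_EC_DABT_LOW)
        return false;
    if (!(esr & ESR_ISS_ISV))
        return false;

    out->is_write = (esr & ESR_ISS_WNR) != 0;
    out->len = 1u << ((esr >> ESR_ISS_SAS_SHIFT) & ESR_ISS_SAS_MASK);
    out->reg = (unsigned int)((esr >> ESR_ISS_SRT_SHIFT) & ESR_ISS_SRT_MASK);
    return true;
}
```

With a plain `str x5, [x1]` to an emulated region, a handler along these lines would see a valid syndrome describing an 8-byte write from register 5. When the function returns false, the remaining options are the ones mentioned in the reply: fail the access, or hand it to userspace via KVM_CAP_ARM_NISV_TO_USER and accept the slowdown.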