Hi Marc,

On 4/30/21 6:47 AM, Marc Zyngier wrote:
>
>>>> We've two concerns here:
>>>> - Performance impacts for pass-through devices.
>>>> - The definition of ioremap_wc() function doesn't match the host
>>>> kernel on ARM64
>>> Performance I can understand, but I think you're also using it to mask
>>> a driver bug which should be resolved first. Thank
>> We’ve already instrumented the driver code and found the code path
>> for the unaligned accesses. We’ll fix this issue if it’s not
>> following WC semantics.
>>
>> Fixing the performance concern will be under KVM stage-2 page-table
>> control. We're looking for a guidance/solution for updating stage-2
>> PTE based on PCI-BAR attribute.
> Before we start discussing the *how*, I'd like to clearly understand
> what *arm64* memory attributes you are relying on. We already have
> established that the unaligned access was a bug, which was the biggest
> argument in favour of NORMAL_NC. What are the other requirements?

Sorry, my earlier response was not complete...

The ARMv8 architecture has two features, Gathering and Reordering of
transactions, that are very important from a performance point of view.
Small inline packets for NICs and accesses to a GPU's frame buffer are
CPU-bound operations. We want to take advantage of the GRE features to
achieve higher performance.

Both of these features are disabled for prefetchable BARs in a VM
because the memory type MT_DEVICE_nGnRE is enforced at stage-2.

> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.
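
As a rough illustration of the last point, the sketch below models how a stage-1 and a stage-2 memory type combine when FWB is not in use: the more restrictive of the two wins. This is not kernel code. The enum and helper names are made up for the example, and it assumes the guest maps the prefetchable BAR as Normal-NC at stage 1 (for instance via ioremap_wc()).

```c
#include <stdbool.h>

/*
 * Simplified arm64 memory types, ordered from most to least restrictive.
 * Illustrative names only; these are not the kernel's MT_* definitions.
 */
enum mem_type {
    MEM_DEVICE_nGnRnE,  /* no Gathering, no Reordering, no Early ack */
    MEM_DEVICE_nGnRE,   /* no Gathering, no Reordering */
    MEM_DEVICE_nGRE,    /* no Gathering, Reordering allowed */
    MEM_DEVICE_GRE,     /* Gathering and Reordering allowed */
    MEM_NORMAL_NC,      /* Normal, non-cacheable */
};

/* Assumption: no FWB, so the more restrictive of the two stages wins. */
static enum mem_type combine_stages(enum mem_type stage1, enum mem_type stage2)
{
    return stage1 < stage2 ? stage1 : stage2;
}

/* Gathering is permitted only for Device-GRE and Normal memory. */
static bool allows_gathering(enum mem_type t)
{
    return t >= MEM_DEVICE_GRE;
}

/* Reordering is permitted for Device-nGRE, Device-GRE and Normal memory. */
static bool allows_reordering(enum mem_type t)
{
    return t >= MEM_DEVICE_nGRE;
}
```

With these definitions, combine_stages(MEM_NORMAL_NC, MEM_DEVICE_nGnRE) yields MEM_DEVICE_nGnRE, for which both allows_gathering() and allows_reordering() return false. That is the situation we see for prefetchable BARs in the guest today.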