On 01.06.17 12:20, Marc Zyngier wrote:
Some systems have less than perfect GICv3 implementations, leading to all kinds of ugly issues (guest hanging, host dying). In order to allow some level of diagnostics, and in some cases to implement workarounds, this series enables the trapping of Group-0, Group-1 and Common sysregs. Mediating the access at EL2 allows some form of sanity checking that the HW is sometimes sorely lacking.

Instead of fully emulating a GICv3 CPU interface, we still use the existing HW (list registers, AP registers, VMCR...), which allows the code to be independent from the rest of the KVM code, and to cope with partial trapping.

Of course, trapping has a cost, which is why this must be either enabled on the command line, or selected by another cpu capability (see Cavium erratum 30115). A quick test on an A57-based platform shows a 25% hit when repeatedly banging on the trapped registers, while normal workloads do not seem to suffer noticeably from such trapping (hackbench variance is in the usual noise, despite being very IPI happy).

This has been tested on a dual-socket ThunderX and a Freescale LS-2085a.

I've taken the liberty of rebasing David Daney's initial Cavium erratum 30115 workaround on top of this series, and included it here as a typical use case.
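To make the trap-and-mediate idea in the cover letter above a bit more concrete, here is a minimal user-space sketch in C. It is not the code from the series: every name in it (struct cpu_if, handle_sysreg_trap, the REG_* ids, the bogus_eoi counter) is hypothetical, the list registers are plain memory rather than the real HW registers, and interrupt priorities are ignored. It only shows the shape of the approach: a trapped access is dispatched per register, sanity-checked, and then applied to the existing list-register state instead of a fully emulated CPU interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_LR          4      /* hypothetical number of list registers */
#define INTID_SPURIOUS 1023u  /* value read when nothing is pending */

/* Hypothetical ids; real code would decode them from the trap syndrome. */
enum sysreg_id { REG_IAR1, REG_EOIR1, REG_COUNT };
enum lr_state  { LR_INVALID, LR_PENDING, LR_ACTIVE };

struct list_reg { uint32_t intid; enum lr_state state; };

/* Stand-in for the per-vcpu HW state (list registers only). */
struct cpu_if {
    struct list_reg lr[NR_LR];
    unsigned int bogus_eoi;   /* diagnostic: EOIs that matched nothing */
};

struct trap { enum sysreg_id reg; bool is_write; uint64_t val; };

/* Acknowledge: first pending LR becomes active (priorities ignored here). */
static bool handle_iar1(struct cpu_if *c, struct trap *t)
{
    size_t i;

    if (t->is_write)
        return false;         /* sanity check: register is read-only */

    t->val = INTID_SPURIOUS;
    for (i = 0; i < NR_LR; i++) {
        if (c->lr[i].state == LR_PENDING) {
            c->lr[i].state = LR_ACTIVE;
            t->val = c->lr[i].intid;
            break;
        }
    }
    return true;
}

/* End of interrupt: only deactivate an LR that is really active. */
static bool handle_eoir1(struct cpu_if *c, struct trap *t)
{
    size_t i;

    if (!t->is_write)
        return false;         /* sanity check: register is write-only */

    for (i = 0; i < NR_LR; i++) {
        if (c->lr[i].state == LR_ACTIVE && c->lr[i].intid == t->val) {
            c->lr[i].state = LR_INVALID;
            return true;
        }
    }
    c->bogus_eoi++;           /* drop the write, but record it */
    return true;
}

typedef bool (*trap_fn)(struct cpu_if *, struct trap *);

static const trap_fn handlers[REG_COUNT] = {
    [REG_IAR1]  = handle_iar1,
    [REG_EOIR1] = handle_eoir1,
};

/* Returns false when the access is rejected (a hypervisor might inject UNDEF). */
bool handle_sysreg_trap(struct cpu_if *c, struct trap *t)
{
    assert(c != NULL && t != NULL);

    if ((unsigned int)t->reg >= REG_COUNT || handlers[t->reg] == NULL)
        return false;
    return handlers[t->reg](c, t);
}
```

A caller would fill in a struct trap for each trapped access and pass it to handle_sysreg_trap(); on a read, the result comes back in val. Keeping one small handler per register is what makes partial trapping easy to picture in this sketch: registers without a handler are simply never routed here.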
I've run this patch set on an affected ThunderX system and have indeed not seen any hangs. I have seen lost guest USB keyboard events, which may or may not point at interrupt problems, but let's assume it's a different issue for now.
Tested-by: Alexander Graf <agraf@xxxxxxx>

Alex