On Tue, 11 Jul 2023 04:30:44 +0100,
"Aiqun(Maria) Yu" <quic_aiquny@xxxxxxxxxxx> wrote:
>
> On 7/10/2023 5:31 PM, Marc Zyngier wrote:
> > On Mon, 10 Jul 2023 09:19:54 +0100,
> > "Aiqun(Maria) Yu" <quic_aiquny@xxxxxxxxxxx> wrote:
> >>
> >> On 7/10/2023 3:27 PM, Marc Zyngier wrote:
> >>> On Mon, 10 Jul 2023 06:59:55 +0100,
> >>> Maria Yu <quic_aiquny@xxxxxxxxxxx> wrote:
> >>>>
> >>>> In order to be able to disable lse_atomic even if cpu
> >>>> support it, most likely because of memory controller
> >>>> cannot deal with the lse atomic instructions, use a
> >>>> new idreg override to deal with it.
> >>>
> >>> In general, the idreg overrides are *not* there to paper over HW bugs.
> >>> They are there to force the kernel to use or disable a feature for
> >>> performance reason or to guide the *enabling* of a feature, but not
> >>> because the HW is broken.
> >>>
> >>> The broken status of a HW platform must also be documented so that we
> >>> know what to expect when we look at, for example, a bad case of memory
> >>> corruption (something I'd expect to see on a system that only
> >>> partially implements atomic memory operations).
> >>>
> >>
> >> good idea. A noc error would be happened if the lse atomic instruction
> >> happened during a memory controller doesn't support lse atomic
> >> instructions.
> >> I can put the information in next patchset comment message. Pls feel
> >> free to let know if there is other place to have this kind of
> >> information with.
> >
> > For a start, Documentation/arch/arm64/silicon-errata.rst should
> > contain an entry for the actual erratum, and a description of the
> > symptoms of the issue (you're mentioning a "noc error": how is that
> > reported to the CPU?).
>
> This is not a cpu's errata as my understanding. It is the DDR
> subsystem which don't have the LSE atomic feature supported.

CPU or not doesn't matter. We also track system errata.

> >
> > The workaround should also be detected at runtime -- we cannot rely on
> > the user to provide a command-line argument to disable an essential
> > feature that anyone has taken for granted for most of a decade...
>
> We are also seeking help from DDR Subsystem POC to see whether it is
> possible to detect the LSE atomic feature support or not at runtime.

Keying it off a DT compatible (or something similar) would work.

> In my opinion, LSE atomic is a system level feature instead of a cpu
> only feature. So currently solution we is that even if cpu support lse
> atomic, but it still needed to be disabled if the cpu working with a
> lse atomic not support by current system's DDR subsystem.

In the absence of a detection mechanism for anything past the CPU,
this is a moot point. At this stage, this is a bit like saying "writing
to memory is a system thing, not only a CPU feature".

And this also breaks KVM if these CPUs don't have FWB, as a guest can
always map a piece of memory as non-cacheable, and trigger the issue
you describe in your reply to Will, even if you hide the atomics on
the host.

	M.

-- 
Without deviation from the norm, progress is not possible.