On Fri, Sep 25, 2020 at 09:54:06AM +0900, Punit Agrawal wrote: > Borislav Petkov <bp@xxxxxxxxx> writes: > > > On Thu, Sep 24, 2020 at 12:23:27PM -0500, Smita Koralahalli Channabasappa wrote: > >> > Even though it's not defined in the UEFI spec, it doesn't mean a > >> > structure definition cannot be created. > > > > Created for what? That structure better have a big fat comment above it, what > > firmware generates its layout. > > Maybe I could've used a better choice of words - I meant to define a > structure with meaningful member names to replace the *(ptr + i) > accesses in the patch. > > The requirement for documenting the record layout doesn't change - > whether using raw pointer arithmetic vs a structure definition. > > >> > After all, the patch is relying on some guarantee of the meaning of > >> > the values and their ordering. > > > > AFAICT, this looks like an ad-hoc definition and the moment they change > > it in some future revision, that struct of yours becomes invalid so we'd > > need to add another one. > > If there's no spec backing the current layout, then it'll indeed be an > ad-hoc definition of a structure in the kernel. But considering that > it's part of firmware / OS interface for an important part of the RAS > story I would hope that the code is based on a spec - having that > reference included would help maintainability. > > Incompatible changes will indeed break the assumptions in the kernel and > code will need to be updated - regardless of the choice of kernel > implementation; pointer arithmetic, structure definition - ad-hoc or > spec provided. > > Having versioning will allow running older kernels on newer hardware and > vice versa - but I don't see why that is important only when using a > structure based access. > There is no versioning option for the x86 context info structure in the UEFI spec, so I don't think there'd be a clean way to include version information. The format of the data in the context info is not totally ad-hoc, and it does follow the UEFI spec. The "Register Array" field is raw data. This may follow one of the predefined formats in the UEFI spec like the "X64 Register State", etc. Or, in the case of MSR and Memory Mapped Registers, this is a raw dump of the registers starting from the address shown in the structure. The two values that can be changed are the starting address and the array size. These two together provide a window to the registers. The registers are fixed, so a single context info struture should include a single contiguous range of registers. Multiple context info structures can be provided to include registers from different, non-contiguous ranges. This patch is checking if an MSR context info structure lines up with the MCAX register space used on Scalable MCA systems. This register space is defined in the AMD Processor Programming Reference for various products. This is considered a hardware feature extension, so the existing register layout won't change though new registers may be added. A layout change would require moving to another register space which is what happened going from legacy MCA (starting at address 0x400) to MCAX (starting at address 0xC0002000) registers. The only two things firmware can change are from what address does the info start and where does the info end. So the implementation-specific details here are that currently the starting address is MCA_STATUS (in MCAX space) for a bank and the remaining info includes the other MCA registers for this bank. So I think the kernel can be strict with this format, i.e. the two variables match what we're looking for. This patch already has a check on the starting address. It should also include a check that "Register Array Size" is large enough to include all the registers we want to extract. If the format doesn't match, then we fall back to a raw dump of the data like we have today. Or the kernel can be more flexible and try to find the window of registers based on the starting address. I think this is really open-ended though. Does this sound reasonable? Thanks, Yazen