On Fri, Apr 23, 2021 at 01:57:25PM +0200, Borislav Petkov wrote:
> On Fri, Apr 23, 2021 at 02:18:34AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > I don't know exactly. MCE subsystem seems to have code extracting linear
> > address, so I wonder that that could be used as a hint to memory_failure()
> > to find the proper virtual address.
>
> See "Table 15-3. Address Mode in IA32_MCi_MISC[8:6]" in the SDM -
> apparently it can report all kinds of address types, depending on the hw
> incarnation or MCA bank type or whatnot. Tony knows :)

"15.9.3.2 Architecturally Defined SRAR Errors" says that the register
is supposed to hold the physical address:

  For both the data load and instruction fetch errors, the ADDRV and MISCV
  flags in the IA32_MCi_STATUS register are set to indicate that the offending
  physical address information is available from the IA32_MCi_MISC and the
  IA32_MCi_ADDR registers.

> > The situation in question is caused by action required MCE, so
> > we know which process we should send SIGBUS to. So if we choose
> > to send SIGBUS to all, no innocent bystanders would be affected.
> > But when the process have multiple virtual addresses associated
> > with the error physical address, the process receives multiple
> > SIGBUSs and all but one have wrong value in si_addr in siginfo_t,
> > so that's confusing.
>
> Is that scenario real or hypothetical?
>
> Because I'd expect that if we send it a SIGBUS and we poison that page,
> then all the VAs mapping it will have to handle the situation that that
> page has been poisoned and pulled from under them.

IIUC, the above should be done by the first MCE handling.
In the "already hwpoisoned" case, the page has already been poisoned and
all mappings of it should already be unmapped, so what we additionally
need is to send SIGBUS to tell the application that it should take some
action or abort immediately.

Thanks,
Naoya Horiguchi
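
For reference, the ADDRV/MISCV and address-mode checks discussed above could
be sketched roughly as below. This is only an illustration, not the kernel's
actual code: the bit positions follow the SDM layout of IA32_MCi_STATUS and
IA32_MCi_MISC, the macro names are modeled on the ones in the kernel's mce.h,
and mce_phys_addr() is a hypothetical helper that exists only for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCi_STATUS flags (bit positions per the SDM). */
#define MCI_STATUS_MISCV	(1ULL << 59)	/* IA32_MCi_MISC is valid */
#define MCI_STATUS_ADDRV	(1ULL << 58)	/* IA32_MCi_ADDR is valid */

/* IA32_MCi_MISC fields: [5:0] recoverable address LSB, [8:6] address mode. */
#define MCI_MISC_ADDR_LSB(m)	((unsigned int)((m) & 0x3f))
#define MCI_MISC_ADDR_MODE(m)	((unsigned int)(((m) >> 6) & 0x7))

/* Address mode encodings from "Address Mode in IA32_MCi_MISC[8:6]". */
enum mci_addr_mode {
	MCI_ADDR_SEGOFF  = 0,	/* segment offset */
	MCI_ADDR_LINEAR  = 1,	/* linear address */
	MCI_ADDR_PHYS    = 2,	/* physical address */
	MCI_ADDR_MEM     = 3,	/* memory address */
	MCI_ADDR_GENERIC = 7,	/* generic */
};

/*
 * Hypothetical helper: return true and store the usable physical address
 * in *phys only when both ADDRV and MISCV are set and the bank reports
 * physical address mode. Bits below the reported LSB are not valid, so
 * they are masked off.
 */
static bool mce_phys_addr(uint64_t status, uint64_t addr, uint64_t misc,
			  uint64_t *phys)
{
	unsigned int lsb;

	if (!(status & MCI_STATUS_ADDRV) || !(status & MCI_STATUS_MISCV))
		return false;

	if (MCI_MISC_ADDR_MODE(misc) != MCI_ADDR_PHYS)
		return false;

	lsb = MCI_MISC_ADDR_LSB(misc);	/* at most 63, so the shift is safe */
	*phys = addr & ~((1ULL << lsb) - 1);
	return true;
}
```

With ADDRV and MISCV set, MISC reporting physical mode and an LSB of 12, an
IA32_MCi_ADDR value of 0x12345678 would come out as 0x12345000. A bank
reporting linear mode is rejected by this sketch instead of being treated as
a physical address.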