On Mon, 18 Jan 2021 08:57:47 +0000
HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@xxxxxxx> wrote:

> > >
> > > For action optional cases, one error event kills *only one* process. If an
> > > error page are shared by multiple processes, these processes will be killed
> > > by separate error events, each of which is triggered when each process tries
> > > to access the error memory. So these processes would be killed immediately
> > > when accessing the error, but you don't have to kill all at the same time
> > > (or actually you might not even have to kill it at all if the process exits
> > > finally without accessing the error later).
> > >
> > > Maybe the function variable "force_early" is named confusingly (it sounds
> > > that it's related to PF_MCE_KILL_EARLY flag, but that's incorrect).
> > > I'll submit a fix later. (I'll add your "Reported-by" because you made me
> > > find it, thank you.)
> > >
> >
> > I think we should do more for non current process error case, we should mark it AO for processes to be signaled
> > or we may take wrong action.
>
> I'm not sure what you mean by "non current process error case" and "we
> should mark it AO", so could you explain more specifically about your error
> scenario?

I will share my test code and submit another patch for this scenario; please
give me some time, thanks! And I think you are right: AR applies only to the
current process.

> Especially I'd like to know about who triggers hard offline on
> what hardware events and what "wrong action" could happen. Maybe just
> "calling memory_failure() with MF_ACTION_REQUIRED" is not enough, because
> it's not enough for us to see that your scenario is possible. Current
> implementation implicitly assumes some hardware behavior, and does not work
> for the case which never happens under the assumption.
>

This action comes from the mcelog daemon. Normally soft page offline is the
default, but we can configure hard page offline for CE (corrected error)
storms, so that the related processes get signaled.
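A minimal, illustrative sketch of the user-space side of this scenario (not the test code promised above), assuming Linux with glibc and a kernel built with CONFIG_MEMORY_FAILURE. It opts a process into early kill with prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0), installs a SIGBUS handler that tells BUS_MCEERR_AO (action optional) apart from BUS_MCEERR_AR (action required) via si_code, and writes a physical address to a hard_offline_page-style file, roughly the way a daemon configured for hard offline would. The helper names mce_kind, last_sigbus_code, setup_mce_test and hard_offline_phys are hypothetical; the real sysfs file is /sys/devices/system/memory/hard_offline_page and writing to it needs root.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Last SIGBUS si_code seen by the handler (0 = none yet). */
static volatile sig_atomic_t last_code;

/* Hypothetical helper: map a SIGBUS si_code to the memory-error kind. */
const char *mce_kind(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR:
        return "AR"; /* action required: this task touched the poisoned page */
    case BUS_MCEERR_AO:
        return "AO"; /* action optional: a page this task maps was poisoned */
    default:
        return "other";
    }
}

int last_sigbus_code(void)
{
    return last_code;
}

static void sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)ctx;
    last_code = si->si_code;
    /* Returning after AR would re-execute the faulting access, so exit. */
    if (si->si_code == BUS_MCEERR_AR)
        _exit(2);
}

/* Hypothetical helper: opt into early kill and install the SIGBUS handler. */
int setup_mce_test(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    /* Ask to be signaled (AO) as soon as a page we map is hard-offlined. */
    return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}

/* Hypothetical helper: write a hex physical address to a
 * hard_offline_page-style file (path is a parameter so it can be tested). */
int hard_offline_phys(const char *path, unsigned long long phys)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fprintf(f, "0x%llx\n", phys) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

On a test machine one would call setup_mce_test() in every process that maps the page, then, as root, call hard_offline_phys("/sys/devices/system/memory/hard_offline_page", phys) with the page's physical address. Processes with early kill set would be expected to receive AO, while a process that later touches the poisoned page would be expected to receive AR.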
> Do you have some test cases to reproduce any specific issue (like data lost)
> on your system? (If yes, please share it.) Or your concern is from code review?
>

I will clean it up and share it.

Thanks

-- 
Best Regards!

Aili Yao