On Fri, May 10, 2024 at 03:29:48PM -0400, Peter Xu wrote:
> IMHO we shouldn't mention that detail, but only state the effect which is
> to not report the event to syslog.
>
> There's no hard rule that a pte marker can't reflect a real page poison in
> the future even MCE. Actually I still remember most places don't care
> about the pfn in the hwpoison swap entry so maybe we can even do it? But
> that's another story regardless..

But we should not use pte markers for real hwpoison events (aka MCE), right?
I mean, we do have the means to mark a page as hwpoisoned when a real MCE gets
triggered, so why would we want a pte marker to also reflect that?
Or is that something for the userfaultfd realm?

> And also not report swapin error is, IMHO, only because arch errors said
> "MCE" in the error logs which may not apply here. Logically speaking
> swapin error should also be reported so admin knows better on why a proc is
> killed. Now it can still confuse the admin if it really happens, iiuc.

I am a bit confused by this.
It seems we create poisoned pte markers on swap errors (e.g. unuse_pte()),
which get passed down the chain with VM_FAULT_HWPOISON, which end up in
SIGBUS (I guess?).

This all seems very subtle to me.

First of all, why not pass VM_FAULT_SIGBUS if that is what will end up
happening?
I mean, at the moment that is not possible because we conflate swapin errors
and uffd poison in the same type of marker, so we do not have any means to
differentiate between the two of them.

Would it make sense to create yet another pte marker type to split that up?
Because when I look at VM_FAULT_HWPOISON, I get reminded of MCE stuff, and
that does not hold here.

-- 
Oscar Salvador
SUSE Labs