Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors

Oscar Salvador <osalvador@xxxxxxx> · Tue, 14 May 2024 22:26:49 +0200

On Fri, May 10, 2024 at 03:29:48PM -0400, Peter Xu wrote:
> IMHO we shouldn't mention that detail, but only state the effect which is
> to not report the event to syslog.
> 
> There's no hard rule that a pte marker can't reflect a real page poison in
> the future even MCE.  Actually I still remember most places don't care
> about the pfn in the hwpoison swap entry so maybe we can even do it? But
> that's another story regardless..

But we should not use pte markers for real hwpoison events (aka MCE), right?
I mean, we already have the means to mark a page as hwpoisoned when a real
MCE gets triggered, so why would we want a pte marker to also reflect that?
Or is that something for the userfaultfd realm?

> And also not report swapin error is, IMHO, only because arch errors said
> "MCE" in the error logs which may not apply here.  Logically speaking
> swapin error should also be reported so admin knows better on why a proc is
> killed.  Now it can still confuse the admin if it really happens, iiuc.

I am a bit confused by this.
It seems we create poisoned pte markers on swap errors (e.g.
unuse_pte()), which get passed down the chain as VM_FAULT_HWPOISON,
which ends up in a SIGBUS (I guess?).

This all seems very subtle to me.

First of all, why not pass VM_FAULT_SIGBUS if that is what will end
up happening?
I mean, at the moment that is not possible because we conflate swapin
errors and uffd poison in the same type of marker, so we do not have any
means to differentiate between the two of them.

Would it make sense to create yet another pte marker type to split that
up? Because when I look at VM_FAULT_HWPOISON, I get reminded of MCE
stuff, and that does not hold here.
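
To make the idea concrete, here is a rough userspace sketch of what the
split could look like. It is not kernel code: the MARKER_* and FAULT_*
names and handle_marker() are made up for illustration, and which fault
code each marker should map to is only an assumption on my side.

```c
#include <assert.h>

/* Hypothetical marker bits, one per origin (not the kernel's definitions). */
enum marker {
	MARKER_SWAPIN_ERROR = 1u << 0,	/* swap read failed, e.g. in unuse_pte() */
	MARKER_UFFD_POISON  = 1u << 1,	/* poison installed through userfaultfd */
};

#define MARKER_MASK (MARKER_SWAPIN_ERROR | MARKER_UFFD_POISON)

/* Hypothetical stand-ins for VM_FAULT_SIGBUS and VM_FAULT_HWPOISON. */
enum fault {
	FAULT_NONE = 0,
	FAULT_SIGBUS,
	FAULT_HWPOISON,
};

/*
 * With two distinct markers the fault path can tell the cases apart:
 * a swapin error maps to a plain SIGBUS, while the uffd case keeps the
 * hwpoison-style code (assumed mapping, for illustration only).
 */
enum fault handle_marker(unsigned int marker)
{
	/* Reject bits this sketch does not know about. */
	assert((marker & ~MARKER_MASK) == 0);

	if (marker & MARKER_SWAPIN_ERROR)
		return FAULT_SIGBUS;
	if (marker & MARKER_UFFD_POISON)
		return FAULT_HWPOISON;
	return FAULT_NONE;
}
```

The point is only that, once the origin is encoded in the marker, the
arch fault handlers would no longer have to guess whether an "MCE"-style
message applies.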

-- 
Oscar Salvador
SUSE Labs