Re: [PATCH Part2 RFC v4 09/40] x86/fault: Add support to dump RMP entry on fault

Dave Hansen <dave.hansen@xxxxxxxxx> · Thu, 8 Jul 2021 10:15:24 -0700

On 7/8/21 10:11 AM, Brijesh Singh wrote:
> On 7/8/21 11:58 AM, Dave Hansen wrote:
>>> Logically its going to be tricky to figure out which exact entry caused
>>> the fault, hence I dump any non-zero entry. I understand it may dump
>>> some useless.
>>
>> What's tricky about it?
>>
>> Sure, there's a possibility that more than one entry could contribute to
>> a fault.  But, you always know *IF* an entry could contribute to a fault.
>>
>> I'm fine if you run through the logic, don't find a known reason
>> (specific RMP entry) for the fault, and dump the whole table in that
>> case.  But, unconditionally polluting the kernel log with noise isn't
>> very nice for debugging.
> 
> The tricky part is to determine which undocumented bit to check to know
> that we should stop dumping. I can go with your suggestion: first try
> the known reasons, and fall back to dumping the whole table for unknown
> reasons only.

You *can't* stop because of undocumented bits.  Fundamentally.  You
literally don't know if the bit means "this caused a fault" versus "this
definitely couldn't cause a fault".

Basically, if we get to the point of dumping the whole table, we should
also spit out an error message saying that the kernel is dazed and
confused and can't figure out why the hardware caused a fault.  Then,
dump out the whole table so that the "hardware" folks can have a look.
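
A rough userspace sketch of that flow, for illustration only.  Everything
in it is an assumption: struct rmp_entry_sketch, its "assigned" and
"immutable" fields, the two fault conditions, and the function names are
hypothetical stand-ins, not the real RMP layout or anything from the
patch.  The point is only the control flow: check documented reasons
first, and fall back to a full dump with a clear "can't explain this"
message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified view of one RMP entry (not the real layout). */
struct rmp_entry_sketch {
	uint64_t raw[2];	/* raw contents, including undocumented bits */
	bool assigned;		/* assumed documented field */
	bool immutable;		/* assumed documented field */
};

/*
 * Return a reason string when a documented field could explain the
 * fault, or NULL otherwise.  The two conditions are illustrative
 * assumptions, not the hardware's actual rules.
 */
static const char *known_fault_reason(const struct rmp_entry_sketch *e,
				      bool is_write)
{
	if (is_write && e->immutable)
		return "write to immutable page";
	if (is_write && e->assigned)
		return "write to assigned page";
	return NULL;
}

static void dump_entry(size_t idx, const struct rmp_entry_sketch *e)
{
	printf("RMP[%zu]: %016llx %016llx\n", idx,
	       (unsigned long long)e->raw[0], (unsigned long long)e->raw[1]);
}

/*
 * Dump only the entries that could have contributed to the fault.
 * If none can be identified, say so and dump everything.
 * Returns the number of entries with a known reason (0 means the
 * whole table was dumped).
 */
static size_t dump_rmp_on_fault(const struct rmp_entry_sketch *entries,
				size_t n, bool is_write)
{
	size_t i, explained = 0;

	assert(entries != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const char *why = known_fault_reason(&entries[i], is_write);

		if (!why)
			continue;
		printf("RMP fault candidate (%s):\n", why);
		dump_entry(i, &entries[i]);
		explained++;
	}

	if (explained)
		return explained;

	/* No documented reason found: do not guess from undocumented bits. */
	printf("RMP fault: unable to determine cause, dumping all %zu entries\n",
	       n);
	for (i = 0; i < n; i++)
		dump_entry(i, &entries[i]);
	return 0;
}
```

Note that the fallback path never inspects raw[] to decide what to skip.
It prints every entry, which is the only safe choice when the meaning of
the remaining bits is unknown.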