Re: [RFC] Missing compound_head() in memory-failure

On 2/1/22 21:11, Jane Chu wrote:
> On 2/1/2022 7:46 AM, Matthew Wilcox wrote:
>> On Mon, Jan 31, 2022 at 08:54:39PM +0000, Joao Martins wrote:
>>> On 1/31/22 20:29, Matthew Wilcox wrote:
>>>> Unless I am mistaken, you have to pass the compound head of the page
>>>> which has the error to collect_procs().  Am I mistaken?
>>>>
>>> -rc2 already has a fix for it:
>>>
>>> https://lore.kernel.org/linux-mm/20220129021420.PgBIZm-q9%25akpm@xxxxxxxxxxxxxxxxxxxx/
>>>
>>> Earlier in that function there's a:
>>>
>>> 	page = compound_head(page);
>>>
>>> So the @page passed to collect_procs() already is a head page.
>>
>> It's wrong though ;-(  You set the HWPoison bit on the page after
>> calling compound_head(), so you set the bit on the head page instead
>> of the precise page that had the poison.
> 
> Indeed. The rest of the kernel, including the pmem driver, still deals
> with base pages when clearing poison, bookkeeping, etc. So the HWPoison
> bit needs to be set precisely on the poisoned base page such that we pass
> the correct 'pfn' to set_mce_nospec() to discourage speculative access.
> 
As far as I can tell, the set_mce_nospec() machinery makes no use of the
HWPoison bit. And the PFN that is passed to set_mce_nospec() is already the
subpage PFN, which eventually lands in set_memory_np()/set_memory_uc() when
the kernel page table mappings are changed (and those don't use the poison
bit either).
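
To make the distinction concrete, here is a toy userspace model (not kernel
code) of the flow as described in this thread: the flag ends up on the head
page because it is set after the compound_head() lookup, while the PFN-keyed
step still targets the subpage. Every toy_* name, the 8-page compound size and
the struct layout are made up for illustration; only the ordering of the steps
is taken from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: this is NOT the kernel's struct page. */
#define TOY_NR_PAGES 8UL		/* assumed: one 8-page compound page */

struct toy_page {
	struct toy_page *head;		/* head of the compound page (self for the head) */
	bool hwpoison;			/* stand-in for the per-page HWPoison flag */
	bool kernel_mapped;		/* stand-in for the kernel mapping state */
};

static struct toy_page toy_mem[TOY_NR_PAGES];

/* Reset the model: every page belongs to the compound page headed by pfn 0. */
static void toy_init(void)
{
	for (size_t i = 0; i < TOY_NR_PAGES; i++) {
		toy_mem[i].head = &toy_mem[0];
		toy_mem[i].hwpoison = false;
		toy_mem[i].kernel_mapped = true;
	}
}

/* Hypothetical analogue of compound_head(). */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
	return page->head;
}

/*
 * Hypothetical analogue of a PFN-keyed step such as set_mce_nospec():
 * it looks only at the pfn and never consults the hwpoison flag.
 */
static void toy_unmap_pfn(unsigned long pfn)
{
	assert(pfn < TOY_NR_PAGES);
	toy_mem[pfn].kernel_mapped = false;
}

/* Models the ordering discussed above: head lookup first, flag second. */
static void toy_memory_failure(unsigned long pfn)
{
	struct toy_page *page = &toy_mem[pfn];

	page = toy_compound_head(page);
	page->hwpoison = true;		/* lands on the head, not the subpage */
	toy_unmap_pfn(pfn);		/* still keyed on the subpage pfn */
}
```

In this model, calling toy_init() and then toy_memory_failure(5) leaves the
flag set on toy_mem[0] rather than toy_mem[5], while only toy_mem[5] loses its
mapping. That is the sense in which the two concerns are independent here:
the flag placement can be wrong without affecting which PFN gets unmapped.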

I still can't see how the device-dax machinery makes use of that bit. At
least the one function that could use it (clear_mce_nospec()) doesn't
actually go through device-dax, only through the nvdimm-specific code used
by fsdax, which, I reiterate, the patch does not change, as there's no
compound head there. Am I missing something?

>> I'm fixing this up as part of the folio patches, but you may wish to
>> fix it earlier than that.
> 
> Thanks for the fix!

As I mentioned earlier, I have one prepped in case this does turn out to be
a problem. So far, I haven't spotted any issues in my testing since I
started this work, but that could also be an oversight on my end.

	Joao



