On 2/1/22 21:11, Jane Chu wrote:
> On 2/1/2022 7:46 AM, Matthew Wilcox wrote:
>> On Mon, Jan 31, 2022 at 08:54:39PM +0000, Joao Martins wrote:
>>> On 1/31/22 20:29, Matthew Wilcox wrote:
>>>> Unless I am mistaken, you have to pass the compound head of the page
>>>> which has the error to collect_procs(). Am I mistaken?
>>>>
>>> -rc2 already has a fix for it:
>>>
>>> https://lore.kernel.org/linux-mm/20220129021420.PgBIZm-q9%25akpm@xxxxxxxxxxxxxxxxxxxx/
>>>
>>> Earlier in that function there's a:
>>>
>>> 	page = compound_head(page);
>>>
>>> So the @page passed to collect_procs() already is a head page.
>>
>> It's wrong though ;-( You set the HWPoison bit on the page after
>> calling compound_head(), so you set the bit on the head page instead
>> of the precise page that had the poison.
>
> Indeed. The rest of the kernel including pmem driver still deal with
> base page on clearing poison, bookkeeping etc. So the HWpoison bit needs
> to be set precisely on the poisoned base page such that we pass the
> correct 'pfn' to set_mce_nospec() to discourage speculative access.
>

set_mce_nospec() machinery makes no use of the HWPoison bit as far as my
reading goes. And the PFN that is passed to set_mce_nospec() is already the
subpage PFN that eventually lands on set_memory_np()/set_memory_uc() when it
changes the kernel page tables mapping (which also don't use the poison bit).

I still can't see how device-dax machinery makes use of that bit? At least
the one which could use it (clear_mce_nospec()) doesn't actually go through
device-dax nvdimm-specific code only fsdax which I reiterate that the patch
does not change as there's no compound head there. Am I missing something?

>> I'm fixing this up as part of the folio patches, but you may wish to
>> fix it earlier than that.
>
> Thanks for the fix!

As I had mentioned earlier I have one prepped, if this turns out to be indeed a problem.
So far, I haven't spotted any in my testing since I started this work, but it
could also be an oversight on my end.

	Joao
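
For illustration only, the ordering concern raised in the quoted text can be
sketched in userspace C. This is a toy model, not kernel code: struct toy_page
and every toy_* name are hypothetical stand-ins, and the bool field merely
plays the role of the HWPoison page flag. It only shows why setting the flag
after the head lookup loses track of which subpage was hit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the kernel's definition. */
struct toy_page {
	struct toy_page *head;	/* compound head; the head points at itself */
	bool hwpoison;		/* stand-in for the HWPoison page flag */
};

/* Hypothetical helper: link n contiguous pages into one compound page. */
void toy_make_compound(struct toy_page *pages, size_t n)
{
	assert(pages != NULL && n > 0);
	for (size_t i = 0; i < n; i++) {
		pages[i].head = &pages[0];
		pages[i].hwpoison = false;
	}
}

/* Toy equivalent of a head lookup: any subpage maps to its head. */
struct toy_page *toy_compound_head(struct toy_page *page)
{
	return page->head;
}

/*
 * Ordering objected to in the thread: the flag is set after the head
 * lookup, so it lands on the head page rather than the faulting subpage.
 */
struct toy_page *toy_poison_after_head_lookup(struct toy_page *page)
{
	page = toy_compound_head(page);
	page->hwpoison = true;
	return page;
}

/*
 * Alternative ordering: flag the precise subpage first, then return the
 * head for whatever per-compound work the caller does next.
 */
struct toy_page *toy_poison_precise_page(struct toy_page *page)
{
	page->hwpoison = true;
	return toy_compound_head(page);
}
```

With a four-page compound, calling toy_poison_after_head_lookup() on the third
page flags the first page and leaves the third one clear, whereas
toy_poison_precise_page() flags the third page and still hands back the head.
Whether any device-dax consumer actually depends on that distinction is the
open question above; the sketch does not answer it.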