[Cc Oscar] On Fri 11-10-19 12:13:17, David Hildenbrand wrote: > On 11.10.19 08:02, Naoya Horiguchi wrote: > > On Thu, Oct 10, 2019 at 09:58:40AM +0200, David Hildenbrand wrote: > >> On 10.10.19 09:52, David Hildenbrand wrote: > >>> On 10.10.19 09:35, Michal Hocko wrote: > >>>> On Thu 10-10-19 09:27:32, David Hildenbrand wrote: > >>>>> On 09.10.19 16:43, Michal Hocko wrote: > >>>>>> On Wed 09-10-19 16:24:35, David Hildenbrand wrote: > >>>>>>> We should check for pfn_to_online_page() to not access uninitialized > >>>>>>> memmaps. Reshuffle the code so we don't have to duplicate the error > >>>>>>> message. > >>>>>>> > >>>>>>> Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> > >>>>>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > >>>>>>> Cc: Michal Hocko <mhocko@xxxxxxxxxx> > >>>>>>> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> > >>>>>>> --- > >>>>>>> mm/memory-failure.c | 14 ++++++++------ > >>>>>>> 1 file changed, 8 insertions(+), 6 deletions(-) > >>>>>>> > >>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c > >>>>>>> index 7ef849da8278..e866e6e5660b 100644 > >>>>>>> --- a/mm/memory-failure.c > >>>>>>> +++ b/mm/memory-failure.c > >>>>>>> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) > >>>>>>> if (!sysctl_memory_failure_recovery) > >>>>>>> panic("Memory failure on page %lx", pfn); > >>>>>>> > >>>>>>> - if (!pfn_valid(pfn)) { > >>>>>>> + p = pfn_to_online_page(pfn); > >>>>>>> + if (!p) { > >>>>>>> + if (pfn_valid(pfn)) { > >>>>>>> + pgmap = get_dev_pagemap(pfn, NULL); > >>>>>>> + if (pgmap) > >>>>>>> + return memory_failure_dev_pagemap(pfn, flags, > >>>>>>> + pgmap); > >>>>>>> + } > >>>>>>> pr_err("Memory failure: %#lx: memory outside kernel control\n", > >>>>>>> pfn); > >>>>>>> return -ENXIO; > >>>>>> > >>>>>> Don't we need that earlier at hwpoison_inject level? > >>>>>> > >>>>> > >>>>> Theoretically yes, this is another instance. But pfn_to_online_page(pfn) > >>>>> alone would not be sufficient as discussed. We would, again, have to > >>>>> special-case ZONE_DEVICE via things like get_dev_pagemap() ... > >>>>> > >>>>> But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way: > >>>>> > >>>>> /* > >>>>> * Note that the below poison/unpoison interfaces do not involve > >>>>> * hardware status change, hence do not require hardware support. > >>>>> * They are mainly for testing hwpoison in software level. > >>>>> */ > >>>>> > >>>>> So it's not that bad compared to memory_failure() called from real HW or > >>>>> drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store() > >>>> > >>>> Yes, this is just a toy. And yes we need to handle zone device pages > >>>> here because a) people likely want to test MCE behavior even on these > >>>> pages and b) HW can really trigger MCEs there as well. I was just > >>>> pointing that the patch is likely incomplete. > >>>> > >>> > >>> I rather think this deserves a separate patch as it is a separate > >>> interface :) > >>> > >>> I do wonder why hwpoison_inject() has to perform so much extra work > >>> compared to other memory_failure() users. This smells like legacy > >>> leftovers to me, but I might be wrong. The interface is fairly old, > >>> though. Does anybody know why we need this magic? I can spot quite some > >>> duplicate checks/things getting performed. > > > > It concerns me too, this *is* an old legacy code. I guess it was left as-is > > because no one complained about it. That's not good, so I'll do some cleanup. > > Most of that stuff was introduced in > > commit 31d3d3484f9bd263925ecaa341500ac2df3a5d9b > Author: Wu Fengguang <fengguang.wu@xxxxxxxxx> > Date: Wed Dec 16 12:19:59 2009 +0100 > > HWPOISON: limit hwpoison injector to known page types > > __memory_failure()'s workflow is > > set PG_hwpoison > //... > unset PG_hwpoison if didn't pass hwpoison filter > > That could kill unrelated process if it happens to page fault on the > page with the (temporary) PG_hwpoison. The race should be big enough to > appear in stress tests. > > Fix it by grabbing the page and checking filter at inject time. This > also avoids the very noisy "Injecting memory failure..." messages. > > > Now, we still have the same "issue" in memory_failure() today: > > > if (TestSetPageHWPoison(p)) { > pr_err("Memory failure: %#lx: already hardware poisoned\n", > pfn); > return 0; > } > [...] > if (hwpoison_filter(p)) { > if (TestClearPageHWPoison(p)) > num_poisoned_pages_dec(); > unlock_page(p); > put_hwpoison_page(p); > return 0; > } > > However, I don't understand why we need that special handling only for this > debug interface and not the other users. > > I'd vote for ripping out that legacy crap (so the interface works correctly > with ZONE_DEVICE) and instead (if really required) rework memory_failure() > to not produce such side effects. I do agree. The two should be really using the same code. My understanding was that MADV_HWPOISON was there to test the actual MCE behavior (and the man page seems to agree with that). Oscar is working on a rewrite. Not sure he has considered this as well. -- Michal Hocko SUSE Labs