Re: [PATCH v4 1/3] mm/hwpoison: mf_mutex for soft offline and unpoison

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 21, 2022 at 10:59:14AM -0800, Mike Kravetz wrote:
> On 1/20/22 16:09, Mike Kravetz wrote:
> > On 11/15/21 00:40, Naoya Horiguchi wrote:
> >> From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> >>
> >> Originally mf_mutex is introduced to serialize multiple MCE events, but
> >> it is not that useful to allow unpoison to run in parallel with memory_failure()
> >> and soft offline.  So apply mf_mutex to soft offline and unpoison.
> >> The memory failure handler and soft offline handler get simpler with this.
> >>
> 
> Sorry for the late question.
> 
> It is not directly part of this change, but can we also remove the check in
> __soft_offline_page after this comment?
> 
>         /*
>          * Check PageHWPoison again inside page lock because PageHWPoison
>          * is set by memory_failure() outside page lock. Note that
>          * memory_failure() also double-checks PageHWPoison inside page lock,
>          * so there's no race between soft_offline_page() and memory_failure().
>          */

You're right. This comment is obsolete. I'll send a patch later.

And you pointed out some important points in the previous email, thank you.

> There are only two places other places where PageHWPoison is modified without
> the mutex.  They are:
> - test_and_clear_pmem_poison
>   I 'think' pmem error handling is done separately so this does not apply.

Yes, soft-offline nor unpoison should work for pmem error, so the relevant
race should not be problematic.

> - clear_hwpoisoned_pages
>   Called before removing memory (and deleting memmap) to reconcile count
>   of poisoned pages.  Should not be an issue and technically I do not think
>   the ClearPageHWPoison() is actually needed in this routine.

Right, this ClearPageHWPoison() might be unnecessary because struct pages
are freed soon after.  But this function also updates num_poisoned_pages,
and the race could break the counter.  I'll check the effect of the race
with hotremove and how mf_mutex can solve it.

Thanks,
Naoya Horiguchi




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux