Re: [PATCH 5.10 00/21] 5.10.79-rc1 review

On Thu, Nov 11, 2021 at 08:24:42PM +0530, Naresh Kamboju wrote:
> On Thu, 11 Nov 2021 at 18:32, Sudip Mukherjee
> <sudipm.mukherjee@xxxxxxxxx> wrote:
> >
> > Hi Greg,
> >
> > On Wed, Nov 10, 2021 at 07:43:46PM +0100, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 5.10.79 release.
> > > There are 21 patches in this series, all will be posted as a response
> > > to this one.  If anyone has any issues with these being applied, please
> > > let me know.
> > >
> > > Responses should be made by Fri, 12 Nov 2021 18:19:54 +0000.
> > > Anything received after that time might be too late.
> >
> > systemd-journal-flush.service failed due to a timeout, resulting in a very
> > slow boot on my test laptop. The qemu test on openQA failed with the same problem.
> >
> > https://openqa.qa.codethink.co.uk/tests/365
> >
> > A bisect showed the problem to be 8615ff6dd1ac ("mm: filemap: check if THP has
> > hwpoisoned subpage for PMD page fault"). Reverting it on top of 5.10.79-rc1
> > fixed the problem.
> > Incidentally, I have been having a similar problem with Linus's tree for the
> > last few days; it has been failing since 20211106 (I did not get the time to check).
> > I will test mainline again with this commit reverted.
> 
> I have also noticed this problem; Anders bisected it and found the same
> first bad commit.
> 
> Failed test log link:
> A start job is running for Journal Service (5s / 1min 27s)
> https://lkft.validation.linaro.org/scheduler/job/3901980#L2234
> 
> Reported-by: Linux Kernel Functional Testing <lkft@xxxxxxxxxx>
> 
> Bisect log:
> 
> # bad: [b85617a6291f710807d0cd078c230626dee60b16] Linux 5.10.79-rc1
> # good: [5040520482a594e92d4f69141229a6dd26173511] Linux 5.10.78
> git bisect start 'b85617a6291f710807d0cd078c230626dee60b16' '5040520482a594e92d4f69141229a6dd26173511'
> # bad: [7ceeda856035991a6c9804916987a03759745fb0] staging: rtl8712: fix use-after-free in rtl8712_dl_fw
> git bisect bad 7ceeda856035991a6c9804916987a03759745fb0
> # bad: [8615ff6dd1ac9e01b6fcf0fc0652353f79f524ed] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
> git bisect bad 8615ff6dd1ac9e01b6fcf0fc0652353f79f524ed
> # good: [e9cb6ce4690749d42013f1d56874c624d7241740] Revert "x86/kvm: fix vcpu-id indexed array sizes"
> git bisect good e9cb6ce4690749d42013f1d56874c624d7241740
> # good: [dc385dfc126d51d7a93db694f8e151afe60eb06a] mm: hwpoison: remove the unnecessary THP check
> git bisect good dc385dfc126d51d7a93db694f8e151afe60eb06a
> # first bad commit: [8615ff6dd1ac9e01b6fcf0fc0652353f79f524ed] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
> commit 8615ff6dd1ac9e01b6fcf0fc0652353f79f524ed
> Author: Yang Shi <shy828301@xxxxxxxxx>
> Date:   Thu Oct 28 14:36:11 2021 -0700
> 
>     mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
> 
>     commit eac96c3efdb593df1a57bb5b95dbe037bfa9a522 upstream.
> 
>     When handling a shmem page fault, a THP with a corrupted subpage could be
>     PMD mapped if certain conditions are satisfied.  But the kernel is
>     supposed to send SIGBUS when trying to map a hwpoisoned page.
> 
>     There are two paths which may do PMD map: fault around and regular
>     fault.
> 
>     Before commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault()
>     codepaths") the thing was even worse in the fault around path.  The THP
>     could be PMD mapped as long as the VMA fits, regardless of which subpage
>     is accessed and corrupted.  After this commit, the THP could be PMD mapped
>     as long as the head page is not corrupted.
> 
>     In the regular fault path the THP could be PMD mapped as long as the
>     corrupted page is not accessed and the VMA fits.
> 
>     This loophole could be fixed by iterating over every subpage to check
>     whether any of them is hwpoisoned, but that is somewhat costly in the
>     page fault path.
> 
>     So introduce a new page flag called HasHWPoisoned on the first tail
>     page.  It indicates the THP has hwpoisoned subpage(s).  It is set if any
>     subpage of the THP is found hwpoisoned by memory failure and after the
>     refcount is bumped successfully, then cleared when the THP is freed or
>     split.
> 
>     The soft offline path doesn't need this since the soft offline handler
>     only marks a subpage hwpoisoned once the subpage has been migrated
>     successfully.  But a shmem THP does not get split and then migrated at all.
> 
>     Link: https://lkml.kernel.org/r/20211020210755.23964-3-shy828301@xxxxxxxxx
>     Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
>     Signed-off-by: Yang Shi <shy828301@xxxxxxxxx>
>     Reviewed-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
>     Suggested-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
>     Cc: Hugh Dickins <hughd@xxxxxxxxxx>
>     Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
>     Cc: Oscar Salvador <osalvador@xxxxxxx>
>     Cc: Peter Xu <peterx@xxxxxxxxxx>
>     Cc: <stable@xxxxxxxxxxxxxxx>
>     Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>     Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> 
>  include/linux/page-flags.h | 23 +++++++++++++++++++++++
>  mm/huge_memory.c           |  2 ++
>  mm/memory-failure.c        | 14 ++++++++++++++
>  mm/memory.c                |  9 +++++++++
>  mm/page_alloc.c            |  4 +++-
>  5 files changed, 51 insertions(+), 1 deletion(-)
> 

Thanks, I'm going to go drop this patch again.

This is the second time we have tried to add it.  Yang, are you
_SURE_ it needs to be in the 5.10.y tree?  So far it's been nothing but
build and boot failures :(

thanks,

greg k-h
