On Thu, Nov 11, 2021 at 08:24:42PM +0530, Naresh Kamboju wrote:
> On Thu, 11 Nov 2021 at 18:32, Sudip Mukherjee
> <sudipm.mukherjee@xxxxxxxxx> wrote:
> >
> > Hi Greg,
> >
> > On Wed, Nov 10, 2021 at 07:43:46PM +0100, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 5.10.79 release.
> > > There are 21 patches in this series, all will be posted as a response
> > > to this one. If anyone has any issues with these being applied, please
> > > let me know.
> > >
> > > Responses should be made by Fri, 12 Nov 2021 18:19:54 +0000.
> > > Anything received after that time might be too late.
> >
> > systemd-journal-flush.service failed due to a timeout resulting in a very very
> > slow boot on my test laptop. qemu test on openqa failed due to the same problem.
> >
> > https://openqa.qa.codethink.co.uk/tests/365
> >
> > A bisect showed the problem to be 8615ff6dd1ac ("mm: filemap: check if THP has
> > hwpoisoned subpage for PMD page fault"). Reverting it on top of 5.10.79-rc1
> > fixed the problem.
> > Incidentally, I was having similar problem with Linus's tree
> > for last few days and was failing since 20211106 (did not get the time to check).
> > I will test mainline again with this commit reverted.
>
> I have also noticed this problem and Anders bisected and found this
> first bad commit.
>
> Failed test log link,
> A start job is running for Journal Service (5s / 1min 27s)
> https://lkft.validation.linaro.org/scheduler/job/3901980#L2234
>
> Reported-by: Linux Kernel Functional Testing <lkft@xxxxxxxxxx>
>
> Bisect log:
>
> # bad: [b85617a6291f710807d0cd078c230626dee60b16] Linux 5.10.79-rc1
> # good: [5040520482a594e92d4f69141229a6dd26173511] Linux 5.10.78
> git bisect start 'b85617a6291f710807d0cd078c230626dee60b16' '5040520482a594e92d4f69141229a6dd26173511'
> # bad: [7ceeda856035991a6c9804916987a03759745fb0] staging: rtl8712: fix use-after-free in rtl8712_dl_fw
> git bisect bad 7ceeda856035991a6c9804916987a03759745fb0
> # bad: [8615ff6dd1ac9e01b6fcf0fc0652353f79f524ed] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
> git bisect bad 8615ff6dd1ac9e01b6fcf0fc0652353f79f524ed
> # good: [e9cb6ce4690749d42013f1d56874c624d7241740] Revert "x86/kvm: fix vcpu-id indexed array sizes"
> git bisect good e9cb6ce4690749d42013f1d56874c624d7241740
> # good: [dc385dfc126d51d7a93db694f8e151afe60eb06a] mm: hwpoison: remove the unnecessary THP check
> git bisect good dc385dfc126d51d7a93db694f8e151afe60eb06a
> # first bad commit: [8615ff6dd1ac9e01b6fcf0fc0652353f79f524ed] mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
> commit 8615ff6dd1ac9e01b6fcf0fc0652353f79f524ed
> Author: Yang Shi <shy828301@xxxxxxxxx>
> Date: Thu Oct 28 14:36:11 2021 -0700
>
> mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
>
> commit eac96c3efdb593df1a57bb5b95dbe037bfa9a522 upstream.
>
> When handling shmem page fault the THP with corrupted subpage could be
> PMD mapped if certain conditions are satisfied. But kernel is supposed
> to send SIGBUS when trying to map hwpoisoned page.
>
> There are two paths which may do PMD map: fault around and regular
> fault.
>
> Before commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault()
> codepaths") the thing was even worse in fault around path. The THP
> could be PMD mapped as long as the VMA fits regardless what subpage is
> accessed and corrupted. After this commit as long as head page is not
> corrupted the THP could be PMD mapped.
>
> In the regular fault path the THP could be PMD mapped as long as the
> corrupted page is not accessed and the VMA fits.
>
> This loophole could be fixed by iterating every subpage to check if any
> of them is hwpoisoned or not, but it is somewhat costly in page fault
> path.
>
> So introduce a new page flag called HasHWPoisoned on the first tail
> page. It indicates the THP has hwpoisoned subpage(s). It is set if any
> subpage of THP is found hwpoisoned by memory failure and after the
> refcount is bumped successfully, then cleared when the THP is freed or
> split.
>
> The soft offline path doesn't need this since soft offline handler just
> marks a subpage hwpoisoned when the subpage is migrated successfully.
> But shmem THP didn't get split then migrated at all.
>
> Link: https://lkml.kernel.org/r/20211020210755.23964-3-shy828301@xxxxxxxxx
> Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
> Signed-off-by: Yang Shi <shy828301@xxxxxxxxx>
> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> Suggested-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Cc: Oscar Salvador <osalvador@xxxxxxx>
> Cc: Peter Xu <peterx@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>
>  include/linux/page-flags.h | 23 +++++++++++++++++++++++
>  mm/huge_memory.c           |  2 ++
>  mm/memory-failure.c        | 14 ++++++++++++++
>  mm/memory.c                |  9 +++++++++
>  mm/page_alloc.c            |  4 +++-
>  5 files changed, 51 insertions(+), 1 deletion(-)
>

Thanks, I'm going to go drop this patch again. This has been the second
time we have tried to add it.

Yang, are you _SURE_ it needs to be in the 5.10.y tree? So far it's
been nothing but build and boot failures :(

thanks,

greg k-h
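
For readers following the thread, the flag described in the quoted commit message can be modelled in a few lines of userspace C. This is only an illustrative sketch under stated assumptions, not the kernel implementation: every struct and function name below (model_page, model_thp, model_memory_failure, and so on) is hypothetical, and 512 subpages per THP assumes 2 MiB huge pages built from 4 KiB base pages. It contrasts the "iterate every subpage" check the commit message calls costly with a single summary flag kept on the first tail page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 2 MiB THP made of 4 KiB base pages. */
#define SUBPAGES_PER_THP 512

/* Hypothetical userspace model; this is NOT the kernel's struct page. */
struct model_page {
    bool hwpoison;        /* this particular subpage is corrupted */
    bool has_hwpoisoned;  /* summary flag, only meaningful on index 1 */
};

struct model_thp {
    struct model_page pages[SUBPAGES_PER_THP]; /* [0] head, [1] first tail */
};

/* Costly variant: walk every subpage on each fault. */
bool thp_poisoned_scan(const struct model_thp *thp)
{
    for (size_t i = 0; i < SUBPAGES_PER_THP; i++)
        if (thp->pages[i].hwpoison)
            return true;
    return false;
}

/* Cheap variant: read the one summary flag on the first tail page. */
bool thp_has_hwpoisoned(const struct model_thp *thp)
{
    return thp->pages[1].has_hwpoisoned;
}

/* Model of memory failure: poison one subpage and set the summary flag. */
void model_memory_failure(struct model_thp *thp, size_t idx)
{
    assert(idx < SUBPAGES_PER_THP);
    thp->pages[idx].hwpoison = true;
    thp->pages[1].has_hwpoisoned = true;
}

/* Model of split/free: the summary flag is cleared, as the commit says. */
void model_split_or_free(struct model_thp *thp)
{
    thp->pages[1].has_hwpoisoned = false;
}

/* Fault-path decision in this model: PMD-map only if no subpage is poisoned. */
bool model_can_map_pmd(const struct model_thp *thp)
{
    return !thp_has_hwpoisoned(thp);
}
```

In this model a fault handler would call model_can_map_pmd() before installing a huge mapping and fall back to per-page handling otherwise. The sketch only shows the data-structure idea; it says nothing about why the backport caused the boot failures reported above.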