Hi, next release cycle, and another regression. Yesterday after another kernel update in Fedora Rawhide system stopped booting. Today thanks to git bisect, I found out that this is a commit: ❯ git bisect bad a349d72fd9efc87c8fd1d16d3164752d84a7275b is the first bad commit commit a349d72fd9efc87c8fd1d16d3164752d84a7275b Author: Hugh Dickins <hughd@xxxxxxxxxx> Date: Tue Jul 11 21:30:40 2023 -0700 mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s Patch series "mm: free retracted page table by RCU", v3. Some mmap_lock avoidance i.e. latency reduction. Initially just for the case of collapsing shmem or file pages to THPs: the usefulness of MADV_COLLAPSE on shmem is being limited by that mmap_write_lock it currently requires. Likely to be relied upon later in other contexts e.g. freeing of empty page tables (but that's not work I'm doing). mmap_write_lock avoidance when collapsing to anon THPs? Perhaps, but again that's not work I've done: a quick attempt was not as easy as the shmem/file case. These changes (though of course not these exact patches) have been in Google's data centre kernel for three years now: we do rely upon them. This patch (of 13): Before putting them to use (several commits later), add rcu_read_lock() to pte_offset_map(), and rcu_read_unlock() to pte_unmap(). Make this a separate commit, since it risks exposing imbalances: prior commits have fixed all the known imbalances, but we may find some have been missed. Link: https://lkml.kernel.org/r/7cd843a9-aa80-14f-5eb2-33427363c20@xxxxxxxxxx Link: https://lkml.kernel.org/r/d3b01da5-2a6-833c-6681-67a3e024a16f@xxxxxxxxxx Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx> Cc: Alexander Gordeev <agordeev@xxxxxxxxxxxxx> Cc: Alistair Popple <apopple@xxxxxxxxxx> Cc: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx> Cc: Anshuman Khandual <anshuman.khandual@xxxxxxx> Cc: Axel Rasmussen <axelrasmussen@xxxxxxxxxx> Cc: Christian Borntraeger <borntraeger@xxxxxxxxxxxxx> Cc: Christophe Leroy <christophe.leroy@xxxxxxxxxx> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx> Cc: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx> Cc: David Hildenbrand <david@xxxxxxxxxx> Cc: "David S. Miller" <davem@xxxxxxxxxxxxx> Cc: Gerald Schaefer <gerald.schaefer@xxxxxxxxxxxxx> Cc: Heiko Carstens <hca@xxxxxxxxxxxxx> Cc: Huang, Ying <ying.huang@xxxxxxxxx> Cc: Ira Weiny <ira.weiny@xxxxxxxxx> Cc: Jann Horn <jannh@xxxxxxxxxx> Cc: Jason Gunthorpe <jgg@xxxxxxxx> Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> Cc: Lorenzo Stoakes <lstoakes@xxxxxxxxx> Cc: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx> Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> Cc: Miaohe Lin <linmiaohe@xxxxxxxxxx> Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx> Cc: Mike Rapoport (IBM) <rppt@xxxxxxxxxx> Cc: Minchan Kim <minchan@xxxxxxxxxx> Cc: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> Cc: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx> Cc: Peter Xu <peterx@xxxxxxxxxx> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> Cc: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx> Cc: Ralph Campbell <rcampbell@xxxxxxxxxx> Cc: Russell King <linux@xxxxxxxxxxxxxxx> Cc: SeongJae Park <sj@xxxxxxxxxx> Cc: Song Liu <song@xxxxxxxxxx> Cc: Steven Price <steven.price@xxxxxxx> Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx> Cc: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> Cc: Vasily Gorbik <gor@xxxxxxxxxxxxx> Cc: Vishal Moola (Oracle) <vishal.moola@xxxxxxxxx> Cc: Vlastimil Babka <vbabka@xxxxxxx> Cc: Will Deacon <will@xxxxxxxxxx> Cc: Yang Shi <shy828301@xxxxxxxxx> Cc: Yu Zhao <yuzhao@xxxxxxxxxx> Cc: Zack Rusin <zackr@xxxxxxxxxx> Cc: Zi Yan <ziy@xxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> include/linux/pgtable.h | 4 ++-- mm/pgtable-generic.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) It looks like the hang happens so early that when booting into a working kernel and running "journalctl -b -1" I see in the console the log of the previous kernel which was booted before the problematic kernel. Therefore, I apologize that I can't provide the kernel logs. I can provides only photos when backtrace appears on my monitor: Here we waiting: https://ibb.co/5xmm0BH And then I see backtrace: https://ibb.co/TLLGFNP Unfortunately I can't revert commit a349d72fd9efc87c8fd1d16d3164752d84a7275b for testing more fresh builds because of conflicts. My hardware: https://linux-hardware.org/?probe=dd5735f315 I also attached kernel build config and full bisect log. -- Best Regards, Mike Gavrilov.
Attachment:
.config.zip
Description: Zip archive
<<attachment: git-bisect-system-not-boot.zip>>