On 17.04.23 13:12, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 14.03.23 11:17, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 24.02.23 19:08, Mikhail Pletnev wrote:
>>> I did some more testing on v6.1.12 and reproduced the issue. But i have
>>> new bit of information: since the last time i've seen this issue i've
>>> migrated most of my storage from XFS to BTRFS and i couldn't reproduce
>>> the issue again today until i switched the source volume in the test
>>> back to XFS. So it seems bug is either in the way that XFS talks to
>>> mm/folios or is just triggered by it.
>>>
>>> anyway, i attached a report from v6.1.2 (seems to be happening in the
>>> same place)
>>
>> Hi Willy! I'd like to bring this back onto your radar, as this
>> regression is still unsolved afaics -- the patch you provided only
>> partially helped. Or was progress to fix this made in a different thread
>> and I just missed it?
>
> Willy, I know, I'm kinda annoying, but it's part of my job, hence please
> allow me to ask:
>
> Do you still have this regression on your todo list somewhere? The
> problem has been known and bisected since November. I understand that this
> is not something that can be fixed quickly, but at the same time it's
> been quite a while already.
>
> Or has progress to fix this been made and I just missed it?

Hmm, no reply. Does nobody care anymore or was this resolved and I just
missed it?

Mikhail Pletnev: is the problem still happening with latest mainline? Or
did you stop caring after you migrated your storage to btrfs?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

>>> On 2/24/23 13:21, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>> On 16.12.22 06:23, Thorsten Leemhuis wrote:
>>>>> Hi, this is your Linux kernel regression tracker.
>>>>> Top-posting for once,
>>>>> to make this easily accessible to everyone.
>>>> /me again
>>>>
>>>>> Was some progress made to get this regression resolved? From here it
>>>>> looks kinda stalled, that's why I'm asking -- but maybe I just missed
>>>>> something.
>>>> Did anything happen to get this regression resolved? Doesn't look like
>>>> it, but maybe I missed some progress.
>>>>
>>>> Willy, Mikhail confirmed off-list to me that the problem still exists.
>>>> He also tried your patch and reported back. Is there something else you
>>>> need?
>>>>
>>>> Side note: I lost this out of sight during the festive season and should
>>>> have asked this earlier, but better late than never. :-D
>>>>
>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>>> --
>>>> Everything you wanna know about Linux kernel regression tracking:
>>>> https://linux-regtracking.leemhuis.info/about/#tldr
>>>> If I did something stupid, please tell me, as explained on that page.
>>>>
>>>> #regzbot poke
>>>>
>>>>> On 06.12.22 03:08, Mikhail Pletnev wrote:
>>>>>> On Mon, 5 Dec 2022 20:25:11 +0000
>>>>>> Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>>>>>>> Thanks! I think this may be the problem ...
>>>>>>>
>>>>>> Hi Matthew, thanks for swift response, i've applied your last patch
>>>>>> and ran my stress test a couple of times.
>>>>>> It's still consistently
>>>>>> crashing (albeit it seems in a different place):
>>>>>>
>>>>>> [ 1975.257126] ***BAD SIBLING*** index 912583 offset 4
>>>>>> [ 1975.257128] node ffff9fc817e01ff0 offset 51 parent
>>>>>> ffff9fc5c7a31ff0 shift 0 count 64 values 48 array ffff9fc521173e80
>>>>>> list ffff9fc817e02008 ffff9fc817e02008 marks 0 0 0
>>>>>> [ 1975.257133] BUG: kernel NULL pointer dereference, address:
>>>>>> 0000000000000036
>>>>>> [ 1975.257135] #PF: supervisor read access in kernel mode
>>>>>> [ 1975.257137] #PF: error_code(0x0000) - not-present page
>>>>>> [ 1975.257138] PGD 0 P4D 0
>>>>>> [ 1975.257139] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>>>>> [ 1975.257141] CPU: 5 PID: 8303 Comm: deluge-gtk Not tainted
>>>>>> 5.17.0-rc4_ap_test-00163-g793917d997df-dirty #6
>>>>>> [ 1975.257144] Hardware name: Micro-Star International Co., Ltd.
>>>>>> MS-7C35/MEG X570 UNIFY (MS-7C35), BIOS A.C3 03/15/2022
>>>>>> [ 1975.257146] RIP: 0010:__filemap_get_folio
>>>>>> (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/atomic.h:29 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1158 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1183 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-instrumented.h:608 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:238 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:247 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:280 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:313 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1899 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1951)
>>>>>> [ 1975.257152] Code: 10 e8 56 fd 67 00 48 89 c3 48 3d 02 04 00 00 74
>>>>>> e2 48 3d 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40
>>>>>> 02 00 00 <8b> 40 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b
>>>>>> 54 24 28
>>>>>> All code
>>>>>> ========
>>>>>> 0: 10 e8 adc %ch,%al
>>>>>> 2: 56 push %rsi
>>>>>> 3: fd std
>>>>>> 4: 67 00 48 89 add %cl,-0x77(%eax)
>>>>>> 8: c3 ret
>>>>>> 9: 48 3d 02 04 00 00 cmp $0x402,%rax
>>>>>> f: 74 e2 je 0xfffffffffffffff3
>>>>>> 11: 48 3d 06 04 00 00 cmp $0x406,%rax
>>>>>> 17: 74 da je 0xfffffffffffffff3
>>>>>> 19: 48 85 c0 test %rax,%rax
>>>>>> 1c: 0f 84 3e 02 00 00 je 0x260
>>>>>> 22: a8 01 test $0x1,%al
>>>>>> 24: 0f 85 40 02 00 00 jne 0x26a
>>>>>> 2a:* 8b 40 34 mov 0x34(%rax),%eax
>>>>>> <-- trapping instruction
>>>>>> 2d: 85 c0 test %eax,%eax
>>>>>> 2f: 74 c2 je 0xfffffffffffffff3
>>>>>> 31: 8d 50 01 lea 0x1(%rax),%edx
>>>>>> 34: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx)
>>>>>> 39: 75 f2 jne 0x2d
>>>>>> 3b: 48 8b 54 24 28 mov 0x28(%rsp),%rdx
>>>>>>
>>>>>> Code starting with the faulting instruction
>>>>>> ===========================================
>>>>>> 0: 8b 40 34 mov 0x34(%rax),%eax
>>>>>> 3: 85 c0 test %eax,%eax
>>>>>> 5: 74 c2 je 0xffffffffffffffc9
>>>>>> 7: 8d 50 01 lea 0x1(%rax),%edx
>>>>>> a: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx)
>>>>>> f: 75 f2 jne 0x3
>>>>>> 11: 48 8b 54 24 28 mov 0x28(%rsp),%rdx
>>>>>> [ 1975.257154] RSP: 0000:ffffc2d744c37cb0 EFLAGS: 00010246
>>>>>> [ 1975.257155] RAX: 0000000000000002 RBX: 0000000000000002 RCX:
>>>>>> 0000000000000000
>>>>>> [ 1975.257156] RDX: 0000000000000000 RSI: ffffffffbb117459 RDI:
>>>>>> 00000000ffffffff
>>>>>> [ 1975.257157] RBP: 0000000000000000 R08: 00000000ffffdfff R09:
>>>>>> 00000000ffffdfff
>>>>>> [ 1975.257158] R10: ffffffffbb472dc0 R11: ffffffffbb472dc0 R12:
>>>>>> 0000000000000000
>>>>>> [ 1975.257159] R13: ffff9fc521173e78 R14: 00000000000decc7 R15:
>>>>>> fff000003fffffff
>>>>>> [ 1975.257160] FS: 00007fb2137fe6c0(0000) GS:ffff9fcb7eb40000(0000)
>>>>>> knlGS:0000000000000000
>>>>>> [ 1975.257161] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [ 1975.257162] CR2: 0000000000000036 CR3: 0000000164114000 CR4:
>>>>>> 0000000000750ee0
>>>>>> [ 1975.257163] PKRU: 55555554
>>>>>> [ 1975.257163] Call Trace:
>>>>>> [ 1975.257164] <TASK>
>>>>>> [ 1975.257166] ? page_add_file_rmap
>>>>>> (/home/reinhardt/dev-apps/kernel/linux/./include/linux/page-flags.h:195 /home/reinhardt/dev-apps/kernel/linux/mm/internal.h:440 /home/reinhardt/dev-apps/kernel/linux/mm/rmap.c:1270)
>>>>>> [ 1975.257169] filemap_fault
>>>>>> (/home/reinhardt/dev-apps/kernel/linux/./include/linux/pagemap.h:531
>>>>>> /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:3107)
>>>>>> [ 1975.257172] __do_fault
>>>>>> (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:3852)
>>>>>> [ 1975.257174] __handle_mm_fault
>>>>>> (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4169
>>>>>> /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4297
>>>>>> /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4555
>>>>>> /home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4690)
>>>>>> [ 1975.257176] handle_mm_fault
>>>>>> (/home/reinhardt/dev-apps/kernel/linux/mm/memory.c:4788)
>>>>>> [ 1975.257178] do_user_addr_fault
>>>>>> (/home/reinhardt/dev-apps/kernel/linux/./include/linux/sched/signal.h:404 /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1399)
>>>>>> [ 1975.257181] exc_page_fault
>>>>>> (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/irqflags.h:40 /home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/irqflags.h:75 /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1492 /home/reinhardt/dev-apps/kernel/linux/arch/x86/mm/fault.c:1540)
>>>>>> [ 1975.257184] ? asm_exc_page_fault
>>>>>> (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/idtentry.h:568)
>>>>>> [ 1975.257186] asm_exc_page_fault
>>>>>> (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/idtentry.h:568)
>>>>>> [ 1975.257188] RIP: 0033:0x7fb265b88409
>>>>>> [ 1975.257189] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f
>>>>>> 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 83 fa
>>>>>> 20 72 27 <c5> fe 6f 06 48 83 fa 40 0f 87 a9 00 00 00 c5 fe 6f 4c 16
>>>>>> e0 c5 fe
>>>>>> All code
>>>>>> ========
>>>>>> 0: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
>>>>>> 7: 00 00 00 00
>>>>>> b: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
>>>>>> 12: 00 00 00 00
>>>>>> 16: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
>>>>>> 1d: 00 00 00 00
>>>>>> 21: 48 89 f8 mov %rdi,%rax
>>>>>> 24: 48 83 fa 20 cmp $0x20,%rdx
>>>>>> 28: 72 27 jb 0x51
>>>>>> 2a:* c5 fe 6f 06 vmovdqu (%rsi),%ymm0 <--
>>>>>> trapping instruction
>>>>>> 2e: 48 83 fa 40 cmp $0x40,%rdx
>>>>>> 32: 0f 87 a9 00 00 00 ja 0xe1
>>>>>> 38: c5 fe 6f 4c 16 e0 vmovdqu -0x20(%rsi,%rdx,1),%ymm1
>>>>>> 3e: c5 .byte 0xc5
>>>>>> 3f: fe .byte 0xfe
>>>>>>
>>>>>> Code starting with the faulting instruction
>>>>>> ===========================================
>>>>>> 0: c5 fe 6f 06 vmovdqu (%rsi),%ymm0
>>>>>> 4: 48 83 fa 40 cmp $0x40,%rdx
>>>>>> 8: 0f 87 a9 00 00 00 ja 0xb7
>>>>>> e: c5 fe 6f 4c 16 e0 vmovdqu -0x20(%rsi,%rdx,1),%ymm1
>>>>>> 14: c5 .byte 0xc5
>>>>>> 15: fe .byte 0xfe
>>>>>> [ 1975.257190] RSP: 002b:00007fb2137fd908 EFLAGS: 00010202
>>>>>> [ 1975.257191] RAX: 00007fb204012a80 RBX: 0000000000000000 RCX:
>>>>>> 00007fb2137fda90
>>>>>> [ 1975.257192] RDX: 0000000000004000 RSI: 00007f9fddbb51c3 RDI:
>>>>>> 00007fb204012a80
>>>>>> [ 1975.257193] RBP: 00007fb2137fd928 R08: 00000000638ea1ab R09:
>>>>>> 0000000000000000
>>>>>> [ 1975.257193] R10: 0000000000000008 R11: 0000000000000246 R12:
>>>>>> 00007fb204000bb0
>>>>>> [ 1975.257194] R13: 00007fb21809a5a0 R14: 00000000decc71c3 R15:
>>>>>> 0000000000004000
>>>>>> [ 1975.257196] </TASK>
>>>>>> [ 1975.257196] Modules linked in: overlay xt_addrtype amdgpu
>>>>>> drm_ttm_helper ttm gpu_sched drm_kms_helper iwlmvm backlight
>>>>>> syscopyarea mac80211 sysfillrect sysimgblt libarc4 fb_sys_fops
>>>>>> iwlwifi cfg80211 i2c_piix4 k10temp fuse configfs efivarfs
>>>>>> [ 1975.257207] CR2: 0000000000000036
>>>>>> [ 1975.257208] ---[ end trace 0000000000000000 ]---
>>>>>> [ 1975.257209] RIP: 0010:__filemap_get_folio
>>>>>> (/home/reinhardt/dev-apps/kernel/linux/./arch/x86/include/asm/atomic.h:29 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1158 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-arch-fallback.h:1183 /home/reinhardt/dev-apps/kernel/linux/./include/linux/atomic/atomic-instrumented.h:608 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:238 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:247 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:280 /home/reinhardt/dev-apps/kernel/linux/./include/linux/page_ref.h:313 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1899 /home/reinhardt/dev-apps/kernel/linux/mm/filemap.c:1951)
>>>>>> [ 1975.257211] Code: 10 e8 56 fd 67 00 48 89 c3 48 3d 02 04 00 00 74
>>>>>> e2 48 3d 06 04 00 00 74 da 48 85 c0 0f 84 3e 02 00 00 a8 01 0f 85 40
>>>>>> 02 00 00 <8b> 40 34 85 c0 74 c2 8d 50 01 f0 0f b1 53 34 75 f2 48 8b
>>>>>> 54 24 28
>>>>>> All code
>>>>>> ========
>>>>>> 0: 10 e8 adc %ch,%al
>>>>>> 2: 56 push %rsi
>>>>>> 3: fd std
>>>>>> 4: 67 00 48 89 add %cl,-0x77(%eax)
>>>>>> 8: c3 ret
>>>>>> 9: 48 3d 02 04 00 00 cmp $0x402,%rax
>>>>>> f: 74 e2 je 0xfffffffffffffff3
>>>>>> 11: 48 3d 06 04 00 00 cmp $0x406,%rax
>>>>>> 17: 74 da je 0xfffffffffffffff3
>>>>>> 19: 48 85 c0 test %rax,%rax
>>>>>> 1c: 0f 84 3e 02 00 00 je 0x260
>>>>>> 22: a8 01 test $0x1,%al
>>>>>> 24: 0f 85 40 02 00 00 jne 0x26a
>>>>>> 2a:* 8b 40 34 mov 0x34(%rax),%eax
>>>>>> <-- trapping instruction
>>>>>> 2d: 85 c0 test %eax,%eax
>>>>>> 2f: 74 c2 je 0xfffffffffffffff3
>>>>>> 31: 8d 50 01 lea 0x1(%rax),%edx
>>>>>> 34: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx)
>>>>>> 39: 75 f2 jne 0x2d
>>>>>> 3b: 48 8b 54 24 28 mov 0x28(%rsp),%rdx
>>>>>>
>>>>>> Code starting with the faulting instruction
>>>>>> ===========================================
>>>>>> 0: 8b 40 34 mov 0x34(%rax),%eax
>>>>>> 3: 85 c0 test %eax,%eax
>>>>>> 5: 74 c2 je 0xffffffffffffffc9
>>>>>> 7: 8d 50 01 lea 0x1(%rax),%edx
>>>>>> a: f0 0f b1 53 34 lock cmpxchg %edx,0x34(%rbx)
>>>>>> f: 75 f2 jne 0x3
>>>>>> 11: 48 8b 54 24 28 mov 0x28(%rsp),%rdx
>>>>>> [ 1975.257212] RSP: 0000:ffffc2d744c37cb0 EFLAGS: 00010246
>>>>>> [ 1975.257213] RAX: 0000000000000002 RBX: 0000000000000002 RCX:
>>>>>> 0000000000000000
>>>>>> [ 1975.257214] RDX: 0000000000000000 RSI: ffffffffbb117459 RDI:
>>>>>> 00000000ffffffff
>>>>>> [ 1975.257215] RBP: 0000000000000000 R08: 00000000ffffdfff R09:
>>>>>> 00000000ffffdfff
>>>>>> [ 1975.257215] R10: ffffffffbb472dc0 R11: ffffffffbb472dc0 R12:
>>>>>> 0000000000000000
>>>>>> [ 1975.257216] R13: ffff9fc521173e78 R14: 00000000000decc7 R15:
>>>>>> fff000003fffffff
>>>>>> [ 1975.257217] FS: 00007fb2137fe6c0(0000) GS:ffff9fcb7eb40000(0000)
>>>>>> knlGS:0000000000000000
>>>>>> [ 1975.257218] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [ 1975.257219] CR2: 0000000000000036 CR3: 0000000164114000 CR4:
>>>>>> 0000000000750ee0
>>>>>> [ 1975.257220] PKRU: 55555554
>>>>>>
>>>>>> (full dmesg and my local changeset in attachments for your reference)
>>>>>>
>>>>> #regzbot poke
>>>>>
>>
>>