Re: [PATCH 5.10 098/138] zram: fix uninitialized ZRAM not releasing backing device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Dominique Martinet wrote on Tue, Jan 07, 2025 at 11:40:30AM +0900:
> I'll look a bit more into it today and reply to this mail with anything
> I've found.
> (didn't test on master or anything else either)

So:
- master has no problem
- 5.10.233-rc1, 5.15.176-rc1, 6.1.124-rc1 have the same issue
- 6.6.70-rc1 is also fine
- My previous mail lacked stacktrace decoding, here's a new backtrace
with proper decoding on 6.1.124-rc1, produced by virtme-ng +
decode_stacktrace.sh (end of the mail)
vng -e 'dmesg -C; echo 1 | sudo tee /sys/class/block/zram0/reset || dmesg'
- looking at said backtrace, a likely difference would be the
multi-stream rework, in particular commit 7ac07a26dea7 ("zram:
preparation for multi-zcomp support") that changed how freeing works.
... and cherry-picking it on 6.1 does fix the issue.
Unfortunately it doesn't cherry-pick cleanly to 5.15 and 5.10, so for
these two I'm not sure if it's better to just drop this "zram: fix
uninitialized ZRAM not releasing backing device" commit, or if we should
try harder to backport prerequisites (e55e1b483156 ("block: move from
strlcpy with unused retval to strscpy") is an obvious one that is not
too hard to pick even if not clean, but that wasn't enough and I didn't
try further).

I've at least checked that dropping this patch is enough, and will not
do anything else on this for now.




(nothing aside of the symbolized backtrace below)
-----------------

[    2.184091] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    2.184094] #PF: supervisor read access in kernel mode
[    2.184096] #PF: error_code(0x0000) - not-present page
[    2.184098] PGD 0 P4D 0
[    2.184101] Oops: 0000 [#1] PREEMPT SMP NOPTI
[    2.184104] CPU: 2 PID: 650 Comm: tee Not tainted 6.1.124-rc1+ #5
[    2.184107] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[    2.184109] RIP: 0010:zcomp_cpu_dead (drivers/block/zram/zcomp.c:171) 
[ 2.184115] Code: ff 31 c0 48 c7 c7 c8 4c 9b a1 48 89 43 08 48 89 03 e8 ac d1 2c 00 ba f4 ff ff ff eb bb 0f 1f 44 00 00 0f 1f 44 00 00 89 ff 53 <48> 8b 5e f0 48 03 1c fd 20 ea 9e a1 48 8b 7b 08 48 85 ff 74 11 48
All code
========
   0:	ff 31                	push   (%rcx)
   2:	c0 48 c7 c7          	rorb   $0xc7,-0x39(%rax)
   6:	c8 4c 9b a1          	enter  $0x9b4c,$0xa1
   a:	48 89 43 08          	mov    %rax,0x8(%rbx)
   e:	48 89 03             	mov    %rax,(%rbx)
  11:	e8 ac d1 2c 00       	call   0x2cd1c2
  16:	ba f4 ff ff ff       	mov    $0xfffffff4,%edx
  1b:	eb bb                	jmp    0xffffffffffffffd8
  1d:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  22:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  27:	89 ff                	mov    %edi,%edi
  29:	53                   	push   %rbx
  2a:*	48 8b 5e f0          	mov    -0x10(%rsi),%rbx		<-- trapping instruction
  2e:	48 03 1c fd 20 ea 9e 	add    -0x5e6115e0(,%rdi,8),%rbx
  35:	a1 
  36:	48 8b 7b 08          	mov    0x8(%rbx),%rdi
  3a:	48 85 ff             	test   %rdi,%rdi
  3d:	74 11                	je     0x50
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 8b 5e f0          	mov    -0x10(%rsi),%rbx
   4:	48 03 1c fd 20 ea 9e 	add    -0x5e6115e0(,%rdi,8),%rbx
   b:	a1 
   c:	48 8b 7b 08          	mov    0x8(%rbx),%rdi
  10:	48 85 ff             	test   %rdi,%rdi
  13:	74 11                	je     0x26
  15:	48                   	rex.W
[    2.184117] RSP: 0018:ffffb556409ffd20 EFLAGS: 00010246
[    2.184120] RAX: ffffffffa0e09620 RBX: 0000000000000000 RCX: ffffffffa1c604c0
[    2.184122] RDX: 0000000000000000 RSI: 0000000000000010 RDI: 0000000000000000
[    2.184124] RBP: 0000000000000044 R08: 0000000000000000 R09: 000000000000000a
[    2.184125] R10: ffff9e20fe61b360 R11: 0fffffffffffffff R12: 0000000000000010
[    2.184127] R13: ffff9e20fe61b360 R14: ffffffffa0e09620 R15: 0000000000000000
[    2.184130] FS:  00007f4ffda3c740(0000) GS:ffff9e20fe680000(0000) knlGS:0000000000000000
[    2.184134] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.184136] CR2: 0000000000000000 CR3: 0000000005362000 CR4: 0000000000750ee0
[    2.184138] PKRU: 55555554
[    2.184139] Call Trace:
[    2.184141]  <TASK>
[    2.184144] ? __die_body.cold (arch/x86/kernel/dumpstack.c:478 arch/x86/kernel/dumpstack.c:465 arch/x86/kernel/dumpstack.c:420) 
[    2.184149] ? page_fault_oops (arch/x86/mm/fault.c:727) 
[    2.184153] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:131) 
[    2.184158] ? kernfs_iop_setattr (fs/kernfs/inode.c:137) 
[    2.184163] ? exc_page_fault (arch/x86/include/asm/irqflags.h:40 arch/x86/include/asm/irqflags.h:75 arch/x86/mm/fault.c:1439 arch/x86/mm/fault.c:1487) 
[    2.184168] ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:608) 
[    2.184172] ? zcomp_cpu_up_prepare (drivers/block/zram/zcomp.c:167) 
[    2.184176] ? zcomp_cpu_up_prepare (drivers/block/zram/zcomp.c:167) 
[    2.184179] ? zcomp_cpu_dead (drivers/block/zram/zcomp.c:171) 
[    2.184182] cpuhp_invoke_callback (kernel/cpu.c:202) 
[    2.184187] cpuhp_issue_call (kernel/cpu.c:2016) 
[    2.184190] __cpuhp_state_remove_instance (kernel/cpu.c:2224) 
[    2.184193] zcomp_destroy (drivers/block/zram/zcomp.c:197) 
[    2.184196] zram_reset_device (drivers/block/zram/zram_drv.c:1737) 
[    2.184199] reset_store (drivers/pci/pci-sysfs.c:1387) 
[    2.184204] kernfs_fop_write_iter (fs/kernfs/file.c:338) 
[    2.184208] vfs_write (include/linux/fs.h:2265 fs/read_write.c:491 fs/read_write.c:584) 
[    2.184214] ksys_write (fs/read_write.c:637) 
[    2.184217] do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:81) 
[    2.184220] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121) 
[    2.184224] RIP: 0033:0x7f4ffdb372c0
[ 2.184227] Code: 40 00 48 8b 15 41 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d 21 23 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
All code
========
   0:	40 00 48 8b          	rex add %cl,-0x75(%rax)
   4:	15 41 9b 0d 00       	adc    $0xd9b41,%eax
   9:	f7 d8                	neg    %eax
   b:	64 89 02             	mov    %eax,%fs:(%rdx)
   e:	48 c7 c0 ff ff ff ff 	mov    $0xffffffffffffffff,%rax
  15:	eb b7                	jmp    0xffffffffffffffce
  17:	0f 1f 00             	nopl   (%rax)
  1a:	80 3d 21 23 0e 00 00 	cmpb   $0x0,0xe2321(%rip)        # 0xe2342
  21:	74 17                	je     0x3a
  23:	b8 01 00 00 00       	mov    $0x1,%eax
  28:	0f 05                	syscall
  2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
  30:	77 58                	ja     0x8a
  32:	c3                   	ret
  33:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  3a:	48 83 ec 28          	sub    $0x28,%rsp
  3e:	48                   	rex.W
  3f:	89                   	.byte 0x89

Code starting with the faulting instruction
===========================================
   0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
   6:	77 58                	ja     0x60
   8:	c3                   	ret
   9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  10:	48 83 ec 28          	sub    $0x28,%rsp
  14:	48                   	rex.W
  15:	89                   	.byte 0x89
[    2.184229] RSP: 002b:00007ffd10d35a78 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[    2.184232] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f4ffdb372c0
[    2.184233] RDX: 0000000000000002 RSI: 00007ffd10d35b90 RDI: 0000000000000003
[    2.184235] RBP: 00007ffd10d35b90 R08: 0000000000000004 R09: 0000000000000001
[    2.184236] R10: 00007f4ffda53f18 R11: 0000000000000202 R12: 0000000000000002
[    2.184238] R13: 0000559c792ae310 R14: 0000000000000002 R15: 00007f4ffdc0d9e0
[    2.184243]  </TASK>
[    2.184244] Modules linked in:
[    2.184247] CR2: 0000000000000000
[    2.184248] ---[ end trace 0000000000000000 ]---
[    2.184250] RIP: 0010:zcomp_cpu_dead (drivers/block/zram/zcomp.c:171) 
[ 2.184253] Code: ff 31 c0 48 c7 c7 c8 4c 9b a1 48 89 43 08 48 89 03 e8 ac d1 2c 00 ba f4 ff ff ff eb bb 0f 1f 44 00 00 0f 1f 44 00 00 89 ff 53 <48> 8b 5e f0 48 03 1c fd 20 ea 9e a1 48 8b 7b 08 48 85 ff 74 11 48
All code
========
   0:	ff 31                	push   (%rcx)
   2:	c0 48 c7 c7          	rorb   $0xc7,-0x39(%rax)
   6:	c8 4c 9b a1          	enter  $0x9b4c,$0xa1
   a:	48 89 43 08          	mov    %rax,0x8(%rbx)
   e:	48 89 03             	mov    %rax,(%rbx)
  11:	e8 ac d1 2c 00       	call   0x2cd1c2
  16:	ba f4 ff ff ff       	mov    $0xfffffff4,%edx
  1b:	eb bb                	jmp    0xffffffffffffffd8
  1d:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  22:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  27:	89 ff                	mov    %edi,%edi
  29:	53                   	push   %rbx
  2a:*	48 8b 5e f0          	mov    -0x10(%rsi),%rbx		<-- trapping instruction
  2e:	48 03 1c fd 20 ea 9e 	add    -0x5e6115e0(,%rdi,8),%rbx
  35:	a1 
  36:	48 8b 7b 08          	mov    0x8(%rbx),%rdi
  3a:	48 85 ff             	test   %rdi,%rdi
  3d:	74 11                	je     0x50
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 8b 5e f0          	mov    -0x10(%rsi),%rbx
   4:	48 03 1c fd 20 ea 9e 	add    -0x5e6115e0(,%rdi,8),%rbx
   b:	a1 
   c:	48 8b 7b 08          	mov    0x8(%rbx),%rdi
  10:	48 85 ff             	test   %rdi,%rdi
  13:	74 11                	je     0x26
  15:	48                   	rex.W
[    2.184254] RSP: 0018:ffffb556409ffd20 EFLAGS: 00010246
[    2.184256] RAX: ffffffffa0e09620 RBX: 0000000000000000 RCX: ffffffffa1c604c0
[    2.184258] RDX: 0000000000000000 RSI: 0000000000000010 RDI: 0000000000000000
[    2.184259] RBP: 0000000000000044 R08: 0000000000000000 R09: 000000000000000a
[    2.184261] R10: ffff9e20fe61b360 R11: 0fffffffffffffff R12: 0000000000000010
[    2.184262] R13: ffff9e20fe61b360 R14: ffffffffa0e09620 R15: 0000000000000000
[    2.184266] FS:  00007f4ffda3c740(0000) GS:ffff9e20fe680000(0000) knlGS:0000000000000000
[    2.184269] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.184271] CR2: 0000000000000000 CR3: 0000000005362000 CR4: 0000000000750ee0
[    2.184273] PKRU: 55555554
[    2.184274] note: tee[650] exited with irqs disabled


Thanks,
-- 
Dominique




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux