Re: srcu_cleanup warning

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Jun 08, 2024 at 08:43:25PM -0400, Kent Overstreet wrote:
> syzbot's been hitting this, and I just started hitting it in my CI (so I
> can finally repro), but I have no idea what's going on.
> 
> it seems the srcu struct is still in use? but there's no readers, we're
> not hitting that warning - and I've got my own asserts for leaking
> btree_trans objects, which are what take srcu read locks.
> 
> so there seems to be an outstanding grace period, but I can't follow the
> code to see how that's supposed to end - and adding an explicit
> srcu_barrier() to my shutdown path doesn't fix it...
> 
> 00458 ========= TEST   rereplicate
> 00458 
> 00458 WATCHDOG 60
> 00458 bcachefs (5c0b0c7e-1883-4f8a-abde-8f223546d087): mounting version 1.9: disk_accounting_v2
> 00458 bcachefs (5c0b0c7e-1883-4f8a-abde-8f223546d087): initializing new filesystem
> 00458 bcachefs (5c0b0c7e-1883-4f8a-abde-8f223546d087): going read-write
> 00458 bcachefs (5c0b0c7e-1883-4f8a-abde-8f223546d087): marking superblocks
> 00458 bcachefs (5c0b0c7e-1883-4f8a-abde-8f223546d087): initializing freespace
> 00458 bcachefs (5c0b0c7e-1883-4f8a-abde-8f223546d087): done initializing freespace
> 00458 bcachefs (5c0b0c7e-1883-4f8a-abde-8f223546d087): reading snapshots table
> 00458 bcachefs (5c0b0c7e-1883-4f8a-abde-8f223546d087): reading snapshots done
> 00458 bcachefs (5c0b0c7e-1883-4f8a-abde-8f223546d087): done starting filesystem
> 00459 1024+0 records in
> 00459 1024+0 records out
> 00459 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.927488 s, 1.2 GB/s
> 00459 bcachefs (5c0b0c7e-1883-4f8a-abde-8f223546d087): shutting down
> 00459 bcachefs (5c0b0c7e-1883-4f8a-abde-8f223546d087): going read-only
> 00459 bcachefs (5c0b0c7e-1883-4f8a-abde-8f223546d087): finished waiting for writes to stop
> 00459 bcachefs (5c0b0c7e-1883-4f8a-abde-8f223546d087): flushing journal and stopping allocators, journal seq 72
> 00459 bcachefs (5c0b0c7e-1883-4f8a-abde-8f223546d087): flushing journal and stopping allocators complete, journal seq 75
> 00459 bcachefs (5c0b0c7e-1883-4f8a-abde-8f223546d087): shutdown complete, journal seq 76
> 00459 bcachefs (5c0b0c7e-1883-4f8a-abde-8f223546d087): marking filesystem clean
> 00459 bcachefs (5c0b0c7e-1883-4f8a-abde-8f223546d087): shutdown complete
> 00459 ------------[ cut here ]------------
> 00459 WARNING: CPU: 1 PID: 1577 at kernel/rcu/srcutree.c:664 cleanup_srcu_struct+0x168/0x1c0
> 00459 Modules linked in:
> 00459 CPU: 1 PID: 1577 Comm: umount Not tainted 6.10.0-rc2-ktest-g45acc58a039a #23170
> 00459 Hardware name: linux,dummy-virt (DT)
> 00459 pstate: 00001005 (nzcv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
> 00459 pc : cleanup_srcu_struct+0x168/0x1c0
> 00459 lr : cleanup_srcu_struct+0x80/0x1c0
> 00459 sp : ffff0000c72efc10
> 00459 x29: ffff0000c72efc10 x28: ffff0000d8d8ae80 x27: 0000000000000000
> 00459 x26: 0000000000000010 x25: ffff0000f7183738 x24: ffff0000c1e73e00
> 00459 x23: ffff800080cc92a8 x22: ffff800080993010 x21: ffff0000f7183730
> 00459 x20: ffff800080cc8a30 x19: ffff0000fbdfeb40 x18: 00000000fffffffe
> 00459 x17: 656c706d6f63206e x16: 776f647475687320 x15: 0000000000000001
> 00459 x14: dead000000000122 x13: 0000000000000001 x12: 00000000f0000000
> 00459 x11: 0000000000000040 x10: ffff800080cdf538 x9 : ffff800080049624
> 00459 x8 : ffff0000c0400028 x7 : 0000000000000000 x6 : 0000000000000000
> 00459 x5 : ffff0000c0400000 x4 : 0000000000000000 x3 : 0000000000000000
> 00459 x2 : 0000000000000010 x1 : 0000000000000010 x0 : 000000000000023d
> 00459 Call trace:
> 00459  cleanup_srcu_struct+0x168/0x1c0
> 00459  bch2_fs_btree_iter_exit+0x1a0/0x1c8
> 00459  bch2_fs_release+0xf8/0x2f0
> 00459  kobject_put+0xa4/0xf0
> 00459  bch2_fs_free+0xf4/0x110
> 00459  bch2_kill_sb+0x24/0x38
> 00459  deactivate_locked_super+0x54/0xb0
> 00459  deactivate_super+0x70/0x90
> 00459  cleanup_mnt+0xa0/0x130
> 00459  __cleanup_mnt+0x18/0x28
> 00459  task_work_run+0xa8/0xf0
> 00459  do_notify_resume+0x120/0x130
> 00459  el0_svc+0x44/0x58
> 00459  el0t_64_sync_handler+0xb8/0xc0
> 00459  el0t_64_sync+0x14c/0x150
> 00459 ---[ end trace 0000000000000000 ]---
> 00459 rcu: cleanup_srcu_struct: Active srcu_struct 000000006c688d30 read state: 1 gp state: 573/576

There is a grace period in progress ("read state: 1") and that grace
period is the last one that has been requested ("gp state: 573/576").

Had there been callbacks pending, there would have been a warning from
"if (WARN_ON(rcu_segcblist_n_cbs(&sdp->srcu_cblist)))", so srcu_barrier()
having no effect is expected behavior.  Which also suggests that the
unfinished grace period was started by start_poll_synchronize_srcu().

Could you please try something like this just before the call to
cleanup_srcu_struct()?

	WARN_ON_ONCE(poll_state_synchronize_srcu(&c->btree_trans_barrier, ck->btree_trans_barrier_seq);

If there is some chance that start_poll_synchronize_srcu() was never
ever invoked, this check will of course need some additional help.

I am curious about your use of ULONG_CMP_GE() on return values from
different calls to start_poll_synchronize_srcu(), but that is not urgent.

							Thanx, Paul

> 00459 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
> 00459 Mem abort info:
> 00459   ESR = 0x0000000096000004
> 00459   EC = 0x25: DABT (current EL), IL = 32 bits
> 00459   SET = 0, FnV = 0
> 00459   EA = 0, S1PTW = 0
> 00459   FSC = 0x04: level 0 translation fault
> 00459 Data abort info:
> 00459   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> 00459   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> 00459   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> 00459 user pgtable: 4k pages, 48-bit VAs, pgdp=000000011c75a000
> 00459 [0000000000000070] pgd=0000000000000000, p4d=0000000000000000
> 00459 Internal error: Oops: 0000000096000004 [#1] SMP
> 00459 Modules linked in:
> 00459 CPU: 0 PID: 1157 Comm: kworker/0:2 Tainted: G        W          6.10.0-rc2-ktest-g45acc58a039a #23170
> 00459 Hardware name: linux,dummy-virt (DT)
> 00459 Workqueue: rcu_gp srcu_invoke_callbacks
> 00459 pstate: 600010c5 (nZCv daIF -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
> 00459 pc : srcu_invoke_callbacks+0x74/0x210
> 00459 lr : srcu_invoke_callbacks+0x30/0x210
> 00459 sp : ffff00006ca5fd10
> 00459 x29: ffff00006ca5fd10 x28: ffff0000fbc5ac40 x27: ffff000005d71100
> 00459 x26: ffff800080cc6000 x25: ffff800080a98008 x24: ffff0000fbc5ab80
> 00459 x23: ffff0000fbc5ab40 x22: ffff0000fbc5ab88 x21: ffff0000fbc512c0
> 00459 x20: ffff0000fbc5ac40 x19: ffff0000f7183730 x18: 00000000e8d092ee
> 00459 x17: 000000006ba30620 x16: 00000000cf2823e1 x15: 00000000d2643b66
> 00459 x14: 0000000000000000 x13: ffff0000c17f9f00 x12: ffff0000c17f9da8
> 00459 x11: 0000000000000000 x10: 0000000000000940 x9 : ffff8000800a4968
> 00459 x8 : fefefefefefefeff x7 : 0000000000000018 x6 : ffff000005d71100
> 00459 x5 : 000070675f756372 x4 : 000070675f756372 x3 : 0000000000000000
> 00459 x2 : 0000000000000001 x1 : 0000000000000000 x0 : ffff0000fbc5ab88
> 00459 Call trace:
> 00459  srcu_invoke_callbacks+0x74/0x210
> 00459  process_scheduled_works+0x184/0x3d0
> 00459  worker_thread+0x174/0x318
> 00459  kthread+0xe8/0x100
> 00459  ret_from_fork+0x10/0x20
> 00459 Code: 54000be1 f9400a61 910122f6 aa1603e0 (f9403821) 
> 00459 ---[ end trace 0000000000000000 ]---
> 00459 Kernel panic - not syncing: Oops: Fatal exception
> 00459 SMP: stopping secondary CPUs
> 00459 Kernel Offset: disabled
> 00459 CPU features: 0x00,00000003,80000008,4240500b
> 00459 Memory Limit: none
> 00459 ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
> 00464 ========= FAILED TIMEOUT rereplicate in 60s
> 




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux