Re: bcachefs RCU warnings

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Oct 25, 2023 at 02:08:45PM -0400, Kent Overstreet wrote:
> Hey Paul, RCU folks :)
> 
> I've got no idea what these warnings mean and haven't seen them before,
> do you suppose one of you could point me in the right direction?
> 
> On Wed, Oct 25, 2023 at 09:28:09PM +0800, Daniel J Blueman wrote:
> > Hi Kent et al,
> > 
> > On 6.6.0-rc7-next-20231024 with my bcachefs exerciser [1], I was able
> > to reproduce three related kernel warnings due to RCU grace period
> > counts being zero, ie WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0).
> > 
> > If this is something of interest, I'll find a minimal reproducer.
> > These warnings aside, bcachefs is looking really solid.
> > 
> > Thanks,
> >   Daniel
> > 
> > -- [1] https://github.com/dblueman/bcachefs-gym
> > 
> > -- [2]
> > 
> > WARNING: CPU: 15 PID: 259240 at kernel/rcu/sync.c:171 rcu_sync_exit+0xe3/0xf0

The usual cause would be mismatched rcu_sync_enter() and rcu_sync_exit(),
as in one more rcu_sync_exit() than rcu_sync_enter()...

> > Modules linked in: brd tls cfg80211 intel_rapl_msr intel_rapl_common
> > amd64_edac edac_mce_amd kvm_amd binfmt_misc kvm irqbypass ipmi_ssif
> > rapl wmi_bmof nls_iso8859_1 ccp ptdma k10temp acpi_ipmi ipmi_si
> > ipmi_devintf ipmi_msghandler input_leds joydev mac_hid efi_pstore
> > dmi_sysfs ip_tables x_tables autofs4 rndis_host cdc_ether usbnet m
> > ii btrfs blake2b_generic hid_generic usbhid raid10 hid raid456
> > async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0
> > multipath linear crct10dif_pclmul crc32_pclmul ast polyval_clmulni
> > polyval_generic i2c_algo_bit ghash_clmulni_intel drm_shmem_helper
> > sha512_ssse3 drm_kms_helper sha256_ssse3 sha1_ssse3 nvme ahci drm
> > nvme_core tg3 l
> > ibahci xhci_pci i2c_piix4 xhci_pci_renesas wmi aesni_intel crypto_simd
> > cryptd [last unloaded: brd]
> > CPU: 15 PID: 259240 Comm: bch-data/3edb8b Tainted: G W
> > 6.6.0-rc7-next-20231024 #1
> > Hardware name: Supermicro AS -3014TS-i/H12SSL-i, BIOS 2.5 09/08/2022
> > RIP: 0010:rcu_sync_exit+0xe3/0xf0
> > Code: c6 e0 06 c7 b2 e8 dd 0e 01 00 4c 89 e7 e8 b5 54 91 01 5b 41 5c
> > 41 5d 5d 31 c0 31 f6 31 ff e9 8f 35 a9 01 0f 0b e9 3d ff ff ff <0f> 0b
> > e9 4d ff ff ff 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90
> > RSP: 0018:ffffc90024127b60 EFLAGS: 00010246
> > RAX: 0000000000000000 RBX: ffff8883b7383740 RCX: 0000000000000000
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > RBP: ffffc90024127b78 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffff8883b7383798
> > R13: ffff8883b7383744 R14: ffff8883b7383740 R15: ffff8883b7380498
> > FS: 0000000000000000(0000) GS:ffff88bf0e780000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007fea708bae00 CR3: 000000034b860003 CR4: 0000000000770ef0
> > PKRU: 55555554
> > Call Trace:
> > <TASK>
> > ? show_regs+0x6c/0x80
> > ? __warn+0xa4/0x1c0
> > ? rcu_sync_exit+0xe3/0xf0
> > ? report_bug+0x1bc/0x1f0
> > ? handle_bug+0x46/0x90
> > ? exc_invalid_op+0x18/0x50
> > ? asm_exc_invalid_op+0x1b/0x20
> > ? rcu_sync_exit+0xe3/0xf0
> > percpu_up_write+0x4d/0x60

...correspond to one more percpu_up_write() than percpu_down_write().

If it is possible to reproduce this easily, one way to get more
information would be to run with lockdep.

> > bch2_replicas_gc_end+0x60/0x110
> > bch2_journal_flush_device_pins+0x2d7/0x4b0
> > ? __entry_text_end+0x102619/0x10261d
> > ? __pfx_bch2_journal_flush_device_pins+0x10/0x10
> > ? __pfx_scnprintf+0x10/0x10
> > ? srso_alias_return_thunk+0x5/0xfbef5
> > ? raw_spin_rq_unlock+0x17/0x70
> > ? srso_alias_return_thunk+0x5/0xfbef5
> > ? finish_task_switch.isra.0+0xc1/0x410
> > ? __switch_to+0x281/0x6d0
> > bch2_data_job+0x1fb/0x500
> > ? __pfx_bch2_data_job+0x10/0x10
> > ? srso_alias_return_thunk+0x5/0xfbef5
> > ? try_to_wake_up+0x4b1/0xca0
> > ? srso_alias_return_thunk+0x5/0xfbef5
> > ? __pfx_bch2_data_thread+0x10/0x10
> > bch2_data_thread+0x60/0xa0
> > kthread+0x188/0x1d0
> > ? __pfx_kthread+0x10/0x10
> > ret_from_fork+0x47/0x80
> > ? __pfx_kthread+0x10/0x10
> > ret_from_fork_asm+0x1a/0x30
> > </TASK>
> > 
> > WARNING: CPU: 15 PID: 259240 at kernel/rcu/sync.c:129 rcu_sync_enter+0x1d6/0x200

In which case this would be a consequence of the above imbalance.

> > Modules linked in: brd tls cfg80211 intel_rapl_msr intel_rapl_common
> > amd64_edac edac_mce_amd kvm_amd binfmt_misc kvm irqbypass ipmi_ssif
> > rapl wmi_bmof nls_iso8859_1 ccp ptdma k10temp acpi_ipmi ipmi_si
> > ipmi_devintf ipmi_msghandler input_leds joydev mac_hid efi_pstore
> > dmi_sysfs ip_tables x_tables autofs4 rndis_host cdc_ether usbnet m
> > ii btrfs blake2b_generic hid_generic usbhid raid10 hid raid456
> > async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0
> > multipath linear crct10dif_pclmul crc32_pclmul ast polyval_clmulni
> > polyval_generic i2c_algo_bit ghash_clmulni_intel drm_shmem_helper
> > sha512_ssse3 drm_kms_helper sha256_ssse3 sha1_ssse3 nvme ahci drm
> > nvme_core tg3 l
> > ibahci xhci_pci i2c_piix4 xhci_pci_renesas wmi aesni_intel crypto_simd
> > cryptd [last unloaded: brd]
> > CPU: 15 PID: 259240 Comm: bch-data/3edb8b Tainted: G W
> > 6.6.0-rc7-next-20231024 #1
> > Hardware name: Supermicro AS -3014TS-i/H12SSL-i, BIOS 2.5 09/08/2022
> > RIP: 0010:rcu_sync_enter+0x1d6/0x200
> > Code: 89 ef e8 2d fb f9 ff 48 89 df e8 35 45 3f 00 8b 03 83 f8 01 7e
> > d8 48 8d b5 68 ff ff ff 4c 89 ef e8 6f c1 f9 ff e9 0b ff ff ff <0f> 0b
> > 4c 89 ff 41 83 c6 01 e8 ec 45 3f 00 44 89 73 04 4c 89 ef e8
> > RSP: 0018:ffffc90024127b00 EFLAGS: 00010086
> > RAX: 0000000000000000 RBX: ffff8883b7383740 RCX: 0000000000000000
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > RBP: ffffc90024127bc8 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff92004824f60
> > R13: ffff8883b7383748 R14: 00000000ffffffff R15: ffff8883b7383744
> > FS: 0000000000000000(0000) GS:ffff88bf0e780000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007fea708bae00 CR3: 000000034b860003 CR4: 0000000000770ef0
> > PKRU: 55555554
> > Call Trace:
> > <TASK>
> > ? show_regs+0x6c/0x80
> > ? __warn+0xa4/0x1c0
> > ? rcu_sync_enter+0x1d6/0x200
> > ? report_bug+0x1bc/0x1f0
> > ? handle_bug+0x46/0x90
> > ? exc_invalid_op+0x18/0x50
> > ? asm_exc_invalid_op+0x1b/0x20
> > ? rcu_sync_enter+0x1d6/0x200
> > ? __pfx_rcu_sync_enter+0x10/0x10
> > ? srso_alias_return_thunk+0x5/0xfbef5
> > ? __kasan_check_write+0x14/0x30
> > percpu_down_write+0x2f/0x2b0
> > ? bch2_replicas_gc2+0x122/0x6f0
> > ? srso_alias_return_thunk+0x5/0xfbef5
> > ? __kmalloc+0x61/0xf0
> > bch2_replicas_gc2+0x155/0x6f0
> > ? __pfx_bch2_replicas_gc2+0x10/0x10
> > ? raw_spin_rq_unlock+0x17/0x70
> > ? finish_task_switch.isra.0+0xc1/0x410
> > ? __switch_to+0x281/0x6d0
> > bch2_data_job+0x298/0x500
> > ? __pfx_bch2_data_job+0x10/0x10
> > ? srso_alias_return_thunk+0x5/0xfbef5
> > ? try_to_wake_up+0x4b1/0xca0
> > ? srso_alias_return_thunk+0x5/0xfbef5
> > ? __pfx_bch2_data_thread+0x10/0x10
> > bch2_data_thread+0x60/0xa0
> > kthread+0x188/0x1d0
> > ? __pfx_kthread+0x10/0x10
> > ret_from_fork+0x47/0x80
> > ? __pfx_kthread+0x10/0x10
> > ret_from_fork_asm+0x1a/0x30
> > </TASK>
> > 
> > WARNING: CPU: 38 PID: 262351 at kernel/rcu/sync.c:193 rcu_sync_dtor+0x96/0xb0

As would this as well.

							Thanx, Paul

> > Modules linked in: brd tls cfg80211 intel_rapl_msr intel_rapl_common
> > amd64_edac edac_mce_amd kvm_amd binfmt_misc kvm irqbypass ipmi_ssif
> > rapl wmi_bmof nls_iso8859_1 ccp ptdma k10temp acpi_ipmi ipmi_si
> > ipmi_devintf ipmi_msghandler input_leds joydev mac_hid efi_pstore
> > dmi_sysfs ip_tables x_tables autofs4 rndis_host cdc_ether usbnet mii
> > btrfs blake2b_generic hid_generic usbhid raid10 hid raid456
> > async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0
> > multipath linear crct10dif_pclmul crc32_pclmul ast polyval_clmulni
> > polyval_generic i2c_algo_bit ghash_clmulni_intel drm_shmem_helper
> > sha512_ssse3 drm_kms_helper sha256_ssse3 sha1_ssse3 nvme ahci drm
> > nvme_core tg3 libahci xhci_pci i2c_piix4 xhci_pci_renesas wmi
> > aesni_intel crypto_simd cryptd [last unloaded: brd]
> > CPU: 38 PID: 262351 Comm: umount Tainted: G W 6.6.0-rc7-next-20231024 #1
> > Hardware name: Supermicro AS -3014TS-i/H12SSL-i, BIOS 2.5 09/08/2022
> > RIP: 0010:rcu_sync_dtor+0x96/0xb0
> > Code: e8 4f 44 3f 00 c7 03 03 00 00 00 4c 89 e7 e8 01 54 91 01 e8 fc
> > 80 00 00 48 89 df e8 54 43 3f 00 8b 03 85 c0 74 c8 0f 0b eb c4 <0f> 0b
> > 48 89 df e8 40 43 3f 00 8b 03 83 f8 02 75 89 0f 0b eb 85 cc
> > RSP: 0018:ffffc9000d35fc60 EFLAGS: 00010286
> > RAX: 00000000ffffffff RBX: ffff8883b7383740 RCX: 0000000000000000
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > RBP: ffffc9000d35fc78 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffff8883b7383770
> > R13: ffff8883b7380040 R14: ffff8883b7380000 R15: 0000000000000000
> > FS: 00007f2033aef800(0000) GS:ffff88bf0f300000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007fea708bae00 CR3: 000000042bf7e005 CR4: 0000000000770ef0
> > PKRU: 55555554
> > Call Trace:
> > <TASK>
> > ? show_regs+0x6c/0x80
> > ? __warn+0xa4/0x1c0
> > ? rcu_sync_dtor+0x96/0xb0
> > ? report_bug+0x1bc/0x1f0
> > ? handle_bug+0x46/0x90
> > ? exc_invalid_op+0x18/0x50
> > ? asm_exc_invalid_op+0x1b/0x20
> > ? rcu_sync_dtor+0x96/0xb0
> > percpu_free_rwsem+0x2a/0x60
> > bch2_fs_release+0x131/0x340
> > kobject_put+0x103/0x250
> > bch2_fs_free+0x144/0x190
> > bch2_kill_sb+0x50/0x70
> > deactivate_locked_super+0x69/0x110
> > deactivate_super+0x64/0x80
> > cleanup_mnt+0x141/0x220
> > __cleanup_mnt+0x12/0x20
> > task_work_run+0x10b/0x190
> > ? __pfx_task_work_run+0x10/0x10
> > ? srso_alias_return_thunk+0x5/0xfbef5
> > ? __x64_sys_umount+0xff/0x110
> > exit_to_user_mode_prepare+0x169/0x190
> > syscall_exit_to_user_mode+0x28/0x60
> > do_syscall_64+0x6b/0xf0
> > ? srso_alias_return_thunk+0x5/0xfbef5
> > ? syscall_exit_to_user_mode+0x36/0x60
> > ? srso_alias_return_thunk+0x5/0xfbef5
> > ? do_syscall_64+0x6b/0xf0
> > ? srso_alias_return_thunk+0x5/0xfbef5
> > ? irqentry_exit_to_user_mode+0x17/0x20
> > ? srso_alias_return_thunk+0x5/0xfbef5
> > ? irqentry_exit+0x43/0x50
> > ? srso_alias_return_thunk+0x5/0xfbef5
> > ? sysvec_call_function+0x48/0xd0
> > entry_SYSCALL_64_after_hwframe+0x6c/0x74
> > RIP: 0033:0x7f203391ee5b
> > Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6
> > e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d
> > 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 89 6f 0d 00 f7 d8
> > RSP: 002b:00007ffcadb990e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
> > RAX: 0000000000000000 RBX: 0000560145c5a5c0 RCX: 00007f203391ee5b
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000560145c5e820
> > RBP: 0000560145c5a6d8 R08: 0000000000000073 R09: 00007ffcadb98031
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > R13: 0000560145c5e820 R14: 0000560145c5e7f0 R15: 0000560145c5a5c0
> > </TASK>



[Index of Archives]     [Linux Kernel]     [Linux USB Development]     [Yosemite News]     [Linux SCSI]

  Powered by Linux