On Thu, Oct 26, 2023 at 07:35:56PM -0700, Paul E. McKenney wrote: > On Wed, Oct 25, 2023 at 02:08:45PM -0400, Kent Overstreet wrote: > > Hey Paul, RCU folks :) > > > > I've got no idea what these warnings mean and haven't seen them before, > > do you suppose one of you could point me in the right direction? > > > > On Wed, Oct 25, 2023 at 09:28:09PM +0800, Daniel J Blueman wrote: > > > Hi Kent et al, > > > > > > On 6.6.0-rc7-next-20231024 with my bcachefs exerciser [1], I was able > > > to reproduce three related kernel warnings due to RCU grace period > > > counts being zero, ie WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0). > > > > > > If this is something of interest, I'll find a minimal reproducer. > > > These warnings aside, bcachefs is looking really solid. > > > > > > Thanks, > > > Daniel > > > > > > -- [1] https://github.com/dblueman/bcachefs-gym > > > > > > -- [2] > > > > > > WARNING: CPU: 15 PID: 259240 at kernel/rcu/sync.c:171 rcu_sync_exit+0xe3/0xf0 > > The usual cause would be mismatched rcu_sync_enter() and rcu_sync_exit(), > as in one more rcu_sync_exit() than rcu_sync_enter()... > > > > Modules linked in: brd tls cfg80211 intel_rapl_msr intel_rapl_common > > > amd64_edac edac_mce_amd kvm_amd binfmt_misc kvm irqbypass ipmi_ssif > > > rapl wmi_bmof nls_iso8859_1 ccp ptdma k10temp acpi_ipmi ipmi_si > > > ipmi_devintf ipmi_msghandler input_leds joydev mac_hid efi_pstore > > > dmi_sysfs ip_tables x_tables autofs4 rndis_host cdc_ether usbnet m > > > ii btrfs blake2b_generic hid_generic usbhid raid10 hid raid456 > > > async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 > > > multipath linear crct10dif_pclmul crc32_pclmul ast polyval_clmulni > > > polyval_generic i2c_algo_bit ghash_clmulni_intel drm_shmem_helper > > > sha512_ssse3 drm_kms_helper sha256_ssse3 sha1_ssse3 nvme ahci drm > > > nvme_core tg3 l > > > ibahci xhci_pci i2c_piix4 xhci_pci_renesas wmi aesni_intel crypto_simd > > > cryptd [last unloaded: brd] > > > CPU: 15 PID: 259240 Comm: bch-data/3edb8b Tainted: G W > > > 6.6.0-rc7-next-20231024 #1 > > > Hardware name: Supermicro AS -3014TS-i/H12SSL-i, BIOS 2.5 09/08/2022 > > > RIP: 0010:rcu_sync_exit+0xe3/0xf0 > > > Code: c6 e0 06 c7 b2 e8 dd 0e 01 00 4c 89 e7 e8 b5 54 91 01 5b 41 5c > > > 41 5d 5d 31 c0 31 f6 31 ff e9 8f 35 a9 01 0f 0b e9 3d ff ff ff <0f> 0b > > > e9 4d ff ff ff 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 > > > RSP: 0018:ffffc90024127b60 EFLAGS: 00010246 > > > RAX: 0000000000000000 RBX: ffff8883b7383740 RCX: 0000000000000000 > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > > RBP: ffffc90024127b78 R08: 0000000000000000 R09: 0000000000000000 > > > R10: 0000000000000000 R11: 0000000000000000 R12: ffff8883b7383798 > > > R13: ffff8883b7383744 R14: ffff8883b7383740 R15: ffff8883b7380498 > > > FS: 0000000000000000(0000) GS:ffff88bf0e780000(0000) knlGS:0000000000000000 > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > CR2: 00007fea708bae00 CR3: 000000034b860003 CR4: 0000000000770ef0 > > > PKRU: 55555554 > > > Call Trace: > > > <TASK> > > > ? show_regs+0x6c/0x80 > > > ? __warn+0xa4/0x1c0 > > > ? rcu_sync_exit+0xe3/0xf0 > > > ? report_bug+0x1bc/0x1f0 > > > ? handle_bug+0x46/0x90 > > > ? exc_invalid_op+0x18/0x50 > > > ? asm_exc_invalid_op+0x1b/0x20 > > > ? rcu_sync_exit+0xe3/0xf0 > > > percpu_up_write+0x4d/0x60 > > ...correspond to one more percpu_up_write() than percpu_down_write(). > > If it is possible to reproduce this easily, one way to get more > information would be to run with lockdep. Thanks, that was exactly it :) Daniel, the fix is in my -testing branch (and will shortly be in master), can you confirm?