On 7/19/22 22:59, John David Anglin wrote:
> Hi Helge,
>
> I hit this warning with the patch below building ghc on mx3210:

As I wrote, I haven't hit it yet on my buildd server, but that could
just have been luck...

Hillf, should we test whether this second hunk triggers?

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -616,6 +618,7 @@ static void __dentry_kill(struct dentry
 		dentry->d_flags |= DCACHE_MAY_FREE;
 		can_free = false;
 	}
+	BUG_ON(!hlist_unhashed(&dentry->d_u.d_alias));
 	spin_unlock(&dentry->d_lock);
 	if (likely(can_free))
 		dentry_free(dentry);

Helge

> mx3210 login: ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 32654 at fs/dcache.c:365 dentry_free+0xfc/0x108
> Modules linked in: binfmt_misc ext2 ext4 crc16 mbcache jbd2 ipmi_watchdog sg ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler fuse nfsd ip_tables x_tables ipv6 autofs4 xfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod t10_pi ses enclosure scsi_transport_sas crc64_rocksoft crc64 uas usb_storage sr_mod cdrom ohci_pci sym53c8xx pata_cmd64x ehci_pci ohci_hcd libata scsi_transport_spi ehci_hcd tg3 scsi_mod usbcore scsi_common usb_common
> CPU: 2 PID: 32654 Comm: cc1 Not tainted 5.18.12+ #2
> Hardware name: 9000/800/rp3440
>
>      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> PSW: 00001000000001000110100000001111 Not tainted
> r00-03  000000000804680f 00000040ce7fc880 00000000404f2b74 00000040ce7fc920
> r04-07  0000000040be4940 000000410f6cd630 00000001413e4068 000000410f6cd688
> r08-11  0000000040fd2e60 0000000040bc5020 0000000040c2c940 00000000000800e0
> r12-15  0000000040c2c940 0000000000000001 0000000040c2c940 000000410f6cd688
> r16-19  00000001f9fe105d 00000040ce7fc1f8 000000000000002f 000000000a0c1000
> r20-23  000000000800000f 000000000800000f 000000410f6cd639 000000000800000f
> r24-27  0000000000000000 0000000000000385 000000410f6cd630 0000000040be4940
> r28-31  0000000041104530 00000040ce7fc8f0 00000040ce7fc9a0 0000000000000000
> sr00-03  0000000000a03800 0000000000000000 0000000000000000 0000000000a03800
> sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
>
> IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000404f18bc 00000000404f18c0
>  IIR: 03ffe01f    ISR: 0000000010350000  IOR: 00000239ff3fc928
>  CPU:        2   CR30: 00000040cadd1380 CR31: ffffffffffffffff
>  ORIG_R28: 00000040ce7fcb70
>  IAOQ[0]: dentry_free+0xfc/0x108
>  IAOQ[1]: dentry_free+0x100/0x108
>  RP(r2): __dentry_kill+0x2bc/0x338
> Backtrace:
>  [<00000000404f2b74>] __dentry_kill+0x2bc/0x338
>  [<00000000404f37b8>] dentry_kill+0xb0/0x318
>  [<00000000404f3d08>] dput+0x2e8/0x328
>  [<00000000404dd7dc>] step_into+0x344/0x390
>  [<00000000404dda4c>] walk_component+0xa4/0x310
>  [<00000000404df234>] link_path_walk.part.0+0x2ec/0x4b0
>  [<00000000404e0000>] path_openat+0xe8/0x348
>  [<00000000404e2c58>] do_filp_open+0x98/0x178
>  [<00000000404babe8>] do_sys_openat2+0x148/0x288
>  [<00000000404bb41c>] compat_sys_openat+0x54/0x98
>  [<0000000040203e30>] syscall_exit+0x0/0x10
>
> ---[ end trace 0000000000000000 ]---
> watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [cc1:32657]
>
> Regards,
> Dave
>
> On 2022-07-19 12:32 p.m., Helge Deller wrote:
>> Hello Hillf,
>>
>> On 7/17/22 13:36, Hillf Danton wrote:
>>> On Sun, 17 Jul 2022 11:42:48 +0200
>>>> I used WARN_ON() instead of BUG_ON().
>>>> With that, both triggered, first the first one, then the second one.
>>>> Full log is here:
>>>> http://dellerweb.de/testcases/minicom.dcache.crash.6-warn
>>> Given the first BUG_ON triggered, and the dentry at that moment is not
>>> supposed to be an alias, see if it is still in lookup with d_lock held.
>>> That is the step before de-unioning d_alias with d_in_lookup_hash.
>>>
>>> On the other hand, if only the second one triggered, we should track
>>> DCACHE_DENTRY_KILLED instead, on the assumption that the killed dentry
>>> was used again after releasing the d_lock surrounding the first one.
>>
>> The machine has now been up for 2 days without any issues, while it had
>> pretty much the same load as when it was crashing earlier.
>> So, in summary, I'd assume that your patch below fixes the issue.
>>
>> I'm now rebooting the machine with a new kernel, where I just changed
>>	if (unlikely(d_in_lookup(dentry)))
>> to
>>	if (WARN_ON_ONCE(d_in_lookup(dentry)))
>> in order to see if this really triggered.
>>
>> Anyway, I think your patch is good so far.
>> Would that be the final patch, or should I test some others?
>>
>> Thanks!
>> Helge
>>
>>> --- a/fs/dcache.c
>>> +++ b/fs/dcache.c
>>> @@ -605,8 +605,12 @@ static void __dentry_kill(struct dentry
>>>  		spin_unlock(&parent->d_lock);
>>>  	if (dentry->d_inode)
>>>  		dentry_unlink_inode(dentry);
>>> -	else
>>> +	else {
>>> +		if (unlikely(d_in_lookup(dentry))) {
>>> +			__d_lookup_done(dentry);
>>> +		}
>>>  		spin_unlock(&dentry->d_lock);
>>> +	}
>>>  	this_cpu_dec(nr_dentry);
>>>  	if (dentry->d_op && dentry->d_op->d_release)
>>>  		dentry->d_op->d_release(dentry);
>
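
For anyone following along, here is a rough userspace sketch of why a
dentry that is still in the in-lookup hash could trip the
hlist_unhashed(&dentry->d_u.d_alias) check, and why calling
__d_lookup_done() first (as in the quoted patch) might avoid it. It
assumes that d_alias and d_in_lookup_hash share storage in the d_u union,
as the "de-unioning" remark above suggests. Everything named model_* is
hypothetical, the list node types are simplified stand-ins for the kernel
ones, and there is no locking or bit-lock handling here, so treat it as an
illustration only, not as the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel list node types. */
struct hlist_node    { struct hlist_node *next, **pprev; };
struct hlist_bl_node { struct hlist_bl_node *next, **pprev; };

/* Only the fields that matter here; the real struct dentry is far larger. */
struct model_dentry {
	bool in_lookup;			/* stands in for the in-lookup flag */
	union {
		struct hlist_node d_alias;
		struct hlist_bl_node d_in_lookup_hash;
	} d_u;
};

/* Same test as the kernel helper: unhashed means pprev is NULL. */
static bool hlist_unhashed(const struct hlist_node *n)
{
	return n->pprev == NULL;
}

/* Model of entering the in-lookup hash: link the node at the bucket head. */
static void model_lookup_start(struct model_dentry *d,
			       struct hlist_bl_node **bucket)
{
	d->in_lookup = true;
	d->d_u.d_in_lookup_hash.next = *bucket;
	d->d_u.d_in_lookup_hash.pprev = bucket;
	if (*bucket)
		(*bucket)->pprev = &d->d_u.d_in_lookup_hash.next;
	*bucket = &d->d_u.d_in_lookup_hash;
}

/* Model of the lookup-done step: unlink, then reinitialise d_alias. */
static void model_lookup_done(struct model_dentry *d)
{
	struct hlist_bl_node *n = &d->d_u.d_in_lookup_hash;

	assert(d->in_lookup);
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	d->in_lookup = false;
	d->d_u.d_alias.next = NULL;	/* "de-union": storage is d_alias again */
	d->d_u.d_alias.pprev = NULL;
}

/* Returns true if the model's free-time check would warn. */
static bool model_kill_warns(bool apply_fix)
{
	struct hlist_bl_node *bucket = NULL;
	struct model_dentry d = {0};

	model_lookup_start(&d, &bucket);
	if (apply_fix && d.in_lookup)
		model_lookup_done(&d);
	/* d_alias.pprev aliases d_in_lookup_hash.pprev through the union. */
	return !hlist_unhashed(&d.d_u.d_alias);
}
```

In this model, model_kill_warns(false) corresponds to killing a dentry
that is still in lookup without the extra hunk, and model_kill_warns(true)
to the patched path.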