On Thu, Jul 15, 2021 at 10:10:50AM -0700, Darrick J. Wong wrote:
> On Thu, Jul 15, 2021 at 11:51:50AM +0800, Boyang Xue wrote:
> > On Thu, Jul 15, 2021 at 10:36 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Jul 15, 2021 at 12:22:28AM +0800, Boyang Xue wrote:
> > > > It's unclear to me where to find the required address for the
> > > > addr2line command line, i.e.
> > > >
> > > > addr2line -e /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux
> > > > <what address here?>
> > >
> > > ./scripts/faddr2line /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux cleanup_offline_cgwbs_workfn+0x320/0x394
> > >
> >
> > Thanks! The result is the same as that of
> >
> > addr2line -i -e /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux FFFF8000102D6DD0
> >
> > But this script is very handy.
> >
> > # /usr/src/kernels/5.14.0-0.rc1.15.bx.el9.aarch64/scripts/faddr2line /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux cleanup_offline_cgwbs_workfn+0x320/0x394
> > cleanup_offline_cgwbs_workfn+0x320/0x394:
> > arch_atomic64_fetch_add_unless at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/atomic-arch-fallback.h:2265
> > (inlined by) arch_atomic64_add_unless at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/atomic-arch-fallback.h:2290
> > (inlined by) atomic64_add_unless at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/asm-generic/atomic-instrumented.h:1149
> > (inlined by) atomic_long_add_unless at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/asm-generic/atomic-long.h:491
> > (inlined by) percpu_ref_tryget_many at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/percpu-refcount.h:247
> > (inlined by) percpu_ref_tryget at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/percpu-refcount.h:266
> > (inlined by) wb_tryget at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/backing-dev-defs.h:227
> > (inlined by) wb_tryget at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/backing-dev-defs.h:224
> > (inlined by) cleanup_offline_cgwbs_workfn at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/mm/backing-dev.c:679
> >
> > # vi /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/mm/backing-dev.c
> > ```
> > static void cleanup_offline_cgwbs_workfn(struct work_struct *work)
> > {
> > 	struct bdi_writeback *wb;
> > 	LIST_HEAD(processed);
> >
> > 	spin_lock_irq(&cgwb_lock);
> >
> > 	while (!list_empty(&offline_cgwbs)) {
> > 		wb = list_first_entry(&offline_cgwbs, struct bdi_writeback,
> > 				      offline_node);
> > 		list_move(&wb->offline_node, &processed);
> >
> > 		/*
> > 		 * If wb is dirty, cleaning up the writeback by switching
> > 		 * attached inodes will result in an effective removal of any
> > 		 * bandwidth restrictions, which isn't the goal. Instead,
> > 		 * it can be postponed until the next time, when all io
> > 		 * will be likely completed. If in the meantime some inodes
> > 		 * will get re-dirtied, they should be eventually switched to
> > 		 * a new cgwb.
> > 		 */
> > 		if (wb_has_dirty_io(wb))
> > 			continue;
> >
> > 		if (!wb_tryget(wb))  <=== line#679
> > 			continue;
> >
> > 		spin_unlock_irq(&cgwb_lock);
> > 		while (cleanup_offline_cgwb(wb))
> > 			cond_resched();
> > 		spin_lock_irq(&cgwb_lock);
> >
> > 		wb_put(wb);
> > 	}
> >
> > 	if (!list_empty(&processed))
> > 		list_splice_tail(&processed, &offline_cgwbs);
> >
> > 	spin_unlock_irq(&cgwb_lock);
> > }
> > ```
> >
> > BTW, this bug can only be reproduced on a non-debug production kernel
> > build (a.k.a. the kernel rpm package); it's not reproducible on a debug
> > build with various debug configuration options enabled (a.k.a. the
> > kernel-debug rpm package).
>
> FWIW I've also seen this regularly on x86_64 kernels on ext4 with all
> default mkfs settings when running generic/256.

Oh, that's useful information, thank you!

Btw, would you mind testing the patch from an earlier message in this
thread? I'd highly appreciate it.

Thanks!
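
For anyone following the addr2line vs. faddr2line exchange above: the
symbol+offset/size form and the raw address should be related by simple
arithmetic (symbol start address plus offset). Below is a minimal sketch
of that calculation. The helper name sym_offset_to_addr is hypothetical
(it is not a kernel script), and the base address in the example is an
illustrative small value, not one taken from this thread; in practice
the symbol's start address would have to come from something like
`nm vmlinux`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel tree): turn a
# "symbol+0xOFFSET/0xSIZE" spec plus the symbol's start address into
# the raw address that plain addr2line expects.
sym_offset_to_addr() {
    spec=$1                 # e.g. cleanup_offline_cgwbs_workfn+0x320/0x394
    base=$2                 # symbol start address (assumed known, e.g. from nm)
    rest=${spec#*+}         # drop "symbol+"  -> 0x320/0x394
    offset=${rest%%/*}      # drop "/0xSIZE"  -> 0x320
    printf '0x%x\n' "$(( $base + $offset ))"
}

# Illustrative base address only; prints 0x1320.
sym_offset_to_addr 'cleanup_offline_cgwbs_workfn+0x320/0x394' 0x1000
```

If the two commands quoted above do resolve to the same location, then
FFFF8000102D6DD0 would be the start of cleanup_offline_cgwbs_workfn plus
0x320; faddr2line just saves doing that lookup by hand.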