Thank you very much for testing it!

Sent from my iPhone

> On Jul 21, 2021, at 22:29, Boyang Xue <bxue@xxxxxxxxxx> wrote:
>
> Just FYI, the tests on ppc64le are done, no more kernel panics, so my
> tests on all arches pass now.
>
>> On Sat, Jul 17, 2021 at 8:00 PM Boyang Xue <bxue@xxxxxxxxxx> wrote:
>>
>> fstests passed on aarch64, x86_64 and s390x. There's a shortage of
>> ppc64le systems, so I can't provide the ppc64le test result for now,
>> but I hope I can report it next week.
>>
>> Thanks,
>> Boyang
>>
>>> On Sat, Jul 17, 2021 at 4:04 AM Roman Gushchin <guro@xxxxxx> wrote:
>>>
>>> On Fri, Jul 16, 2021 at 09:23:40AM -0700, Darrick J. Wong wrote:
>>>> On Thu, Jul 15, 2021 at 03:28:12PM -0700, Darrick J. Wong wrote:
>>>>> On Thu, Jul 15, 2021 at 01:08:15PM -0700, Roman Gushchin wrote:
>>>>>> On Thu, Jul 15, 2021 at 10:10:50AM -0700, Darrick J. Wong wrote:
>>>>>>> On Thu, Jul 15, 2021 at 11:51:50AM +0800, Boyang Xue wrote:
>>>>>>>> On Thu, Jul 15, 2021 at 10:36 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> On Thu, Jul 15, 2021 at 12:22:28AM +0800, Boyang Xue wrote:
>>>>>>>>>> It's unclear to me where to find the required address for the
>>>>>>>>>> addr2line command line, i.e.
>>>>>>>>>>
>>>>>>>>>> addr2line -e /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux \
>>>>>>>>>>     <what address here?>
>>>>>>>>>
>>>>>>>>> ./scripts/faddr2line /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux cleanup_offline_cgwbs_workfn+0x320/0x394
>>>>>>>>
>>>>>>>> Thanks! The result is the same as
>>>>>>>>
>>>>>>>> addr2line -i -e \
>>>>>>>>     /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux \
>>>>>>>>     FFFF8000102D6DD0
>>>>>>>>
>>>>>>>> but this script is very handy.
>>>>>>>>
>>>>>>>> # /usr/src/kernels/5.14.0-0.rc1.15.bx.el9.aarch64/scripts/faddr2line \
>>>>>>>>     /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux \
>>>>>>>>     cleanup_offline_cgwbs_workfn+0x320/0x394
>>>>>>>> cleanup_offline_cgwbs_workfn+0x320/0x394:
>>>>>>>> arch_atomic64_fetch_add_unless at
>>>>>>>> /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/atomic-arch-fallback.h:2265
>>>>>>>> (inlined by) arch_atomic64_add_unless at
>>>>>>>> /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/atomic-arch-fallback.h:2290
>>>>>>>> (inlined by) atomic64_add_unless at
>>>>>>>> /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/asm-generic/atomic-instrumented.h:1149
>>>>>>>> (inlined by) atomic_long_add_unless at
>>>>>>>> /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/asm-generic/atomic-long.h:491
>>>>>>>> (inlined by) percpu_ref_tryget_many at
>>>>>>>> /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/percpu-refcount.h:247
>>>>>>>> (inlined by) percpu_ref_tryget at
>>>>>>>> /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/percpu-refcount.h:266
>>>>>>>> (inlined by) wb_tryget at
>>>>>>>> /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/backing-dev-defs.h:227
>>>>>>>> (inlined by) wb_tryget at
>>>>>>>> /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/backing-dev-defs.h:224
>>>>>>>> (inlined by) cleanup_offline_cgwbs_workfn at
>>>>>>>> /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/mm/backing-dev.c:679
>>>>>>>>
>>>>>>>> # vi /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/mm/backing-dev.c
>>>>>>>> ```
>>>>>>>> static void cleanup_offline_cgwbs_workfn(struct work_struct *work)
>>>>>>>> {
>>>>>>>> 	struct bdi_writeback *wb;
>>>>>>>> 	LIST_HEAD(processed);
>>>>>>>>
>>>>>>>> 	spin_lock_irq(&cgwb_lock);
>>>>>>>>
>>>>>>>> 	while (!list_empty(&offline_cgwbs)) {
>>>>>>>> 		wb = list_first_entry(&offline_cgwbs, struct bdi_writeback,
>>>>>>>> 				      offline_node);
>>>>>>>> 		list_move(&wb->offline_node, &processed);
>>>>>>>>
>>>>>>>> 		/*
>>>>>>>> 		 * If wb is dirty, cleaning up the writeback by switching
>>>>>>>> 		 * attached inodes will result in an effective removal of any
>>>>>>>> 		 * bandwidth restrictions, which isn't the goal. Instead,
>>>>>>>> 		 * it can be postponed until the next time, when all io
>>>>>>>> 		 * will be likely completed. If in the meantime some inodes
>>>>>>>> 		 * will get re-dirtied, they should be eventually switched to
>>>>>>>> 		 * a new cgwb.
>>>>>>>> 		 */
>>>>>>>> 		if (wb_has_dirty_io(wb))
>>>>>>>> 			continue;
>>>>>>>>
>>>>>>>> 		if (!wb_tryget(wb))		<=== line #679
>>>>>>>> 			continue;
>>>>>>>>
>>>>>>>> 		spin_unlock_irq(&cgwb_lock);
>>>>>>>> 		while (cleanup_offline_cgwb(wb))
>>>>>>>> 			cond_resched();
>>>>>>>> 		spin_lock_irq(&cgwb_lock);
>>>>>>>>
>>>>>>>> 		wb_put(wb);
>>>>>>>> 	}
>>>>>>>>
>>>>>>>> 	if (!list_empty(&processed))
>>>>>>>> 		list_splice_tail(&processed, &offline_cgwbs);
>>>>>>>>
>>>>>>>> 	spin_unlock_irq(&cgwb_lock);
>>>>>>>> }
>>>>>>>> ```
>>>>>>>>
>>>>>>>> BTW, this bug can only be reproduced on a non-debug production
>>>>>>>> kernel build (a.k.a. the kernel RPM package); it's not reproducible
>>>>>>>> on a debug build with various debug configurations enabled (a.k.a.
>>>>>>>> the kernel-debug RPM package).
>>>>>>>
>>>>>>> FWIW I've also seen this regularly on x86_64 kernels on ext4 with all
>>>>>>> default mkfs settings when running generic/256.
>>>>>>
>>>>>> Oh, that's useful information, thank you!
>>>>>>
>>>>>> Btw, would you mind giving the patch from an earlier message in the
>>>>>> thread a test? I'd highly appreciate it.
>>>>>>
>>>>>> Thanks!
>>>>>
>>>>> Will do.
>>>>
>>>> fstests passed here, so
>>>>
>>>> Tested-by: Darrick J. Wong <djwong@xxxxxxxxxx>
>>>
>>> Great, thank you!
>>>
>