Re: Patch 'writeback, cgroup: release dying cgwbs by switching attached inodes' leads to kernel crash

On Thu, Jul 15, 2021 at 10:10:50AM -0700, Darrick J. Wong wrote:
> On Thu, Jul 15, 2021 at 11:51:50AM +0800, Boyang Xue wrote:
> > On Thu, Jul 15, 2021 at 10:36 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Jul 15, 2021 at 12:22:28AM +0800, Boyang Xue wrote:
> > > > It's unclear to me where to find the address required by the
> > > > addr2line command line, i.e.
> > > >
> > > > addr2line -e /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux
> > > > <what address here?>
> > >
> > > ./scripts/faddr2line /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux cleanup_offline_cgwbs_workfn+0x320/0x394
> > >
> > 
> > Thanks! The result is the same as the
> > 
> > addr2line -i -e
> > /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux
> > FFFF8000102D6DD0
> > 
> > But this script is very handy.
> > 
> > # /usr/src/kernels/5.14.0-0.rc1.15.bx.el9.aarch64/scripts/faddr2line
> > /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux
> > cleanup_offline_cgwbs_workfn+0x320/0x394
> > cleanup_offline_cgwbs_workfn+0x320/0x394:
> > arch_atomic64_fetch_add_unless at
> > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/atomic-arch-fallback.h:2265
> > (inlined by) arch_atomic64_add_unless at
> > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/atomic-arch-fallback.h:2290
> > (inlined by) atomic64_add_unless at
> > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/asm-generic/atomic-instrumented.h:1149
> > (inlined by) atomic_long_add_unless at
> > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/asm-generic/atomic-long.h:491
> > (inlined by) percpu_ref_tryget_many at
> > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/percpu-refcount.h:247
> > (inlined by) percpu_ref_tryget at
> > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/percpu-refcount.h:266
> > (inlined by) wb_tryget at
> > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/backing-dev-defs.h:227
> > (inlined by) wb_tryget at
> > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/backing-dev-defs.h:224
> > (inlined by) cleanup_offline_cgwbs_workfn at
> > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/mm/backing-dev.c:679
> > 
> > # vi /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/mm/backing-dev.c
> > ```
> > static void cleanup_offline_cgwbs_workfn(struct work_struct *work)
> > {
> >         struct bdi_writeback *wb;
> >         LIST_HEAD(processed);
> > 
> >         spin_lock_irq(&cgwb_lock);
> > 
> >         while (!list_empty(&offline_cgwbs)) {
> >                 wb = list_first_entry(&offline_cgwbs, struct bdi_writeback,
> >                                       offline_node);
> >                 list_move(&wb->offline_node, &processed);
> > 
> >                 /*
> >                  * If wb is dirty, cleaning up the writeback by switching
> >                  * attached inodes will result in an effective removal of any
> >                  * bandwidth restrictions, which isn't the goal.  Instead,
> >                  * it can be postponed until the next time, when all io
> >                  * will be likely completed.  If in the meantime some inodes
> >                  * will get re-dirtied, they should be eventually switched to
> >                  * a new cgwb.
> >                  */
> >                 if (wb_has_dirty_io(wb))
> >                         continue;
> > 
> >                 if (!wb_tryget(wb))  /* <=== line 679 */
> >                         continue;
> > 
> >                 spin_unlock_irq(&cgwb_lock);
> >                 while (cleanup_offline_cgwb(wb))
> >                         cond_resched();
> >                 spin_lock_irq(&cgwb_lock);
> > 
> >                 wb_put(wb);
> >         }
> > 
> >         if (!list_empty(&processed))
> >                 list_splice_tail(&processed, &offline_cgwbs);
> > 
> >         spin_unlock_irq(&cgwb_lock);
> > }
> > ```
> > 
> > BTW, this bug can only be reproduced on a non-debug production kernel
> > build (i.e. the kernel rpm package); it's not reproducible on a debug
> > build with various debug config options enabled (i.e. the kernel-debug
> > rpm package).
> 
> FWIW I've also seen this regularly on x86_64 kernels on ext4 with all
> default mkfs settings when running generic/256.

Oh, that's useful information, thank you!

Btw, would you mind testing the patch from an earlier message in this
thread? I'd highly appreciate it.

Thanks!


