On Thu, Jul 15, 2021 at 11:31:17AM +0200, Jan Kara wrote:
> On Thu 15-07-21 09:42:06, Boyang Xue wrote:
> > On Thu, Jul 15, 2021 at 7:46 AM Roman Gushchin <guro@xxxxxx> wrote:
> > >
> > > On Thu, Jul 15, 2021 at 12:22:28AM +0800, Boyang Xue wrote:
> > > > Hi Jan,
> > > >
> > > > On Wed, Jul 14, 2021 at 5:26 PM Jan Kara <jack@xxxxxxx> wrote:
> > > > >
> > > > > On Wed 14-07-21 16:44:33, Boyang Xue wrote:
> > > > > > Hi Roman,
> > > > > >
> > > > > > On Wed, Jul 14, 2021 at 12:12 PM Roman Gushchin <guro@xxxxxx> wrote:
> > > > > > >
> > > > > > > On Wed, Jul 14, 2021 at 11:21:12AM +0800, Boyang Xue wrote:
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > I'm not sure if this is the right place to report this bug;
> > > > > > > > please correct me if I'm wrong.
> > > > > > > >
> > > > > > > > I found that kernel-5.14.0-rc1 (built from the Linus tree)
> > > > > > > > crashes while running xfstests generic/256 on ext4 [1].
> > > > > > > > Looking at the call trace, it appears the bug was introduced
> > > > > > > > by commit
> > > > > > > >
> > > > > > > > c22d70a162d3 writeback, cgroup: release dying cgwbs by switching attached inodes
> > > > > > > >
> > > > > > > > It only happens on aarch64, not on x86_64, ppc64le, or s390x.
> > > > > > > > Testing was performed with the latest xfstests, and the bug
> > > > > > > > can be reproduced on ext{2,3,4} with {1k, 2k, 4k} block sizes.
> > > > > > >
> > > > > > > Hello Boyang,
> > > > > > >
> > > > > > > thank you for the report!
> > > > > > >
> > > > > > > Do you know on which line the oops happens?
> > > > > >
> > > > > > I was trying to inspect the vmcore with the crash utility, but
> > > > > > unfortunately it doesn't work.
> > > > >
> > > > > Thanks for the report! Have you tried the addr2line utility?
> > > > > Looking at the oops I can see:
> > > >
> > > > Thanks for the tips!
> > > >
> > > > It's unclear to me where to find the required address for the
> > > > addr2line command line, i.e.
> > > >
> > > > addr2line -e /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux <what address here?>
> > >
> > > You can use $ nm <vmlinux> to get the address of
> > > cleanup_offline_cgwbs_workfn() and then add 0x320.
> >
> > Thanks! Hope the following helps:
>
> Thanks for the data!
>
> > static void cleanup_offline_cgwbs_workfn(struct work_struct *work)
> > {
> > 	struct bdi_writeback *wb;
> > 	LIST_HEAD(processed);
> >
> > 	spin_lock_irq(&cgwb_lock);
> >
> > 	while (!list_empty(&offline_cgwbs)) {
> > 		wb = list_first_entry(&offline_cgwbs, struct bdi_writeback,
> > 				      offline_node);
> > 		list_move(&wb->offline_node, &processed);
> >
> > 		/*
> > 		 * If wb is dirty, cleaning up the writeback by switching
> > 		 * attached inodes will result in an effective removal of any
> > 		 * bandwidth restrictions, which isn't the goal. Instead,
> > 		 * it can be postponed until the next time, when all io
> > 		 * will be likely completed. If in the meantime some inodes
> > 		 * will get re-dirtied, they should be eventually switched to
> > 		 * a new cgwb.
> > 		 */
> > 		if (wb_has_dirty_io(wb))
> > 			continue;
> >
> > 		if (!wb_tryget(wb))			<=== line #679
> > 			continue;
>
> Aha, interesting. So it seems we crashed trying to dereference
> wb->refcnt->data. It looks like cgwb_release_workfn() raced with
> cleanup_offline_cgwbs_workfn(): percpu_ref_exit() got called from
> cgwb_release_workfn(), and then cleanup_offline_cgwbs_workfn() called
> wb_tryget().
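> IOW, the suspected interleaving would be something like this (a sketch
> of my reading of the race, not verified against the disassembly):
>
>   cleanup_offline_cgwbs_workfn()        cgwb_release_workfn()
>   ------------------------------        ---------------------
>                                         percpu_ref_exit(&wb->refcnt);
>                                           /* frees wb->refcnt->data */
>   spin_lock_irq(&cgwb_lock);
>   wb = list_first_entry(&offline_cgwbs, ...);
>     /* wb is still on the list, the
>        releaser's list_del() hasn't
>        run yet */
>   wb_tryget(wb);
>     /* percpu_ref_tryget() dereferences
>        the freed refcnt->data -> oops */
>                                         spin_lock_irq(&cgwb_lock);
>                                         list_del(&wb->offline_node);
>                                         spin_unlock_irq(&cgwb_lock);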
> I think the proper fix is to move:
>
> 	spin_lock_irq(&cgwb_lock);
> 	list_del(&wb->offline_node);
> 	spin_unlock_irq(&cgwb_lock);
>
> in cgwb_release_workfn() to the beginning of that function so that we are
> sure even cleanup_offline_cgwbs_workfn() cannot be working with the wb when
> it is being released. Roman?

Yes, it sounds like the most reasonable explanation. Thank you!

Boyang, would you mind testing the following patch?

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 271f2ca862c8..f5561ea7d90a 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -398,12 +398,12 @@ static void cgwb_release_workfn(struct work_struct *work)
 	blkcg_unpin_online(blkcg);
 
 	fprop_local_destroy_percpu(&wb->memcg_completions);
 
-	percpu_ref_exit(&wb->refcnt);
 	spin_lock_irq(&cgwb_lock);
 	list_del(&wb->offline_node);
 	spin_unlock_irq(&cgwb_lock);
 
+	percpu_ref_exit(&wb->refcnt);
 	wb_exit(wb);
 	WARN_ON_ONCE(!list_empty(&wb->b_attached));
 	kfree_rcu(wb, rcu);
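With this change the release path unhooks wb from the offline list before
tearing down the refcount. Since cleanup_offline_cgwbs_workfn() only calls
wb_tryget() on wbs it finds on that list while holding cgwb_lock, it can
no longer see a wb whose percpu ref has been exited. Roughly, the resulting
ordering in cgwb_release_workfn() is (sketch per the diff above, comments
mine):

	/*
	 * Take wb off offline_cgwbs under cgwb_lock, so that the cleanup
	 * worker, which walks the list under the same lock, cannot find
	 * this wb anymore...
	 */
	spin_lock_irq(&cgwb_lock);
	list_del(&wb->offline_node);
	spin_unlock_irq(&cgwb_lock);

	/* ...and only then release the percpu refcount. */
	percpu_ref_exit(&wb->refcnt);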