Re: [PATCH 5/5] fstests: add stress truncation + writeback test

Luis Chamberlain <mcgrof@xxxxxxxxxx> · Tue, 11 Jun 2024 11:59:09 -0700

On Tue, Jun 11, 2024 at 11:29:59AM -0700, Darrick J. Wong wrote:
> On Tue, Jun 11, 2024 at 11:15:52AM -0700, Luis Chamberlain wrote:
> > On Tue, Jun 11, 2024 at 07:45:03AM -0700, Darrick J. Wong wrote:
> > > On Mon, Jun 10, 2024 at 08:02:02PM -0700, Luis Chamberlain wrote:
> > > > +# Requires CONFIG_DEBUG_FS and truncation knobs
> > > > +_require_split_debugfs()
> > > 
> > > Er... I thought "split" referred to debugfs itself.
> > > 
> > > _require_split_huge_pages_knob?
> > 
> > Much better, thanks.
> > 
> > > > +# This aims at trying to reproduce a difficult to reproduce bug found with
> > > > +# min order. The issue was root caused to an xarray bug when we split folios
> > > > +# to another order other than 0. This functionality is used to support min
> > > > +# order. The crash:
> > > > +#
> > > > +# https://gist.github.com/mcgrof/d12f586ec6ebe32b2472b5d634c397df
> > > 
> > > You might want to paste the stacktrace in here directly, in case the
> > > gist ever goes away.
> > 
> > It's not a simple crash trace, it is pretty enormous considering I
> > decoded it, and it has all locking candidates. Even including it after
> > the "---" lines of the patch might make someone go: TLDR. Thoughts?
> 
> I'd paste it in, even if it's quite lengthy.  I don't even think it's all that
> much if you remove some of the less useful bits of the unwind:
> 
> "Crash excerpt is as follows:
> 
> "BUG: kernel NULL pointer dereference, address: 0000000000000036
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 7 PID: 2190 Comm: kworker/u38:5 Not tainted 6.9.0-rc5+ #14
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> Workqueue: writeback wb_workfn (flush-7:5)
> RIP: 0010:filemap_get_folios_tag+0xa9/0x200
> Call Trace:
>  <TASK>
>  writeback_iter+0x17d/0x310
>  write_cache_pages+0x42/0xa0
>  iomap_writepages+0x33/0x50
>  xfs_vm_writepages+0x63/0x90 [xfs]
>  do_writepages+0xcc/0x260
>  __writeback_single_inode+0x3d/0x340
>  writeback_sb_inodes+0x1ed/0x4b0
>  __writeback_inodes_wb+0x4c/0xe0
>  wb_writeback+0x267/0x2d0
>  wb_workfn+0x2a4/0x440
>  process_one_work+0x189/0x3b0
>  worker_thread+0x273/0x390
>  kthread+0xda/0x110
>  ret_from_fork+0x2d/0x50
>  ret_from_fork_asm+0x1a/0x30
>  </TASK>"

Ah, sorry, yes, this crash dump is small. The one I was thinking of is
the other one, which we still deadlock on and for which we only have a
lockdep hint about what is likely going on. I'll include this dump in v2.

  Luis
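
For readers following the rename discussed above, here is a minimal
sketch of what a helper named _require_split_huge_pages_knob might look
like. It is not the helper from the patch, whose body is not quoted in
this thread. It assumes the knob is the split_huge_pages file under the
debugfs mount point, and it defines stand-ins for DEBUGFS_MNT and
_notrun so that the snippet runs on its own (the real fstests _notrun
exits the test rather than returning).

```shell
# Hypothetical sketch only; not the code from the patch under review.

# Assumption: debugfs is mounted at the usual location unless overridden.
DEBUGFS_MNT="${DEBUGFS_MNT:-/sys/kernel/debug}"

# Stand-in for the fstests _notrun helper, so this runs outside fstests.
# It only reports the reason and returns non-zero.
_notrun()
{
	echo "not run: $*"
	return 1
}

# Skip the test unless the split_huge_pages debugfs knob is writable.
_require_split_huge_pages_knob()
{
	local knob="$DEBUGFS_MNT/split_huge_pages"

	[ -w "$knob" ] || {
		_notrun "needs CONFIG_DEBUG_FS and a writable $knob"
		return 1
	}
}
```

Naming the helper after the knob it checks, rather than after debugfs in
general, avoids the ambiguity raised in the review.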