+CC xfs folks

On Sat, May 2, 2020 at 7:10 AM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote:
>
>  ---- On Thursday, 2020-04-30 20:22:06 Amir Goldstein <amir73il@xxxxxxxxx> wrote ----
>  > On Thu, Apr 30, 2020 at 12:48 PM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote:
>  > >
>  > >  ---- On Thursday, 2020-04-30 17:15:20 Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote ----
>  > >  > Hi
>  > >  >
>  > >  > I'm doing some tests for my new version of syncfs improvement patch and I found an
>  > >  > interesting problem when combining dirty data && godown && nfs_export.
>  > >  >
>  > >  > My expectation is Pass or Fail all tests listed below, Test2 looks a bit strange and in my
>  > >  > opinion there is no strong connection between nfs_export/index and dirty data.
>  > >  > Any idea?
>  > >  >
>  > >  >
>  > >  > Test env and step like below:
>  > >  >
>  > >  > Test1:
>  > >  > Compile module with nfs_export enabled
>  > >  > Run xfstest generic/474 ==> PASS
>  > >  >
>  > >  > Test2:
>  > >  > Compile module with nfs_export enabled
>  > >  > Comment syncfs step in the test
>  > >  > Run xfstest generic/474 ==> Hang
>  > >  >
>  > >  > Test3:
>  > >  > Compile module with nfs_export disabled
>  > >  > Run xfstest generic/474 ==> PASS
>  > >  >
>  > >  > Test4:
>  > >  > Compile module with nfs_export disabled
>  > >  > Comment syncfs step in the test
>  > >  > Run xfstest generic/474 ==> FAIL
>  > >  >
>  > >
>  > > Additional information:
>  > >
>  > > Overlayfs version: latest next branch of miklos tree (5.7-rc2)
>  > > Underlying fs: xfs
>  > >
>  >
>  > Please test also against 5.7-rc2. Maybe we introduced some
>  > regression in -next.
>  >
>  > Please dump waiting processes stack by echo w > /proc/sysrq-trigger
>  > to see where in kernel does the test hang.
>  >
>  > I cannot think of anything in nfs_export/index that should affect
>  > generic/474, but we will find out soon...
>  >
>
> I'm on vacation this week and it seems hard to reproduce the problem on my laptop, maybe there were some config problems.
> I'll do more analyses next week on my testing machine.
>

Forgot to say - I also tried and failed to reproduce.

Looking under the lamppost, I suspect changes in xfs shutdown
in v5.7-rc1:

git log --oneline --grep shutdown v5.6.. -- fs/xfs
842a42d126b4 xfs: shutdown on failure to add page to log bio
5781464bd1ee xfs: move the ioerror check out of xlog_state_clean_iclog
12e6a0f449d5 xfs: remove the aborted parameter to xlog_state_done_syncing
a582f32fade2 xfs: simplify log shutdown checking in xfs_log_release_iclog
8a6271431339 xfs: fix unmount hang and memory leak on shutdown during quotaoff
13859c984301 xfs: cleanup xfs_log_unmount_write
b941c71947a0 xfs: mark XLOG_FORCED_SHUTDOWN as unlikely
6b789c337a59 xfs: fix iclog release error check race with shutdown

If you are able to reproduce, please try to reproduce with v5.6.
It could be an interaction between the changes to xfs shutdown and the
way that kernel internal modules interact with xfs.

Looking at how widely -overlay + xfs shutdown is tested,
I count only 2 generic tests that exercise this combination:
generic/474 and generic/461.
The rest of the shutdown tests require either local_device or
metadata_journaling.

I think that at least Darrick runs -overlay as part of validating an xfs
pull request to Linus, so there should be a fair amount of test coverage
for these two tests.

generic/461 seems to do something quite close to what you did when
commenting out syncfs in generic/474, but it is not in the 'quick' group,
so it may get less test coverage.
I wonder why it is not quick, though. On my system it runs for 24s.

Thanks,
Amir.
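
P.S. To check which shutdown tests fall outside 'quick', something like the
rough sketch below might do. It assumes a group file where each line reads
"<test-id> <group> <group> ..." and '#' starts a comment; that layout, the
helper name tests_in_group_not_in and the example path are assumptions for
illustration, not taken from this thread.

```shell
# Sketch: print the ids of tests that are in group $1 but not in group $2.
# Assumed input format (not verified here): "<test-id> <group> <group> ...",
# one test per line, with '#' lines treated as comments.
tests_in_group_not_in() {
    want="$1"; skip="$2"; file="$3"
    awk -v want="$want" -v skip="$skip" '
        /^#/ || NF < 2 { next }          # skip comments and malformed lines
        {
            has_want = 0; has_skip = 0
            for (i = 2; i <= NF; i++) {  # fields 2..NF are group names
                if ($i == want) has_want = 1
                if ($i == skip) has_skip = 1
            }
            if (has_want && !has_skip) print $1
        }
    ' "$file"
}
```

For example: tests_in_group_not_in shutdown quick tests/generic/group
(the path is hypothetical and depends on the xfstests checkout).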