Re: system hang on a syncfs test with nfs_export enabled




+CC  xfs folks

On Sat, May 2, 2020 at 7:10 AM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote:
>
>  ---- On Thursday, 2020-04-30 20:22:06 Amir Goldstein <amir73il@xxxxxxxxx> wrote ----
>  > On Thu, Apr 30, 2020 at 12:48 PM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote:
>  > >
>  > >  ---- On Thursday, 2020-04-30 17:15:20 Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote ----
>  > >  > Hi
>  > >  >
>  > >  > I'm running some tests for the new version of my syncfs improvement patch and I found an
>  > >  > interesting problem when combining dirty data && godown && nfs_export.
>  > >  >
>  > >  > I expected each test listed below to either pass or fail, so Test2 looks a bit strange; in my
>  > >  > opinion there is no strong connection between nfs_export/index and dirty data.
>  > >  > Any ideas?
>  > >  >
>  > >  >
>  > >  > The test environment and steps are as follows:
>  > >  >
>  > >  > Test1:
>  > >  > Compile module with nfs_export enabled
>  > >  > Run xfstest generic/474   ==> PASS
>  > >  >
>  > >  > Test2:
>  > >  > Compile module with nfs_export enabled
>  > >  > Comment out the syncfs step in the test
>  > >  > Run xfstest generic/474   ==> Hang
>  > >  >
>  > >  > Test3:
>  > >  > Compile module with nfs_export disabled
>  > >  > Run xfstest generic/474   ==> PASS
>  > >  >
>  > >  > Test4:
>  > >  > Compile module with nfs_export disabled
>  > >  > Comment out the syncfs step in the test
>  > >  > Run xfstest generic/474   ==> FAIL
>  > >  >
>  > >
>  > > Additional information:
>  > >
>  > > Overlayfs version: latest next branch of miklos tree (5.7-rc2)
>  > > Underlying fs: xfs
>  > >
>  >
>  > Please test also against 5.7-rc2. Maybe we introduced some
>  > regression in -next.
>  >
>  > Please dump the stacks of waiting processes with echo w > /proc/sysrq-trigger
>  > to see where in the kernel the test hangs.
>  >
>  > I cannot think of anything in nfs_export/index that should affect
>  > generic/474, but we will find out soon...
>  >
>
> I'm on vacation this week and it seems hard to reproduce the problem on my laptop; maybe there were some config problems.
> I'll do more analysis next week on my testing machine.
>

Forgot to say - I also tried and failed to reproduce.

Looking under the lamppost, I suspect changes in xfs shutdown
in v5.7-rc1:

git log --oneline --grep shutdown v5.6.. -- fs/xfs
842a42d126b4 xfs: shutdown on failure to add page to log bio
5781464bd1ee xfs: move the ioerror check out of xlog_state_clean_iclog
12e6a0f449d5 xfs: remove the aborted parameter to xlog_state_done_syncing
a582f32fade2 xfs: simplify log shutdown checking in xfs_log_release_iclog
8a6271431339 xfs: fix unmount hang and memory leak on shutdown during quotaoff
13859c984301 xfs: cleanup xfs_log_unmount_write
b941c71947a0 xfs: mark XLOG_FORCED_SHUTDOWN as unlikely
6b789c337a59 xfs: fix iclog release error check race with shutdown

If you are able to reproduce, please try to reproduce with v5.6.
It could be an interaction between the changes to xfs shutdown and
the way that kernel internal modules interact with xfs.

Looking at how widely the -overlay + xfs shutdown combination is tested,
I count only 2 generic tests that exercise it:
generic/474 and generic/461. The rest of the shutdown tests require
either local_device or metadata_journaling.
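
A rough way to repeat that count is sketched below. It assumes the
requirements are expressed in the test scripts through the helper calls
_require_scratch_shutdown, _require_local_device and
_require_metadata_journaling; the function name
list_overlay_shutdown_tests is made up for illustration and is not part
of xfstests.

```shell
# Hypothetical helper, not part of xfstests.
# Print the names of test scripts under DIR that call
# _require_scratch_shutdown but neither _require_local_device nor
# _require_metadata_journaling (assumed spellings of the requirements).
list_overlay_shutdown_tests() {
    dir=$1
    # Test scripts are assumed to have names that start with a digit.
    grep -l '_require_scratch_shutdown' "$dir"/[0-9]* 2>/dev/null |
    while read -r t; do
        # Skip tests that also carry one of the two extra requirements.
        grep -q -e '_require_local_device' \
                -e '_require_metadata_journaling' "$t" ||
            basename "$t"
    done
}
```

Run from the top of an xfstests checkout as
list_overlay_shutdown_tests tests/generic; the result depends on the
checkout, so it may not match the count above on other versions.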

I think that at least Darrick runs -overlay as part of validating
an xfs pull request to Linus, so there should be a fair amount of test
coverage for these two tests.

generic/461 seems to do something quite close to what you did when
commenting out syncfs in generic/474, but it is not in the 'quick' group, so it
may get less test coverage.
I wonder why it is not quick, though. On my system it runs for 24s.

Thanks,
Amir.



