On Tue, Oct 10, 2017 at 06:56:22AM -0400, Brian Foster wrote:
> On Tue, Oct 10, 2017 at 04:24:59PM +1100, Dave Chinner wrote:
> > On Tue, Oct 10, 2017 at 12:36:49PM +0800, Eryu Guan wrote:
> > > On Mon, Oct 09, 2017 at 12:12:55PM -0400, Brian Foster wrote:
> > > > On Thu, Aug 31, 2017 at 12:02:37PM +0800, Eryu Guan wrote:
> > > > > Run delalloc writes & append writes & non-data-integrity syncs
> > > > > concurrently to test the race between block map changes and writeback.
> > > > >
> > > > > This covers an XFS bug in which data could be written to the wrong
> > > > > block and delayed-allocation blocks are leaked, because the block map
> > > > > was changed by the removal of speculatively allocated eofblocks
> > > > > while writeback was in progress.
> > > > >
> > > > > This test partially mimics what the lustre-racer[1] test does; that
> > > > > test is how this bug was first found.
> > > > >
> > > > > [1] https://git.hpdd.intel.com/?p=fs/lustre-release.git;a=tree;f=lustre/tests/racer;hb=HEAD
> > > > >
> > > > > Signed-off-by: Eryu Guan <eguan@xxxxxxxxxx>
> > > > > ---
> > > > >
> > > > > This may not reproduce the bug on all hosts, but it does reproduce the XFS
> > > > > corruption issue reliably on my different test hosts.
> > > > >
> > > >
> > > > Was this problem fixed already or are we still waiting on a fix?
> > >
> > > It's still an unfixed problem. Dave provided a test patch (which did fix
> > > the bug for me)
> >
> > The test patch I provided broke the COW writeback path, primarily
> > because it's a separate mapping path and the change I made doesn't
> > work at all well with it....
> >
> > > then Christoph suggested a fix based on seqlock, and
> > > things stalled there.
> >
> > I had a look at doing that and got stalled on the fact that, again,
> > the COW writeback is completely separate to the existing block
> > mapping during writeback path and so applying a seqlock algorithm is
> > pretty difficult.
> >
> > Basically, to fix the problem, we first need to merge the COW and
> > delalloc paths in the writepage code and then we'll have a sane base
> > on which to apply a proper fix...
> >
> > (we need to do this to get rid of the bufferhead dependency, anyway)
> >
> > > (I'm happy to pick up the work, but I'm not that
> > > familiar with all the allocation paths that could change the extent map,
> > > so I may need some guidance and time to play with it.)
> >
> > There's some black magic in amongst it all. I'll spend some time on
> > it again over the next week and see what I come up with...
> >
>
> Hmm, is this[1] the test patch/thread associated with this test case? If
> so, I'm still wondering why we can't just trim the mapping to eof like
> the previous code had effectively done for so long..? Eryu, does the
> appended diff address this test case?

Yes, the appended patch fixed my test failure; it survived 20+ iterations
for me.

Thanks,
Eryu
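To make the two ideas in this thread concrete (a seqlock-style check that a cached writeback mapping is still current, and trimming that mapping to EOF), here is a minimal, single-threaded user-space sketch in C under a toy model. Everything in it is hypothetical and is not XFS code: `struct toy_inode` stands in for an inode whose extent map carries a change counter `map_seq`, `struct cached_map` stands in for the mapping cached by writeback, and the `toy_*` helpers are illustrative names only. It is not the appended patch or any proposed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model, not XFS code: an inode whose extent map
 * carries a counter that is bumped on every change. */
struct toy_inode {
	uint64_t size;		/* current EOF, in blocks */
	uint32_t map_seq;	/* bumped whenever the extent map changes */
};

/* Hypothetical cached writeback mapping covering blocks
 * [start, start + len), plus the counter value seen at lookup time. */
struct cached_map {
	uint64_t start;
	uint64_t len;
	uint32_t seq;
	bool valid;
};

/* Any extent map change (e.g. removing speculative post-EOF blocks)
 * bumps the counter so that previously cached mappings become stale. */
static void toy_change_extent_map(struct toy_inode *ip)
{
	ip->map_seq++;
}

/* "Look up" a mapping (here it is simply fabricated) and snapshot
 * the counter so later users can tell whether it is still current. */
static void toy_map_blocks(const struct toy_inode *ip, struct cached_map *map,
			   uint64_t start, uint64_t len)
{
	assert(len > 0);
	map->start = start;
	map->len = len;
	map->seq = ip->map_seq;
	map->valid = true;
}

/* Trim the cached mapping so it never extends past EOF. */
static void toy_trim_to_eof(const struct toy_inode *ip, struct cached_map *map)
{
	if (map->start >= ip->size)
		map->len = 0;
	else if (map->start + map->len > ip->size)
		map->len = ip->size - map->start;
}

/* The cached mapping may be reused for block blk only if the extent
 * map has not changed since lookup and blk lies inside the mapping. */
static bool toy_map_usable(const struct toy_inode *ip,
			   const struct cached_map *map, uint64_t blk)
{
	return map->valid && map->seq == ip->map_seq &&
	       blk >= map->start && blk < map->start + map->len;
}
```

In this toy model, trimming to EOF only guarantees that the cached mapping never covers blocks beyond EOF, so it guards against the removal of post-EOF speculative preallocation but not against other extent map changes inside EOF; the counter check catches any change that bumps `map_seq`. A real kernel implementation would also have to read and update such a counter under the appropriate locking, which this sketch omits.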