Re: [PATCH v3 10/13] fstests: crash consistency fsx test using dm-log-writes

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Mon, 4 Dec 2017 12:53:12 -0800

On Mon, Dec 04, 2017 at 10:17:30PM +0200, Amir Goldstein wrote:
> On Thu, Nov 30, 2017 at 10:28 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > On Wed, Nov 29, 2017 at 5:33 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> [...]
> > So far I was able to determine that your patch
> > "xfs: log recovery should replay deferred ops in order" is NOT the
> > cause of the problem.
> > This took some time, because at one point it took me 23 hr to get to
> > the dirty log in test partition with modified 455 (no dm-log-writes).
> >
> > Attached metadump of corrupt test partition.
> > The xfs code this test was running with is v4.14-rc8.
> > I did not try to bisect any further because of the time it takes per commit.
> >
> > Let me know if you need any other info or if you want me to run the test
> > on my setup for specific patch and/or bisection points.
> >
> 
> I figured out what was going on in my test setup.
> The answer was in the attached dmesg, but I overlooked it:
> 
> [33816.533286] ata3.00: failed command: FLUSH CACHE EXT
> [33816.533294] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 21
>                         res 40/00:00:20:44:ba/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
> [33816.533300] ata3.00: status: { DRDY }
> [33816.533309] ata3: hard resetting link
> 
> It appears that the test machine had a faulty SATA cable.
> 
> This is probably more cruel to fs than a dm-flakey/dm-log-writes test...

Not as bad as the time when I discovered that one of my UASP bridges was
arbitrarily injecting 'USBUSBUSB' into bus transfers.

> Cable replaced. Back to sanity. Sorry for the noise.

:)

--D

> Amir.