On Mon, Dec 04, 2017 at 10:17:30PM +0200, Amir Goldstein wrote:
> On Thu, Nov 30, 2017 at 10:28 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > On Wed, Nov 29, 2017 at 5:33 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> [...]
> > So far I was able to determine that your patch
> > "xfs: log recovery should replay deferred ops in order" is NOT the
> > cause of the problem.
> > This took some time, because at one point it took me 23 hr to get to
> > the dirty log in test partition with modified 455 (no dm-log-writes).
> >
> > Attached metadump of corrupt test partition.
> > The xfs code this test was running with is v4.14-rc8.
> > I did not try to bisect any further because of the time it takes per commit.
> >
> > Let me know if you need any other info or if you want me to run the test
> > on my setup for specific patch and/or bisection points.
> >
>
> I figured out what was going on in my test setup.
> The answer was in the attached dmesg, but I overlooked it:
>
> [33816.533286] ata3.00: failed command: FLUSH CACHE EXT
> [33816.533294] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 21
>          res 40/00:00:20:44:ba/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
> [33816.533300] ata3.00: status: { DRDY }
> [33816.533309] ata3: hard resetting link
>
> It appears that that test machine had a faulty SATA cable.
>
> This is probably more cruel to fs than a dm-flakey/dm-log-writes test...

Not as bad as the time when I discovered that one of my UASP bridges was
arbitrarily injecting 'USBUSBUSB' into bus transfers.

> Cable replaced. Back to sanity. Sorry for the noise.

:)

--D

> Amir.