Re: [PATCH] xfs: fix incorrect log_flushed on fsync

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 20 Sep 2017 10:40:12 +1000

On Tue, Sep 19, 2017 at 08:31:37AM +0300, Amir Goldstein wrote:
> On Tue, Sep 19, 2017 at 12:24 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Mon, Sep 18, 2017 at 09:00:30PM +0300, Amir Goldstein wrote:
> >> On Mon, Sep 18, 2017 at 8:11 PM, Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
> >> > On Fri, Sep 15, 2017 at 03:40:24PM +0300, Amir Goldstein wrote:
> >> > That said, it's been in the kernel for 12 years without widespread
> >> > complaints about corruption, so I'm not sure this warrants public
> >> > disclosure via CVE/Phoronix vs. just fixing it.
> >> >
> >>
> >> I'm not sure either.
> >> My intuition tells me that the chances of hitting the data loss bug
> >> given a power failure are not slim, but the chances of users knowing
> >> about the data loss are slim.
> >
> > The chances of hitting it are slim. Power-fail vs fsync data
> > integrity testing is something we do actually run as part of QE and
> > have for many years.  We've been running such testing for years and
> > never tripped over this problem, so I think the likelihood that a
> > user will hit it is extremely small.
> 
> This sentence makes me uneasy.
> Who is We and what QE testing are you referring to?

I've done it in the past myself with a modified crash/xfscrash to
write patterned files (via genstream/checkstream). Unfortunately, I
lost that script when the machine used for that testing suffered a
fatal, completely unrecoverable ext3 root filesystem corruption
during a power fail cycle... :/
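
For readers unfamiliar with this style of test, here is a minimal sketch
of the patterned-file idea. It is not the genstream/checkstream code and
makes no claim about how those tools work internally; the function names,
block size and block layout are all assumptions made for illustration.
Each block's contents are fully determined by its file id and index, so
after a crash a checker can tell whether every block that fsync()
acknowledged is still intact.

```python
import os
import struct

BLOCK = 4096  # assumed block size, illustration only


def make_block(file_id: int, index: int) -> bytes:
    """Build a block whose contents depend only on (file_id, index)."""
    header = struct.pack("<QQ", file_id, index)  # 16 bytes
    return header * (BLOCK // len(header))


def write_all(fd: int, data: bytes) -> None:
    """Write the whole buffer, retrying on short writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def write_stream(path: str, file_id: int, nblocks: int) -> int:
    """Append patterned blocks, fsync after each one.

    Returns the number of blocks for which fsync() returned successfully.
    """
    acked = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for i in range(nblocks):
            write_all(fd, make_block(file_id, i))
            os.fsync(fd)
            acked = i + 1
    finally:
        os.close(fd)
    return acked


def check_stream(path: str, file_id: int, acked: int) -> bool:
    """Return True if the first `acked` blocks match the expected pattern."""
    with open(path, "rb") as f:
        for i in range(acked):
            if f.read(BLOCK) != make_block(file_id, i):
                return False
    return True
```

In a real power-fail run the acknowledged count would have to be
recorded somewhere that survives the crash (another machine, for
example) before power is cut, and check_stream() would run after log
recovery on the next mount. The sketch leaves that part out.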

RH QE also runs automated power fail cycle tests - we found lots of
ext4 problems with that test rig when it was first put together, but
I don't recall seeing XFS issues reported.  Eryu would have to
confirm, but ISTR that this testing was made part of the regular
RHEL major release testing cycle...

Let's not forget all the other storage vendors and apps out there
that do their own crash/power fail testing and rely on a working
fsync. Apps like ceph, cluster, various databases, etc. all have
their own data integrity testing procedures, so if there were any
obvious or easy-to-hit fsync bug we would have had people reporting
it long ago.

Then there are all the research tools that have had papers written
about them, testing exactly the sort of thing that dm-log-writes is
testing. None of these indicated any sort of problem with fsync in
XFS, but we couldn't reproduce or verify the research results
because none of those fine institutions ever open sourced their
tools, despite repeated requests and promises that it would happen.

> Are those tests in xfstests or any other public repository?

crash/xfscrash is, and now dm-log-writes, but nothing else is.

> My first reaction to the corruption was "no way, I need to check the test"
> Second reaction after checking the test was "this must be very very hard to hit"
> But from closer inspection, it looks like it doesn't take more than running
> a couple of fsync in parallel to get to the "at risk" state, which may persist
> for seconds.

That may be the case, but the reality is we don't have a body of
evidence to suggest this is a problem anyone is actually hitting. In
fact, we don't have any evidence it's been seen in the wild at all.

> Of course the chances of users being unlucky enough to also get a power
> failure during the "at risk" state are low, but I am puzzled how the power
> fail tests you claim exist didn't catch this sooner.

Probably for the same reason app developers and users aren't
reporting fsync data loss problems.  While the bug may "look obvious
in hindsight", the fact is that there is no evidence of data loss
after fsync on XFS in the real world.  Occam's Razor suggests that
there is something that masks the problem that we don't understand
yet....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx