On Tue, Sep 19, 2017 at 08:31:37AM +0300, Amir Goldstein wrote:
> On Tue, Sep 19, 2017 at 12:24 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Mon, Sep 18, 2017 at 09:00:30PM +0300, Amir Goldstein wrote:
> >> On Mon, Sep 18, 2017 at 8:11 PM, Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
> >> > On Fri, Sep 15, 2017 at 03:40:24PM +0300, Amir Goldstein wrote:
> >> >> The disclosure of the security bug fix (commit b31ff3cdf5) made me wonder
> >> >> if the possible data loss bug should also be disclosed in some distros forum?
> >> >> I bet some users would care more about the latter than the former.
> >> >> Coincidentally, the same commit fixes both the data loss and security bugs.
> >> >
> >> > Yes, the patch ought to get sent on to stable w/ fixes tag. One
> >> > would hope that the distros will pick up the stable fixes from there.
> >
> > Yup, that's the normal process for data integrity/fs corruption
> > bugs.
>
> Makes sense. I'm convinced that the normal process is sufficient for this
> sort of bug fix.
>
> >
> >> > That said, it's been in the kernel for 12 years without widespread
> >> > complaints about corruption, so I'm not sure this warrants public
> >> > disclosure via CVE/Phoronix vs. just fixing it.
> >> >
> >>
> >> I'm not sure either.
> >> My intuition tells me that the chances of hitting the data loss bug
> >> given a power failure are not slim, but the chances of users knowing
> >> about the data loss are slim.
> >
> > The chances of hitting it are slim. Power-fail vs fsync data
> > integrity testing is something we do actually run as part of QE and
> > have for many years. We've been running such testing for years and
> > never tripped over this problem, so I think the likelihood that a
> > user will hit it is extremely small.
>
> This sentence makes me uneasy.
> Who is "we" and what QE testing are you referring to?
> Are those tests in xfstests or any other public repository?
> My first reaction to the corruption was "no way, I need to check the test"
> Second reaction after checking the test was "this must be very, very hard to hit"

/me prefers to think that we've simply gotten lucky all these years and
nobody actually managed to die before another flush would take care of
the dirty data.

But then I did just spend a week in Las Vegas. :P

> But from closer inspection, it looks like it doesn't take more than running
> a couple of fsyncs in parallel to get to the "at risk" state, which may persist
> for seconds.
> Of course the chances of users being that unlucky to also get a power
> failure during the "at risk" state are low, but I am puzzled how the power-fail
> tests you claim exist didn't catch this sooner.
>
> Anyway, not sure there is much more to discuss, just wanted to see
> if there is a post mortem lesson to be learned from this, beyond the fact that
> dm-log-writes is a valuable testing tool.

Agreed. :)

--D

>
> Cheers,
> Amir.