Re: [PATCH] xfs: detect agfl count corruption and reset agfl

On Thu, Mar 15, 2018 at 11:27:02AM -0500, Dave Chiluk wrote:
> On Thu, Mar 15, 2018 at 10:46 AM, Darrick J. Wong
> <darrick.wong@xxxxxxxxxx> wrote:
> > On Thu, Mar 15, 2018 at 06:38:39AM -0400, Brian Foster wrote:
> >> On Wed, Mar 14, 2018 at 03:42:50PM -0500, Dave Chiluk wrote:
> >> > On Wed, Mar 14, 2018 at 1:12 PM, Darrick J. Wong
> >> > <darrick.wong@xxxxxxxxxx> wrote:
> >> > > On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote:
> >> ...
> >> >
> >> > Reviewed-by: Dave Chiluk <chiluk+linuxxfs@xxxxxxxxxx>
> >> >
> >> > I'm also assuming this will get submitted back to the linux-stable
> >> > trees as the agfl packing change is already causing issues in the
> >> > stable trees.  If you do not intend to push it into the linux-stable
> >> > trees let me know and I'll take care of at least the major ones.
> >> >
> >>
> >> Yeah, I can cc stable in the next post along with the other minor fixes.
> >> My question is how far back should this fix go? Was the plan to only go
> >> back to v4.5 because that is where the packing fix first went in? Or
> >> should this go back further because it looks like the packing fix was
> >> backported to v3.10:
> >>
> >> $ git show 96f859d52bcb1
> >> commit 96f859d52bcb1c6ea6f3388d39862bf7143e2f30
> >> Author: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> >> Date:   Mon Jan 4 16:13:21 2016 +1100
> >>
> >>     libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
> >>
> >>     ...
> >>
> >>     cc: <stable@xxxxxxxxxxxxxxx> # 3.10 - 4.4
> >>     Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> >>     Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
> >>     Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx>
> >
> > Hmmm, I'm assuming that you'd want 3.10 at least for RHEL, but I'll let
> > you all figure that one out.
> >
> > As far as the upstream kernels, 4.14.27, 4.9.87, 4.4.121, and 4.1.50
> > have that packing patch so I guess they'll all need some version of this.
> >
> > --D
> >
> >>
> >> Brian
> >>
> >> > Thanks,
> >> > Dave
> >> > --
> 
> RHEL is actually fine for now, since they explicitly remove the
> packing patch in their kernel, and xfsprogs.  Once you submit the
> patches to linux-stable the ubuntu-kernel team monitors and includes
> patches for the releases that they are stable maintainers of (they
> are downstream for 4.4 of gregkh, but currently maintain a 3.13, 4.13,
> and 4.15 tree).
> 
> Also please add a Fixes line to your commit so it's obvious what patch
> it helps remediate.  Fixes is actually not a great word here, but that
> looks to be what the submitting-patches.txt doc calls for.
> 
> Fixes: 96f859d52bcb libxfs: pack the agfl header structure so
> XFS_AGFL_SIZE is correct

No, please don't dumb down a complex issue to a simple, naive
metadata tag like this. Explain the issue fully in the commit
message, mentioning/referencing commits when appropriate.

As I've mentioned in another thread recently about backports - if
you are relying on "fixes" tags to determine what needs backporting,
your backporting process is fundamentally broken.  I don't care what
the kernel documentation says - it frequently does not apply because
it's written by someone else for their own reasons and requirements
that aren't relevant to us. They are guidelines, not rules, for that
reason.

> This way stable maintainers understand that the fix resolves an issue
> that was introduced by that patch, and can apply/not apply
> appropriately.

I simply don't trust the stable process to get complex XFS backports
right and correctly tested. e.g. We've had this problem before with
things like error numbers changing sign @ 3.16 - patches from >3.16
were getting backported with negative errnos to kernels <3.16, and
they were breaking because errors were not being correctly detected. 
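
To make the failure mode concrete, here is a rough userspace sketch
of how a sign mismatch can hide an error. It assumes the older
convention was "positive errno internally, negated once at the VFS
boundary"; every function name below is hypothetical and stands in
for real XFS code, so treat it as an illustration rather than the
actual call chain.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical old-convention helper: positive errno on failure. */
static int old_helper(int fail)
{
	return fail ? EIO : 0;
}

/*
 * Hypothetical helper backported unchanged from a newer kernel:
 * it returns a negative errno on failure.
 */
static int backported_helper(int fail)
{
	return fail ? -EIO : 0;
}

/*
 * Hypothetical old-convention boundary: it assumes the internal
 * error is positive and negates it for the caller.
 */
static int old_boundary(int internal_error)
{
	return -internal_error;
}

/* A VFS-style caller treats only negative values as failure. */
static int caller_sees_failure(int ret)
{
	return ret < 0;
}

/* Walk both paths with a simulated failure. */
static void demo(void)
{
	/* Matching conventions: the failure reaches the caller. */
	assert(caller_sees_failure(old_boundary(old_helper(1))));

	/* Mixed conventions: the double negation hides the failure. */
	assert(!caller_sees_failure(old_boundary(backported_helper(1))));
}
```

Calling demo() passes both assertions: the backported helper's
failure comes out of the boundary as a positive value, which the
caller does not treat as an error. Nothing about this shows up by
just booting the kernel.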

Because nobody in the stable process was regression testing
filesystem backports other than booting kernels, it wasn't until
users installed and started reporting stable kernel regressions to
us that we were able to identify the bugs and the process issues
that caused them.

Put simply: the stable kernel maintainers are not filesystem experts
and they don't run filesystem regression tests to determine that the
fixes don't have any unexpected side effects. What that means is
that stable kernel backports need to be done under the eye of an XFS
developer who then follows up by reviewing the backports once merged
and running regression tests against the resulting kernel as we
cannot rely on the stable process to do this.  It's a serious amount
of work for something as critical as fixing an on-disk format
problem, and we simply can't trust anyone else to do the job
properly.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx