Re: [PATCH V2] uapi: fix statx attribute value overlap for DAX & MOUNT_ROOT

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 3 Dec 2020 12:50:03 +1100

On Wed, Dec 02, 2020 at 10:04:17PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Dec 03, 2020 at 07:40:45AM +1100, Dave Chinner wrote:
> > On Wed, Dec 02, 2020 at 08:06:01PM +0100, Greg Kroah-Hartman wrote:
> > > On Wed, Dec 02, 2020 at 06:41:43PM +0100, Miklos Szeredi wrote:
> > > > On Wed, Dec 2, 2020 at 5:24 PM David Howells <dhowells@xxxxxxxxxx> wrote:
> > > > >
> > > > > Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> > > > >
> > > > > > Stable cc also?
> > > > > >
> > > > > > Cc: <stable@xxxxxxxxxxxxxxx> # 5.8
> > > > >
> > > > > That seems to be unnecessary, provided there's a Fixes: tag.
> > > > 
> > > > Is it?
> > > > 
> > > > Fixes: means it fixes a patch, Cc: stable means it needs to be
> > > > included in stable kernels.  The two are not necessarily the same.
> > > > 
> > > > Greg?
> > > 
> > > You are correct.  cc: stable, as is documented in
> > >     https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> > > ensures that the patch will get merged into the stable tree.
> > > 
> > > Fixes: is independent of it.  It's great to have for stable patches so
> > > that I know how far back to backport patches.
> > > 
> > > We do scan all commits for Fixes: tags that do not have cc: stable, and
> > > try to pick them up when we can and have the time to do so.  But it's
> > > not guaranteed at all that this will happen.
> > > 
> > > I don't know why people keep getting confused about this, we don't
> > > document the "Fixes: means it goes to stable" anywhere...
> > 
> > Except that is exactly what happens, sometimes within a day or two
> > of a patch with a Fixes tag hitting Linus' kernel. We have had a
> > serious XFS regression in the 5.9.9 stable kernel that should never
> > have happened as a result of exactly this "Fixes = automatically
> > swept immediately into stable kernels" behaviour. See here for
> > post-mortem analysis:
> > 
> > https://lore.kernel.org/linux-xfs/20201126071323.GF2842436@xxxxxxxxxxxxxxxxxxx/T/#m26e14ebd28ad306025f4ebf37e2aae9a304345a5
> > 
> > This happened because these automated Fixes scans seem to occur
> > weekly during -rcX release periods, which means there really is *no
> > practical difference* between the way the stable process treats
> > Fixes tags and cc: stable.
> 
> Sometimes, yes, that is true.  But as it went into Linus's tree at the
> same time, we just ended up with "bug compatible" trees :)
> 
> Not a big deal overall, happens every few releases, we fix it up and
> move on.  The benefits in doing all of this _FAR_ outweigh the very
> infrequent times that kernel developers get something wrong.

I'm not debating that users benefit from backports. I'm talking
about managing risk profiles and how to prevent an entirely
preventable stable kernel regression from happening again.

Talking about risk profiles, the issue here is that the regression
that slipped through to the stable kernels had a -catastrophic- risk
profile. That's exactly the sort of thing that the stable kernel is
supposed to avoid exposing users to, and that raises the importance
and priority of ensuring that *never happens again*.

And the cause of this regression slipping through to stable kernel
users? It was a result of the automated "fixes" scan done by the
stable process that results in "fixes" meaning the same thing as
"cc: stable"....

> As always, if you do NOT want your subsystem to have fixes: tags picked
> up automatically by us for stable trees, just email us and let us know
> to not do that and we gladly will.

No, that is not an acceptable solution for anyone. The stable
maintainers need to stop suggesting this as a solution to any
criticism that is raised against the stable process. You may as well
just say "shut up, go away, we don't care what you want".

> > It seems like this can all be avoided simply by scheduling the
> > automated fixes scans once the upstream kernel is released, not
> > while it is still being stabilised by -rc releases. That way stable
> > kernels get better tested fixes, they still get the same quantity of
> > fixes, and upstream developers have some margin to detect and
> > correct regressions in fixes before they get propagated to users.
> 
> So the "magic" -final release from Linus would cause this to happen?
> That means that the world would go for 3 months without some known fixes
> being applied to the tree?  That's not acceptable to me, as I started
> doing this because it was needed to be done, not just because I wanted
> to do more work...

I'm not suggesting that all fixes across the entire kernel get held
until release. That's just taking things to extremes for no valid
reason as the risk profiles of most subsystems don't justify needing
a margin of error that large. I'm asking that specific subsystems
with catastrophic failure risk profiles be allowed to opt
out of the "just merged" fixes scans and instead have them replaced
by a less frequent scan.

Perhaps we don't even need to wait for the full release. Maybe just
increasing the fixes scanning window for those subsystems to pick up
changes in -rc(X-2) so that the commits have been exposed to testing
for a couple of weeks before being considered a stable backport
candidate.

That mitigates the immediate risk concern as it gives developers
time to catch and fix regressions before stable backports are done.
Such a 2 week delay would have avoided exposing stable kernel users
to a dangerous regression that should never have been released outside
developer and test machines exercising the upstream -rcX tree.
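
To make the proposed rule concrete, here is a rough sketch of the
selection logic I have in mind. It is purely illustrative: I don't
know how the stable scripts are actually structured, and the names
DELAYED_SUBSYSTEMS, RC_DELAY and is_backport_candidate() are made up
for this example, as is the idea that the scan knows which -rc a
commit first appeared in.

```python
import re

# Hypothetical opt-in list: subsystems that asked for the delayed scan.
DELAYED_SUBSYSTEMS = {"xfs"}

# Assumed delay: a fix must have been in at least this many -rc
# releases before the automated Fixes scan considers it.
RC_DELAY = 2


def has_cc_stable(message: str) -> bool:
    """True if the commit message carries an explicit Cc: stable tag."""
    return re.search(r"^cc:\s*<?stable@", message,
                     re.IGNORECASE | re.MULTILINE) is not None


def has_fixes(message: str) -> bool:
    """True if the commit message carries a Fixes: tag."""
    return re.search(r"^fixes:\s*\S", message,
                     re.IGNORECASE | re.MULTILINE) is not None


def is_backport_candidate(message: str, subsystem: str,
                          merged_rc: int, current_rc: int) -> bool:
    """Decide whether a commit should be queued for stable right now.

    merged_rc is the -rc the commit first appeared in, current_rc is
    the most recent -rc that has been released.
    """
    if has_cc_stable(message):
        # Explicit developer request: backport immediately.
        return True
    if not has_fixes(message):
        return False
    if subsystem in DELAYED_SUBSYSTEMS:
        # Only pick up fixes that have had RC_DELAY releases of testing.
        return merged_rc <= current_rc - RC_DELAY
    # Everyone else keeps the existing "just merged" behaviour.
    return True
```

With this, a Fixes-only XFS commit merged for -rc5 is skipped while
-rc6 is current and becomes a candidate once -rc7 is out, whereas
anything carrying cc: stable is unaffected by the delay.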

> > It also creates a clear demarcation between fixes and cc: stable for
> > maintainers and developers: only patches with a cc: stable will be
> > backported immediately to stable. Developers know what patches need
> > urgent backports and, unlike developers, the automated fixes scan
> > does not have the subject matter expertise or background to make
> > that judgement....
> 
> Some subsystems do not have such clear demarcation at all. Heck, some
> subsystems don't even add a cc: stable to known major fixes.  And that's
> ok, the goal of the stable kernel work is to NOT impose additional work
> on developers or maintainers if they don't want to do that work.

Engineering is as much about improving processes as it is about
improving the thing that is being built.  I'm not asking you to stop
backporting fixes or stop improving the stable kernels. All I'm
asking for is to increase the latency of backports for some
subsystems because a margin of error is needed to minimise the risk
profile stable users are exposed to. IOWs, I'm asking for a *minor
tweak* to the existing process, not asking you to start all over
again.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx