Re: [PATCH 2/2 V2] xfs: toggle readonly state around xfs_log_mount_finish

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Mon, 27 Mar 2017 10:16:10 -0700

On Sat, Mar 18, 2017 at 06:38:35PM +1100, Dave Chinner wrote:
> On Thu, Mar 16, 2017 at 04:52:43PM -0700, Eric Sandeen wrote:
> > On 3/16/17 4:42 PM, Dave Chinner wrote:
> > > On Thu, Mar 16, 2017 at 12:15:00PM -0700, Darrick J. Wong wrote:
> > >> On Wed, Mar 15, 2017 at 07:36:29AM -0400, Brian Foster wrote:
> > >>> On Tue, Mar 14, 2017 at 06:23:57PM -0500, Eric Sandeen wrote:
> > >>>> When we do log recovery on a readonly mount, unlinked inode
> > >>>> processing does not happen due to the readonly checks in
> > >>>> xfs_inactive(), which are trying to prevent any I/O on a
> > >>>> readonly mount.
> > >>>>
> > >>>> This is misguided - we do I/O on readonly mounts all the time,
> > >>>> for consistency; for example, log recovery.  So do the same
> > >>>> RDONLY flag twiddling around xfs_log_mount_finish() as we
> > >>>> do around xfs_log_mount(), for the same reason.
> > >>>>
> > >>>> This all cries out for a big rework but for now this is a
> > >>>> simple fix to an obvious problem.
> > >>>>
> > >>>> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx>
> > >>>> ---
> > >>>>
> > >>
> > >> Both patches look ok, so I'll put them on the test queue for -rc4.
> > >> Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > 
> > > FWIW, I don't think this is a -rc candidate. Making log recovery
> > > process unlinked inode transactions on read-only mounts is a pretty
> > > major change in behaviour. Who knows exactly what dragons are
> > > lurking at lower layers that have never been run in this context
> > > until now.
> > > 
> > > Also, it's not urgent - we've lived with this behaviour for years -
> > > so waiting a month for the next merge window is not going to hurt
> > > anyone and it gives us a chance to test it - XFS developers are the
> > > people who should be burnt by the lurking dragons, not users who
> > > updated to a late -rcX kernel....
> > 
> > To shield Darrick a bit ;) I was agitating/asking for sooner, but
> > admittedly that was a little bit selfish on my part.
> > 
> > Still, we have had field reports of people with /gigabytes/ missing
> > from the root filesystem, and it was not fixable without an 
> > xfs_repair.  Which on a root filesystem is ... special.
> 
> That's information that should be in the commit message....
> 
> > So, my fault for getting it sent late, for sure - but I do think it's
> > an important fix.  I know we can't really address the "unknown unknown"
> > dragons easily, but actually completing recovery on RO mounts seems
> > straightforward to me... we allow half of recovery to go, and
> > disallow the other half.  Seems plainly broken.
> 
> I still don't think that makes it an urgent, immediate -rcX fix.  It
> definitely makes it a fix that should go to stable kernels, but that
> does not mean we should short-cut our integration and testing
> processes. If anything, it makes it far more important to ensure the
> change is safe and well tested, because it's going to be distributed
> to /everyone/ in the near future through the stable update process,
> distros included.
> 
> As I've already said: rushing fixes upstream without adequate test
> time is almost always the wrong thing to do. Call me conservative,
> but I have plenty of scars to justify being careful about pushing
> fixes too quickly.
> 
> I'm more worried about the impact on the unknown number of read-only
> filesystems out there across the entire userbase that have the
> potential to process inodes that have been sitting orphaned for
> years than I am about the few recent users who have had to run
> xfs_repair on their root filesystem to fix this up due to the nature
> of ro->rw transition in root filesystem mounting.  Let's make really
> sure everything is OK before we expose it to all our users running
> stable/distro kernels....

FWIW I let this run w/ all my testing configs during LSF/Vault last week
and I didn't see any new failures.  I'll hold off on sending these patches.

But, waiting for 4.12 does provide the opportunity to add more stressful
tests than what generic/417 does now.  How about a test that creates a
big directory structure + some heavily fragmented files, then opens all
of those files, deletes the directory tree, shuts down the fs, then
attempts a ro mode recovery?  That way we have a lot of files and a lot
of bmap records to get rid of during mount.
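
A rough sketch of the userspace half of that idea, in Python purely for
illustration (a real test would presumably be an fstests shell script).
The function name, the sizes, and the "write every other block" trick as
a stand-in for fragmentation are all my assumptions, not anything that
exists today:

```python
import os
import shutil
import tempfile


def make_orphans(root, ndirs=4, nfiles=8, nchunks=16, chunk=4096):
    """Build a directory tree of sparse files, keep every file open,
    then delete the tree.  Returns the still-open file descriptors,
    which now refer to unlinked (orphaned) inodes.

    Hypothetical helper: names and default sizes are illustrative only.
    """
    fds = []
    for d in range(ndirs):
        dpath = os.path.join(root, f"d{d}")
        os.makedirs(dpath)
        for f in range(nfiles):
            fd = os.open(os.path.join(dpath, f"f{f}"),
                         os.O_CREAT | os.O_RDWR, 0o600)
            # Write every other chunk so holes are left in between.
            # Assumption: this is a crude way to get many extents; the
            # exact layout depends on the filesystem's allocator.
            for c in range(nchunks):
                os.pwrite(fd, b"x" * chunk, 2 * c * chunk)
            fds.append(fd)
    # Remove the whole tree while the descriptors are still open.
    shutil.rmtree(root)
    return fds


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    fds = make_orphans(root)
    print(f"holding {len(fds)} unlinked files open")
    # Not done here (needs root and a scratch XFS filesystem): while
    # these descriptors are still open, force a shutdown of the fs,
    # unmount it, then mount it read-only so that log recovery has to
    # process all of the unlinked inodes.
    for fd in fds:
        os.close(fd)
```

The point of keeping the descriptors open across the shutdown is that
the inodes stay on the unlinked list instead of being freed at unlink
time, so the ro mount is the first thing that gets to clean them up.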

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx