On Tue, Oct 11, 2022 at 03:26:02PM +1100, Dave Chinner wrote:
> On Mon, Oct 10, 2022 at 05:30:48PM -0700, Darrick J. Wong wrote:
> > On Tue, Oct 11, 2022 at 10:29:32AM +1100, Dave Chinner wrote:
> > > On Mon, Oct 10, 2022 at 03:40:51PM +0000, Darrick Wong wrote:
> > > > LGTM, want to send this to the upstream list to start that
> > > > discussion?
> >
> > UGH, so I thought this was an internal thread, but it turns out that
> > linux-xfs has been cc'd for a while but none of the messages made it
> > to lore.
> >
> > I'll fill in some missing context below.
> >
> > --D
> >
> > > > ________________________________________
> > > > From: Srikanth C S <srikanth.c.s@xxxxxxxxxx>
> > > > Sent: Monday, October 10, 2022 08:24
> > > > To: linux-xfs@xxxxxxxxxxxxxxx; Darrick Wong
> > > > Cc: Rajesh Sivaramasubramaniom; Junxiao Bi
> > > > Subject: [PATCH] fsck.xfs: mount/umount xfs fs to replay log
> > > > before running xfs_repair
> > > >
> > > > fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
> > > > possible that when the machine crashes, the fs is in an
> > > > inconsistent state with the journal log not yet replayed. This
> > > > can put the machine into a rescue shell. To address this
> > > > problem, mount and umount the fs before running xfs_repair.
> > >
> > > What's the purpose of forcing xfs_repair to be run on every boot?
> > > The whole point of having a journalling filesystem is to avoid
> > > needing to run fsck on every boot.
> >
> > I don't think repair-at-every-boot here is the overall goal for our
> > customer base.
> >
> > We've had some <cough> major support events over the last few months.
> > There are two things going on here: some of the problems are due to
> > our datacenters abending, and the rest of it is the pile of data
> > corruption
>
> /me has to look up what "abending" means in this context, because I
> don't think you mean that the data center was "Absent By Enforced
> Net Deprivation"...
>
> Ah, it's an ancient abbreviation from the early days of IBM
> mainframes meaning "abnormal end of task". i.e. something crashed.

Yes.

> > problems that you and I and Chandan have been working through for
> > months now.
> >
> > This means that customer support has 30,000 VMs to reboot. 90% of
> > the systems involved seem to have survived more or less intact, but
> > that leaves 10% of them with latent errors, unmountable root
> > filesystems, etc. They probably have even more than that, but I
> > don't recommend inviting the G-men for a visit to find out the real
> > sum.
> >
> > Since these machines are remotely manageable, support /can/ inject
> > the kernel command line with 'fsck.mode=force' to kick off
> > xfs_repair if the machine won't come up or if they suspect there
> > might be deeper issues with latent errors in the fs metadata, which
> > is what they did to try to get everyone running ASAP while
> > anticipating any future problems...
>
> This context is kinda important :)
>
> [....]
>
> > > More explanation, please!
> >
> > Frankly, I /don't/ want to expend a lot of time wringing our hands
> > over how exactly we hammer 2022 XFS tools into 1994 ext2 behavioral
> > semantics.
>
> Neither do I - I just want to understand the problem that the
> change is trying to solve. The commit message gave no indication of
> why the change was being proposed; commit messages need to tell the
> whole story, not just describe the code change being made. If it
> started like:
>
> "After a recent data center crash, we had to recover root
> filesystems on several thousand VMs via a boot time fsck. Many of
> them failed unnecessarily because.... "
>
> then it becomes self evident that replaying the log in these
> situations is necessary.

<nod> Srikanth, would you reword the patch message to include some of
the details about why we want this patch?
> > What they really want is online fsck to detect and fix problems in
> > real time, but I can't seem to engage the community on how exactly
> > we land this thing now that I've finished writing it.
>
> I only want to look at the interfacing with the runtime structures
> and locking to ensure we don't back ourselves into a nasty corner.
> This is really only a small part of the bigger online repair
> patchset. Once we've got that sorted, I think we should just commit
> the rest of it as it stands (i.e. tested but unreviewed) and fix as
> needed, just like we do with the userspace xfs_repair code...

Ok.

> > --D
> >
> > > > Signed-off-by: Srikanth C S <srikanth.c.s@xxxxxxxxxx>
> > > > ---
> > > >  fsck/xfs_fsck.sh | 20 ++++++++++++++++++--
> > > >  1 file changed, 18 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> > > > index 6af0f22..21a8c19 100755
> > > > --- a/fsck/xfs_fsck.sh
> > > > +++ b/fsck/xfs_fsck.sh
> > > > @@ -63,8 +63,24 @@ if [ -n "$PS1" -o -t 0 ]; then
> > > >  fi
> > > >
> > > >  if $FORCE; then
> > > > -        xfs_repair -e $DEV
> > > > -        repair2fsck_code $?
> > > > +        if $AUTO; then
> > > > +                xfs_repair -e $DEV
> > > > +                error=$?
> > > > +                if [ $error -eq 2 ]; then

To reiterate something earlier in the thread: I would only do the
mount/umount if the caller passes -fy. I would drop to the emergency
shell for -fa or -f.

(Also: whitespace damage)

> > > > +                        echo "Replaying log for $DEV"
> > > > +                        mkdir -p /tmp/tmp_mnt
> > > > +                        mount $DEV /tmp/tmp_mnt
>
> Need to handle mount failure here - this will need to drop to
> a rescue shell for manual recovery, I think.

Yep.

> Also, what happens if there are mount options set like quota?
> Doesn't this make more work for the recovery process if we elide
> things like that here?

Agreed, I think we need to extract rootflags= from /proc/cmdline.

> > > > +                        umount /tmp/tmp_mnt
> > > > +                        xfs_repair -e $DEV
> > > > +                        error=$?
> > > > +                        rmdir /tmp/tmp_mnt
> > > > +                fi
> > > > +        else
> > > > +                # fsck.mode=force is set but fsck.repair=no
> > > > +                xfs_repair -n $DEV
> > > > +                error=$?
> > > > +        fi
>
> I think that adding a "check only" mode should be a separate patch.
> We have to consider different things here, such as the fact that it
> will run on an unrecovered filesystem and potentially report it
> damaged when, in fact, all that is needed is for log recovery to
> run....

<nod> I'm not sure what we should return in that case. Maybe 8 for
operational error?

--D

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
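[Editor's sketch] For the rootflags= point agreed on above, one way the
extraction could look in POSIX sh. The helper name `get_rootflags` and
its usage are hypothetical, not from the patch under review:

```shell
# Hypothetical helper (not part of the posted patch): pull the value of
# rootflags= out of a kernel command line so the log-replay mount can
# reuse the root fs mount options (e.g. quota) instead of eliding them.
get_rootflags() {
	# $1: contents of /proc/cmdline; prints the rootflags value if set,
	# returns nonzero if no rootflags= option is present.
	for opt in $1; do
		case "$opt" in
		rootflags=*)
			echo "${opt#rootflags=}"
			return 0
			;;
		esac
	done
	return 1
}
```

In fsck.xfs this might be called as
`flags=$(get_rootflags "$(cat /proc/cmdline)")` before the replay mount,
falling back to a plain `mount $DEV /tmp/tmp_mnt` when it returns
nonzero; mount failure itself would still need the rescue-shell path
Dave asked for.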
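[Editor's sketch] On the open question of what the check-only branch
should return: one plausible mapping to fsck(8)-style exit codes,
assuming xfs_repair -n's documented statuses (0 clean, 1 corruption
found). The function name `check_to_fsck_code` is made up, and treating
a dirty log as code 8 is exactly the unsettled suggestion from the
thread, not a decided behavior:

```shell
# Assumed mapping, not a settled conclusion from this thread: translate
# the exit status of a check-only `xfs_repair -n` run into fsck(8)
# codes. With -n nothing is ever repaired, so detected corruption maps
# to "errors left uncorrected" (4); an unreplayed log makes the check
# itself unreliable, so report an operational error (8) rather than
# claiming the filesystem is damaged.
check_to_fsck_code() {
	case "$1" in
	0) echo 0 ;;	# no errors found
	1) echo 4 ;;	# corruption detected, left uncorrected
	*) echo 8 ;;	# operational error, e.g. dirty log
	esac
}
```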