> -----Original Message-----
> From: Carlos Maiolino <cem@xxxxxxxxxx>
> Sent: 23 November 2022 05:53 PM
> To: Srikanth C S <srikanth.c.s@xxxxxxxxxx>
> Cc: linux-xfs@xxxxxxxxxxxxxxx; Darrick Wong <darrick.wong@xxxxxxxxxx>;
> Rajesh Sivaramasubramaniom <rajesh.sivaramasubramaniom@xxxxxxxxxx>;
> Junxiao Bi <junxiao.bi@xxxxxxxxxx>; david@xxxxxxxxxxxxx
> Subject: Re: [External] : Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to
> replay log before running xfs_repair
>
> On Wed, Nov 23, 2022 at 11:40:53AM +0000, Srikanth C S wrote:
> > Hi
> >
> > I resent the same patch as I did not see any review comments.
>
> Unless I'm looking at the wrong patch, there were comments on your previous
> submission:
>
> https://urldefense.com/v3/__https://lore.kernel.org/linux-xfs/Y2ie54fcHDx5bcG4@B-P7TQMD6M-0146.local/T/*t__;Iw!!ACWV5N9M2RV99hQ!J2Z-2NThyyDm__z9ivhioF9QoHsaHh4Tk733jtNbVMPGeA2vbmbw3h4ZGxOywQFv_lA1Zs_jsUgr$
>
> Am I missing something?

All the previous comments addressing this patch were about having journal
replay code in userspace. But Darrick's comments indicate that this
requires making the log endian safe, because of the kernel's inability to
recover a log from a platform with a different endianness. So I am still
wondering how to proceed with this patch. Any comments would be helpful.

> Also, if you are sending the same patch, you can 'flag' it as a resend, so, it's
> easier to identify you are simply resending the same patch. You can do it by
> appending/prepending 'RESEND', to the patch tag:
>
> [RESEND PATCH] <subject>

Thanks for the info. Didn't know this.

>
> Cheers.
>
> >
> > -Srikanth
> >
> > __________________________________________________________________
> > From: Carlos Maiolino <cem@xxxxxxxxxx>
> > Sent: Wednesday, November 23, 2022 2:06 PM
> > To: Srikanth C S <srikanth.c.s@xxxxxxxxxx>
> > Cc: linux-xfs@xxxxxxxxxxxxxxx <linux-xfs@xxxxxxxxxxxxxxx>; Darrick Wong
> > <darrick.wong@xxxxxxxxxx>; Rajesh Sivaramasubramaniom
> > <rajesh.sivaramasubramaniom@xxxxxxxxxx>; Junxiao Bi
> > <junxiao.bi@xxxxxxxxxx>; david@xxxxxxxxxxxxx <david@xxxxxxxxxxxxx>
> > Subject: [External] : Re: [PATCH v3] fsck.xfs: mount/umount xfs fs to
> > replay log before running xfs_repair
> >
> > Hi.
> >
> > Did you plan to resend V3 again, or is this supposed to be V4?
> >
> > On Wed, Nov 23, 2022 at 12:00:50PM +0530, Srikanth C S wrote:
> > > After a recent data center crash, we had to recover root filesystems
> > > on several thousands of VMs via a boot time fsck. Since these
> > > machines are remotely manageable, support can inject the kernel
> > > command line with 'fsck.mode=force fsck.repair=yes' to kick off
> > > xfs_repair if the machine won't come up or if they suspect there
> > > might be deeper issues with latent errors in the fs metadata, which
> > > is what they did to try to get everyone running ASAP while
> > > anticipating any future problems. But, fsck.xfs does not address the
> > > journal replay in case of a crash.
> > >
> > > fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
> > > possible that when the machine crashes, the fs is in inconsistent
> > > state with the journal log not yet replayed. This can drop the machine
> > > into the rescue shell because xfs_fsck.sh does not know how to clean the
> > > log. Since the administrator told us to force repairs, address the
> > > deficiency by cleaning the log and rerunning xfs_repair.
> > >
> > > Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
> > > Replay the logs only if fsck.mode=force and fsck.repair=yes.
> > > For other option -fa and -f drop to the rescue shell if repair detects
> > > any corruptions.
> > >
> > > Signed-off-by: Srikanth C S <srikanth.c.s@xxxxxxxxxx>
> > > ---
> > >  fsck/xfs_fsck.sh | 31 +++++++++++++++++++++++++++++--
> > >  1 file changed, 29 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> > > index 6af0f22..62a1e0b 100755
> > > --- a/fsck/xfs_fsck.sh
> > > +++ b/fsck/xfs_fsck.sh
> > > @@ -31,10 +31,12 @@ repair2fsck_code() {
> > >
> > >  AUTO=false
> > >  FORCE=false
> > > +REPAIR=false
> > >  while getopts ":aApyf" c
> > >  do
> > >  	case $c in
> > > -	a|A|p|y)	AUTO=true;;
> > > +	a|A|p)		AUTO=true;;
> > > +	y)		REPAIR=true;;
> > >  	f)		FORCE=true;;
> > >  	esac
> > >  done
> > > @@ -64,7 +66,32 @@ fi
> > >
> > >  if $FORCE; then
> > >  	xfs_repair -e $DEV
> > > -	repair2fsck_code $?
> > > +	error=$?
> > > +	if [ $error -eq 2 ] && [ $REPAIR = true ]; then
> > > +		echo "Replaying log for $DEV"
> > > +		mkdir -p /tmp/repair_mnt || exit 1
> > > +		for x in $(cat /proc/cmdline); do
> > > +			case $x in
> > > +				root=*)
> > > +					ROOT="${x#root=}"
> > > +				;;
> > > +				rootflags=*)
> > > +					ROOTFLAGS="-o ${x#rootflags=}"
> > > +				;;
> > > +			esac
> > > +		done
> > > +		test -b "$ROOT" || ROOT=$(blkid -t "$ROOT" -o device)
> > > +		if [ $(basename $DEV) = $(basename $ROOT) ]; then
> > > +			mount $DEV /tmp/repair_mnt $ROOTFLAGS || exit 1
> > > +		else
> > > +			mount $DEV /tmp/repair_mnt || exit 1
> > > +		fi
> > > +		umount /tmp/repair_mnt
> > > +		xfs_repair -e $DEV
> > > +		error=$?
> > > +		rm -d /tmp/repair_mnt
> > > +	fi
> > > +	repair2fsck_code $error
> > >  	exit $?
> > >  fi
> > >
> > > --
> > > 1.8.3.1
> >
> > --
> > Carlos Maiolino
>
> --
> Carlos Maiolino

Regards,
Srikanth
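
P.S. For anyone who wants to try the command-line parsing step from the patch in isolation, here is a minimal sketch. It assumes a hypothetical helper, parse_cmdline, which is not part of the patch and which takes the kernel command line as a string argument instead of reading /proc/cmdline, so it can be run without rebooting. The sample command line is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not in the patch): pull root= and rootflags= out of
# a kernel command line passed as one string. The patch itself iterates
# over $(cat /proc/cmdline) with the same case patterns.
parse_cmdline() {
	ROOT=""
	ROOTFLAGS=""
	# Unquoted on purpose: rely on word splitting to walk the options.
	for x in $1; do
		case $x in
		root=*)
			# Strip the leading "root=" prefix.
			ROOT="${x#root=}"
			;;
		rootflags=*)
			# Turn "rootflags=a,b" into a mount option string "-o a,b".
			ROOTFLAGS="-o ${x#rootflags=}"
			;;
		esac
	done
}

# Illustrative command line only.
parse_cmdline "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro rootflags=noatime fsck.mode=force fsck.repair=yes"
echo "root=$ROOT flags=$ROOTFLAGS"
```

In the patch, root= may also be a tag such as UUID=..., which is why the result is passed through blkid when it is not a block device. That step is left out here because it needs a real device.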