On Wed, Nov 02, 2022 at 07:59:46PM +0530, Srikanth C S wrote:
> After a recent data center crash, we had to recover root filesystems
> on several thousands of VMs via a boot time fsck. Since these
> machines are remotely manageable, support can inject the kernel
> command line with 'fsck.mode=force fsck.repair=yes' to kick off
> xfs_repair if the machine won't come up or if they suspect there
> might be deeper issues with latent errors in the fs metadata, which
> is what they did to try to get everyone running ASAP while
> anticipating any future problems. But, fsck.xfs does not address the
> journal replay in case of a crash.
>
> fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
> possible that when the machine crashes, the fs is in inconsistent
> state with the journal log not yet replayed. This can put the
> machine into rescue shell. To address this problem, mount and
> umount the fs before running xfs_repair.
>
> Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
> Replay the logs only if fsck.mode=force and fsck.repair=yes. For
> other option -fa and -f drop to the resuce shell if repair detects
> any corruptions
>
> Signed-off-by: Srikanth C S <srikanth.c.s@xxxxxxxxxx>
> ---
>  fsck/xfs_fsck.sh | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/fsck/xfs_fsck.sh b/fsck/xfs_fsck.sh
> index 6af0f22..4ef61db 100755
> --- a/fsck/xfs_fsck.sh
> +++ b/fsck/xfs_fsck.sh
> @@ -31,10 +31,12 @@ repair2fsck_code() {
>
>  AUTO=false
>  FORCE=false
> +REPAIR=false
>  while getopts ":aApyf" c
>  do
>         case $c in
> -       a|A|p|y)        AUTO=true;;
> +       a|A|p)          AUTO=true;;
> +       y)              REPAIR=true;;
>         f)              FORCE=true;;
>         esac
>  done
> @@ -64,7 +66,24 @@ fi
>
>  if $FORCE; then
>         xfs_repair -e $DEV
> -       repair2fsck_code $?
> +       error=$?
> +       if [ $error -eq 2 ] && [ -n "$REPAIR" ]; then
> +               echo "Replaying log for $DEV"
> +               mkdir -p /tmp/repair_mnt || exit 1
> +               for x in $(cat /proc/cmdline); do
> +                       case $x in
> +                               rootflags=*)
> +                                       ROOTFLAGS="-o ${x#rootflags=}"
> +                               ;;

What if fsck is being called on all devices (i.e. -a) or something
other than the root device? Don't we have to match the root flags to
the root dev? It's likely that there will be a root=<dev> parameter
on the CLI, so we'd want to grab that and check that it matches $DEV
before using ROOTFLAGS, right?

Otherwise this looks OK.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
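
For illustration only, the root= check suggested above might look
something like the sketch below. It is not part of the posted patch. The
helper name rootflags_for_dev and its two arguments are hypothetical, and
it takes the command line as a string instead of reading /proc/cmdline so
it can be exercised in isolation. It only does a literal string
comparison, so a root= value given as UUID=, LABEL= or PARTUUID= would
first need resolving to a device node (findfs from util-linux is one
option) before the comparison is meaningful.

```shell
# Hypothetical helper, not in the patch: print "-o <flags>" only when the
# root= parameter on the given kernel command line names the device being
# checked. Prints nothing if there is no rootflags= or root= differs.
# Assumption: root= is a plain device path that compares equal to $DEV.
rootflags_for_dev() {
	dev=$1
	cmdline=$2
	root=
	flags=
	# Word-split the command line the same way the patch does.
	for x in $cmdline; do
		case $x in
		root=*)      root=${x#root=} ;;
		rootflags=*) flags=${x#rootflags=} ;;
		esac
	done
	# Only hand back mount options when they belong to this device.
	if [ -n "$flags" ] && [ "$root" = "$dev" ]; then
		echo "-o $flags"
	fi
}
```

In fsck.xfs this could then be used along the lines of
ROOTFLAGS=$(rootflags_for_dev "$DEV" "$(cat /proc/cmdline)"), leaving
ROOTFLAGS empty for any device that is not the root device.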