Re: Automated fsck on boot

Max Vozeler <max@xxxxxxxxxxxx> · Sun, 9 Apr 2006 15:01:54 +0200

Hi Jari,

On Sun, Apr 09, 2006 at 11:14:55AM +0300, Jari Ruusu wrote:
> Your version compared device (or file) names. Comparing names is
> problematic because they may be truncated or they may use relative
> paths (not begin with slash). I changed it to compare device number
> and inode number of the block special file (or normal file) to those
> recorded in loop device.
>
> # losetup /dev/loop6
> /dev/loop6: [0902]:213045 (/dev/md5) offset=4096 encryption=AES128 multi-key-v3
>              ^^^^  ^^^^^^
>               |      |
>               |    Inode number inside my root file system where static
>               |    block special node /dev/md5 happens to reside.
>               |
>              Device number of my root file system where static
>              block special node /dev/md5 happens to reside.

Ah, I see. That makes a lot of sense - I suppose someone could also
have replaced a loop file with a different file of the same name, or
udev could have renamed a device node. In both cases comparing
only filenames would give a wrong result.
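
As a rough illustration of why names are a weak identity, here is a
small shell sketch that compares two paths by device and inode number
instead. The helper name same_file is made up for this example, and
stat -c assumes GNU coreutils; it is not how losetup itself does it.

```shell
#!/bin/sh
# Sketch only: identify a file by (device, inode) rather than by name.
# same_file is a hypothetical helper; "stat -c" assumes GNU coreutils.
same_file () {
	# -L follows symlinks, like stat(2) does in the C code quoted above.
	a=$(stat -L -c '%d:%i' "$1" 2>/dev/null) || return 1
	b=$(stat -L -c '%d:%i' "$2" 2>/dev/null) || return 1
	[ "$a" = "$b" ]
}

tmp=$(mktemp -d)
echo data > "$tmp/backing"
ln "$tmp/backing" "$tmp/renamed"    # different name, same inode
echo data > "$tmp/impostor"         # same content, different inode

same_file "$tmp/backing" "$tmp/renamed"  && echo "renamed: same file"
same_file "$tmp/backing" "$tmp/impostor" || echo "impostor: different file"
rm -rf "$tmp"
```

A renamed file still matches and a look-alike file does not, which is
the property a name comparison cannot give.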

> I merged your patch, but now the code looks like this:
> 
> int is_loop_active(const char *dev, const char *backdev)
> {
> 	int fd;
> 	int ret = 0;
> 	struct stat statbuf;
> 	struct loop_info64 loopinfo;
> 	if (stat (dev, &statbuf) == 0 && S_ISBLK(statbuf.st_mode)) {
> 		fd = open (dev, O_RDONLY);
> 		if (fd < 0)
> 			return 0;
> 		if ((loop_get_status64_ioctl(fd, &loopinfo) == 0)
> 		    && (stat (backdev, &statbuf) == 0)
> 		    && (statbuf.st_dev == loopinfo.lo_device)
> 		    && (statbuf.st_ino == loopinfo.lo_inode))
> 			ret = 1; /* backing device matches */
> 		memset(&loopinfo, 0, sizeof(loopinfo));
> 		close(fd);
> 	}
> 	return ret;
> }

Works perfectly for me. Thanks a lot!

I'm attaching a preliminary version of the rcS.d init script I'm
planning to ship in the Debian package loop-aes-utils.

I'm undecided about whether to enable the init script for existing
setups. It is safer for every filesystem to be fscked regularly,
but someone might have a loop device with a file system on it
that shows uncorrectable errors - this would cause the boot to stop
and drop to sulogin for manual correction.  Or there might be an
existing script that does losetup -F and fsck, which could break if
this script runs first and leaves the loop device allocated.
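
To make the "uncorrectable errors" case concrete: fsck reports its
result as a bit mask in the exit status, and the attached script treats
anything above 1 as a failure. A minimal sketch of that interpretation
follows; the two helper names are hypothetical, and the bit meanings
are the ones documented in fsck(8).

```shell
#!/bin/sh
# Sketch only: interpreting the fsck exit status bit mask.
# Both helper names are made up for this example.

# Same threshold as the attached script: 0 (clean) and 1 (errors
# corrected) let the boot continue, anything else needs attention.
fsck_needs_attention () {
	[ "$1" -gt 1 ]
}

# Print a short description of the low four bits of the status.
describe_fsck_status () {
	code=$1
	if [ "$code" -eq 0 ]
	then
		echo "clean"
		return 0
	fi
	out=
	[ $((code & 1)) -ne 0 ] && out="$out errors-corrected"
	[ $((code & 2)) -ne 0 ] && out="$out reboot-needed"
	[ $((code & 4)) -ne 0 ] && out="$out errors-uncorrected"
	[ $((code & 8)) -ne 0 ] && out="$out operational-error"
	echo "${out# }"
}

for status in 0 1 4
do
	if fsck_needs_attention "$status"
	then
		echo "$status: $(describe_fsck_status "$status") -> maintenance shell"
	else
		echo "$status: $(describe_fsck_status "$status") -> continue boot"
	fi
done
```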

What do Debian users here think? Any other thoughts or experiences?
I'm currently inclined to leave it disabled by default and document in
README.Debian that one needs to enable it manually by editing a
flag in /etc/default/checkfs-loop (or so).
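
One possible shape for such an opt-in flag, as a sketch: the variable
name ENABLE_CHECKFS_LOOP and the helper checkfs_loop_enabled are
assumptions made for this example, and the attached script does not
implement this yet.

```shell
#!/bin/sh
# Sketch only: opt-in switch read from a defaults file.
# ENABLE_CHECKFS_LOOP and checkfs_loop_enabled are hypothetical names.
checkfs_loop_enabled () {
	defaults=${1:-/etc/default/checkfs-loop}
	ENABLE_CHECKFS_LOOP=no          # disabled unless the admin opts in
	if [ -r "$defaults" ]
	then
		. "$defaults"
	fi
	[ "$ENABLE_CHECKFS_LOOP" = yes ]
}
```

The init script could then start with "checkfs_loop_enabled || exit 0",
so that existing setups see no change in behaviour until the flag is
set to yes.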

cheers,
Max
#!/bin/sh
### BEGIN INIT INFO
# Provides:          checkfs-loop
# Required-Start:    checkroot
# Required-Stop:
# Should-Start:      udev devfsd raid2 mdadm lvm
# Should-Stop:
# Default-Start:     S
# Default-Stop:
# Short-Description: Check loop-encrypted filesystems.
### END INIT INFO
#
# NOTE: This script duplicates much of checkfs.sh as we need to
# work on the decrypted loop devices, which fsck -A doesn't know
# about. The maintainer of this script should track changes in
# checkfs.sh and make sure they are applied.

PATH=/sbin:/bin
FSCK_LOGFILE=/var/log/fsck/loop
[ "$FSCKFIX" ] || FSCKFIX=no
. /lib/init/vars.sh
. /lib/lsb/init-functions

list_fsck_loops () {
	grep -v '^#' /etc/fstab |
	while read dev mnt fstype opts freq passno; do
		fsck=yes
		loopdev=

		for opt in $(IFS=, && echo $opts)
		do
			case $opt in
			noauto|sw)
				fsck=no
				;;
			loop=/dev/loop*)
				loopdev=${opt#loop=}
				;;
			esac
		done

		if [ -z "$loopdev" ] || [ "$fsck" = no ]
		then
			continue
		fi

		echo $loopdev:$mnt
	done
}

retries=3

do_losetup () {
	loop=${1%:*}
	mnt=${1#*:}
	try=0
	log_action_msg "Setting up $loop ($mnt)"
	while [ $try -lt $retries ]
	do
		if losetup -F $loop
		then
			return 0
		fi
		try=$((try+1))
	done
	return 1
}

# TODO We can't do anything about FSCKTYPES settings other than "none"

do_fsck () {
	loopdevs="$*"
	BAT=	# start empty so a stray BAT from the environment cannot skip the check

	# See if we're on AC Power
	# If not, we're not gonna run our check
	if which on_ac_power >/dev/null 2>&1
	then
		on_ac_power >/dev/null 2>&1
		if [ $? -eq 1 ]
		then
			[ "$VERBOSE" = no ] || log_success_msg "Running on battery power, so skipping loop file system check."
			BAT=yes
		fi
	fi

	#
	# Check loop-encrypted file systems.
	#
	if [ ! -f /fastboot ] && [ ! "$BAT" ] && [ "$FSCKTYPES" != "none" ]
	then
		if [ -f /forcefsck ]
		then
			force="-f"
		else
			force=""
		fi
		if [ "$FSCKFIX" = yes ]
		then
			fix="-y"
		else
			fix="-a"
		fi
		spinner="-C"
		case "$TERM" in
		  dumb|network|unknown|"")
			spinner=""
			;;
		esac
		[ "$(uname -m)" = s390 ] && spinner=""  # This should go away
		handle_failed_fsck() {
			log_failure_msg "File system check failed. 
A log is being saved in ${FSCK_LOGFILE} if that location is writable. 
Please repair the file system manually."
			log_warning_msg "A maintenance shell will now be started. 
CONTROL-D will terminate this shell and resume system boot."
			# Start a single user shell on the console
			if ! sulogin $CONSOLE
			then
				log_failure_msg "Attempt to start maintenance shell failed. 
Continuing with system boot in 5 seconds."
				sleep 5
			fi
		}

		failed=
		for device in $loopdevs
		do
			if [ "$VERBOSE" = no ]
			then
				logsave -s $FSCK_LOGFILE fsck $spinner $fix $force $device
				FSCKCODE=$?
				if [ "$FSCKCODE" -gt 1 ]
				then
					failed=1
				fi
			else
				logsave -s $FSCK_LOGFILE fsck $spinner -V $fix $force $device
				FSCKCODE=$?
				if [ "$FSCKCODE" -gt 1 ]
				then
					failed=1
				fi
			fi
		done

		if [ "$failed" ]
		then
			handle_failed_fsck
		else
			if [ "$VERBOSE" = yes ]
			then
				log_success_msg "Done checking loop-encrypted file systems. 
A log is being saved in ${FSCK_LOGFILE} if that location is writable."
			fi
		fi
	fi
	# Do not delete those, we are running before checkfs and it will
	# still need them
	#rm -f /fastboot /forcefsck
}

do_start () {
	log_action_msg "Checking loop-encrypted file systems"

	check_loops=
	for device in $(list_fsck_loops)
	do
		if do_losetup $device
		then
			check_loops="$check_loops ${device%:*}"
		fi
	done
	do_fsck $check_loops
}

case "$1" in
  start|"")
	do_start
	;;
  restart|reload|force-reload)
	echo "Error: argument '$1' not supported" >&2
	exit 3
	;;
  stop)
	# No-op
	;;
  *)
	echo "Usage: checkfs-loop.sh [start|stop]" >&2
	exit 3
	;;
esac

: