On Fri, Mar 09, 2018 at 01:40:01PM -0700, Andreas Dilger wrote: > On Mar 9, 2018, at 1:18 PM, Lukas Czerner <lczerner@xxxxxxxxxx> wrote: > > > > On Fri, Mar 09, 2018 at 12:40:17PM -0700, Andreas Dilger wrote: > >> On Mar 9, 2018, at 6:31 AM, Lukas Czerner <lczerner@xxxxxxxxxx> wrote: > >>> > >>> On Thu, Mar 08, 2018 at 12:13:27PM -0800, Darrick J. Wong wrote: > >>>> From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > >>>> > >>>> Implement online fsck for ext* filesystems which live on LVM-managed > >>>> logical volumes. The basic strategy mirrors that of e2croncheck -- > >>>> create a snapshot, fsck the snapshot, report whatever errors appear, > >>>> remove snapshot. Unlike e2croncheck, this utility accepts any LVM > >>>> device path, knows about snapshots running out of space, and can call > >>>> fstrim having validated that the fs metadata is ok. > >>> > >>> Thanks Darrick this looks good, I have couple of comments below. Also > >>> it's not a big deal, but I have a slight issue with the name e2scrub. > >>> What we will call it if/when we have a real in-kernel fs scrub ? > >> > >> I'd think in that case the e2scrub script will call the in-kernel version > >> of the scrub? That might even be "e2fsck" just calling an ioctl, like > >> resize2fs decides whether to do an online or offline resize depending if > >> the filesystem is mounted or not. > > > > Yeah, I suppose. > > > >> > >>>> diff --git a/scrub/e2scrub.8.in b/scrub/e2scrub.8.in > >>>> new file mode 100644 > >>>> index 0000000..7cb6dfb > >>>> --- /dev/null > >>>> +++ b/scrub/e2scrub.8.in > >>>> @@ -0,0 +1,34 @@ > >>>> +.TH E2SCRUB 8 "@E2FSPROGS_MONTH@ @E2FSPROGS_YEAR@" "E2fsprogs version @E2FSPROGS_VERSION@" > >>>> +.SH NAME > >>>> +e2scrub - check a mounted ext2/ext3/ext4 file system on an LVM volume for errors. > >>>> +.SH SYNOPSYS > >>>> +.B > >>>> +e2scrub [OPTION] [LVM DEVICE PATH] > >>>> +.SH DESCRIPTION > >>>> +Given a live file system on a LVM volume, this program snapshots the > >>>> +logical volume and runs a file system check to look for serious errors. > >>>> +If no errors are found, fstrim can be called on the mounted file system. > >>>> +However, if errors are found, the file system should be unmounted and > >>>> +fixed. > >>> > >>> e2scrub will _not_ fix you file system, I think we should make this part > >>> really clear. > >>> > >>> I am missing some more information on what are the requirements for this to > >>> work - like free space in the vg. > >>> > >>> How about few words about disaster recovery, I know you're trying to > >>> always remove the snapshot, but what if it fails for whatever reason ? I > >>> think there are things users should know about so that they can help > >>> themselves. > >> > >> Later on in the patch series there is a command to reap stale snapshots > >> via cron or systemd once a day. > > > > True, but it might not be in place and regardless of how it's set up it > > usefull to document what do to if anything goes wrong. As it is now the > > documentation is really light on details of what is it actually doing. > > Yeah, the lvscan script that I had would delete the old snapshot if it > found one, after checking that there was not an older scan still running > on the device. > > One option would be to consolidate the "e2scrub_reap" functionality into > e2scrub itself, and then allow "e2scrub -n" or "e2scrub --reap" to be > used to only do the cleanup? That way, at least the next e2scrub run > will clean up the old snapshot rather than generating a useless error, > or running a check on an old (stale) copy of the snapshot. Yes, I would like the option to run e2scrub to do the cleanup. -Lukas > > That would also mean one less script to be installed and documented. > > Cheers, Andreas > > >>> Also in case this is run on dm-thin lv it's worth mentioning that lvm > >>> will use the "old" snaphost instead if the thin snapshot. However this > >>> is something we can recognize and use the right command. > >>> > >>>> +.SH OPTIONS > >>>> +.TP > >>>> +\fB-t\fR > >>>> +Run > >>>> +.B > >>>> +fstrim(1) > >>>> +on the mounted filesystem if no errors are found. > >>>> +.TP > >>>> +\fB-V\fR > >>>> +Print version information and exit. > >>>> +.SH EXIT CODE > >>>> +The exit codes are the same as in > >>>> +.BR e2fsck (8) > >>>> +.SH BUGS > >>>> +This utility is capable of checking any ext* filesystem on an LVM volume, > >>>> +regardless of whether it is mounted. > >>> > >>> If it's not mounted it'll still use lvm snapshots right ? It's not > >>> necessarily clear from the documentation. > >> > >> I think yes, which is safest in case the filesystem suddenly is mounted > >> in the middle of the scrub. > > > > Definitelly, but that needs to be documented. > > > >> > >>>> +.SH SEE ALSO > >>>> +.BR e2fsck (8) > >>>> +.SH AUTHOR > >>>> +Darrick J. Wong <darrick.wong@xxxxxxxxxx> > >>>> +.SH COPYRIGHT > >>>> +Copyright ©2018 Darrick J. Wong. License is GPLv2+. <http://www.gnu.org/licenses/gpl-2.0.html> > >>>> diff --git a/scrub/e2scrub.conf.in b/scrub/e2scrub.conf.in > >>>> new file mode 100644 > >>>> index 0000000..fec828a > >>>> --- /dev/null > >>>> +++ b/scrub/e2scrub.conf.in > >>>> @@ -0,0 +1,10 @@ > >>>> +# e2scrub configuration file > >>>> + > >>>> +# Snapshots will be created to run fsck; the snapshot will be of this size. > >>>> +# snap_size_mb=256 > >>> > >>> Do you have any idea what it takes to fill this up ? What kind of > >>> workload for how long ? Why do you think it's a sane default ? > >>> > >>> I am well aware that there is no right answer I am just curious. > >> > >> I've used 256MB in my lvcheck scripts for years without issues. It > >> really depends on how long the e2fsck takes and how much churn there > >> is in the filesystem during that time. Reserving a bit too much is not > >> really harmful, since the space is only used temporarily. > > > > Good to know. > > > >> > >> What might be difficult is that users will typically consume all of > >> the space in the VG from the beginning. It would be great if distros > >> reserved a few hundred MB in the VG at installation time for this. > > > > That's very much true, I think lvm people can tell a lot of stories about > > this one :) > > > >> > >>>> diff --git a/scrub/e2scrub.in b/scrub/e2scrub.in > >>>> new file mode 100644 > >>>> index 0000000..e26b449 > >>>> --- /dev/null > >>>> +++ b/scrub/e2scrub.in > >>>> @@ -0,0 +1,162 @@ > >>>> +#!/bin/bash > >>>> + > >>>> +# Copyright (C) 2018 Oracle. All Rights Reserved. > >>>> +# > >>>> +# Author: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > >>>> +# > >>>> +# This program is free software; you can redistribute it and/or > >>>> +# modify it under the terms of the GNU General Public License > >>>> +# as published by the Free Software Foundation; either version 2 > >>>> +# of the License, or (at your option) any later version. > >>>> +# > >>>> +# This program is distributed in the hope that it would be useful, > >>>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of > >>>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > >>>> +# GNU General Public License for more details. > >>>> +# > >>>> +# You should have received a copy of the GNU General Public License > >>>> +# along with this program; if not, write the Free Software Foundation, > >>>> +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. > >>>> + > >>>> +# Automatically check a LVM-managed filesystem online. > >>>> +# We use lvm snapshots to do this, which means that we can only > >>>> +# check filesystems in VGs that have at least 256mb (or so) of > >>>> +# free space. > >>>> + > >>>> +snap_size_mb=256 > >>>> +fstrim=0 > >>>> +e2fsck_opts="" > >>>> +conffile="@root_sysconfdir@/e2scrub.conf" > >>>> + > >>>> +test -f "${conffile}" && . "${conffile}" > >>>> + > >>>> +print_help() { > >>>> + echo "Usage: $0 [-t] device" > >>>> + echo > >>>> + echo "device must be a LVM-managed block device" > >>>> + echo "-t: Run fstrim if successful." > >>>> +} > >>>> + > >>>> +print_version() { > >>>> + echo "e2scrub @E2FSPROGS_VERSION@ (@E2FSPROGS_DATE@)" > >>>> +} > >>>> + > >>>> +while getopts "tV" opt; do > >>>> + case "${opt}" in > >>>> + "t") fstrim=1;; > >>>> + "V") print_version; exit 0;; > >>>> + *) print_help; exit 2;; > >>>> + esac > >>>> +done > >>>> +shift "$((OPTIND - 1))" > >>>> + > >>>> +dev="$1" > >>>> +if [ -z "${dev}" ]; then > >>>> + print_help > >>>> + exit 1 > >>>> +elif [ ! -b "${dev}" ]; then > >>>> + echo "${dev}: Not a block device?" > >>>> + print_help > >>>> + exit 16 > >>>> +fi > >>>> + > >>>> +# Make sure this is an LVM device we can snapshot > >>>> +vg="$(lvs --noheadings -o vg_name "${dev}" 2> /dev/null | sed -e 's/^ //g')" > >>>> +lv="$(lvs --noheadings -o lv_name "${dev}" 2> /dev/null | sed -e 's/^ //g')" > >>> > >>> You could use > >>> > >>> eval $(lvs --nameprefixes -o name,vgname --noheadings "${dev}") > >>> > >>> and have lvm set up $LVM2_VG_NAME and $LVM2_LV_NAME for you > >>> > >>>> +if [ -z "${vg}" ] || [ -z "${lv}" ]; then > >>>> + echo "${dev}: Not a LVM device." > >>>> + exit 16 > >>>> +fi > >>>> +start_time="$(date +'%Y%m%d%H%M%S')" > >>>> +snap="${lv}.e2scrub" > >>> > >>> Let's hope people are not creating these intentionally. Maybe using a > >>> timestamp might be worthwhile ? > >>> > >>>> +snap_dev="/dev/${vg}/${snap}" > >>>> +fstype="$(blkid -p -s TYPE "${dev}" | sed -e 's/^.*TYPE="\(.*\)".*$/\1/g')" > >>> > >>> blkid -o value -p -s TYPE "${dev}" > >>> > >>>> + > >>>> +case "${fstype}" in > >>>> +"ext2"|"ext3"|"ext4") > >>> > >>> ext[234] > >>> > >>>> + ;; > >>>> +*) > >>>> + echo "${dev}: Filesystem of type ${fstype} not supported." > >>>> + exit 16 > >>>> + ;; > >>>> +esac > >>>> + > >>>> +teardown() { > >>>> + # Remove and wait for removal to succeed. > >>>> + lvremove -f "${vg}/${snap}" 3>&- > >>>> + while [ -b "${snap_dev}" ] && [ "$?" -eq "5" ]; do > >>>> + sleep 0.5 > >>>> + lvremove -f "${vg}/${snap}" 3>&- > >>>> + done > >>>> +} > >>>> + > >>>> +check() { > >>>> + # First we recover the journal, then we see if e2fsck tries any > >>>> + # non-optimization repairs. If either of these two returns a > >>>> + # non-zero status (errors fixed or remaining) then this fs is bad. > >>>> + E2FSCK_FIXES_ONLY=1 > >>>> + export E2FSCK_FIXES_ONLY > >>>> + ${DBG} "@root_sbindir@/e2fsck" -E journal_only -p ${e2fsck_opts} "${snap_dev}" || return 1 > >>>> + ${DBG} "@root_sbindir@/e2fsck" -fy ${e2fsck_opts} "${snap_dev}" || return 1 > >>> > >>> Why to use -y ? Wouldn't it be better to just run with -p ? > >>> > >>>> + return 0 > >>>> +} > >>>> + > >>>> +mark_clean() { > >>>> + ${DBG} "@root_sbindir@/tune2fs" -C 0 -T "${start_time}" "${dev}" > >>> > >>> I am not sure about -C 0. I understand that it's there to elliminate the > >>> mount-count based fsck, but it also means the mount-count is not going > >>> to be reliable, not sure if that matters much. > >> > >> IMHO, with the e2scrub functionality in place it would make sense to > >> reinstate the time- and/or mount-based e2fsck. That would catch the case > >> if e2scrub or the cron/systemd jobs that run it were broken for some > >> reason and hadn't checked the filesystem in months. > > > > If the cron/systemd jobs are set up then it might make sense maybe. But > > again after fs unmount the mount-count might be 0, not sure if that > > matters to people ar all. > > > >> > >>>> +} > >>>> + > >>>> +mark_corrupt() { > >>>> + ${DBG} "@root_sbindir@/tune2fs" -E force_fsck "${dev}" > >>>> +} > >>>> + > >>>> +setup() { > >>>> + # Create the snapshot, wait for device to appear > >>>> + teardown > /dev/null 2> /dev/null > >>>> + lvcreate -s -L "${snap_size_mb}m" -n "${snap}" "${vg}/${lv}" 3>&- > >>>> + test $? -ne 0 && return 1 > >>>> + udevadm settle 2> /dev/null > >>>> + return 0 > >>>> +} > >>>> + > >>>> +trap "teardown; exit 1" EXIT INT QUIT TERM > >>>> +if ! setup; then > >>>> + echo "Snapshot of ${dev} FAILED, will not check!" > >>>> + exit 1 > >>>> +fi > >>>> + > >>>> +# Check and react > >>>> +if check; then > >>>> + echo "Scrub of ${dev} succeeded." > >>>> + mark_clean > >>>> + > >>>> + if [ "${fstrim}" -eq 1 ] && type fstrim > /dev/null 2>&1; then > >>>> + dir="$(lsblk -o MOUNTPOINT -n "${dev}")" > >>>> + if [ -d "${dir}" ]; then > >>>> + # NB: fstrim fails with snapshot present > >>>> + trap '' EXIT > >>>> + teardown > >>>> + fstrim -v "${dir}" > >>> > >>> This can take a long time, how about some information for the user what > >>> we're doing ? > >> > >> This will typically be run via cron/systemd, so you don't want it to be > >> too verbose. > > > > But it does not necessarily have to be run from cron. Then it'll say > > > > Scrub of ${dev} succeeded. > > > > ... then hangs for minutes or potentially even longer, that's not good. > > > > > > Remember, if we get it into e2fsprogs it'll get to distributions and > > people will start using and experimenting with it. People that not > > necessarily know much about lvm or fstrim. It needs to be well documented > > and verbose enough for people to at least have some idea of what's going on. > > > > Especially since I think it might attract some attention, if nothing > > else then just because btrfs has scrub ;) > > > > Regards, > > -Lukas > > > >> > >> Cheers, Andreas > > > Cheers, Andreas > > > > >