On Jan 28, 2008  19:56 -0500, Bryan Kadzban wrote:
> >> # Assume the script won't run more than one instance at a time?
> >> lvremove -f "${lvtemp##/dev}"
> >
> > Should check the error return and bail out of script if there is an error.
>
> Will that catch the "more than one instance at a time" case (e.g. if
> another script run is still running e2fsck on this snapshot)?  Assuming
> lvremove can fail (and it probably can), it's probably a good idea to
> check it in any case, but if running e2fsck makes lvremove fail (until
> e2fsck finishes), that's a decent way to get rid of the comment too.
>
> Also, I think it'd be better to skip just the current FS, rather than an
> "exit 1" type bail-out, right?

It's a hard call...  In some sense if there is an error we may leave a
string of LVs around that are filling up the VG, but the presence of the
LV (and hopefully being unable to remove it while e2fsck is running) also
serves as a "locking" mechanism in case some e2fsck takes a very long time
to run.

I guess as long as we print something in the syslog, and the LV remains
in place with a suitably clear "this isn't very useful" name, then
eventually the user will notice it and delete it.

> - -----
>
> Create a script to transparently run fsck in the background on any
> active LVM logical volumes, as long as the machine is on AC power, and
> that LV has been last checked more than a configurable number of days
> ago.  Also create an optional configuration file to set various options
> in the script.
>
> Signed-Off-By: Bryan Kadzban <bryan@xxxxxxxxxxxxxxxxxxxxx>

You can add a Signed-Off-By: Andreas Dilger <adilger@xxxxxxx> here, as it
does everything I think is needed at this point...  Probably good to put
a version number in the script, along with your name/email so it is clear
what version a user is running.
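
Going back to the stale-snapshot case discussed above, here is a minimal
sketch of the "log it and skip just this filesystem" approach.  The helper
names, and the REMOVE_CMD and LOG_CMD overrides, are hypothetical and exist
only so the logic can be tried without touching LVM; they are not part of
the patch.  Note that the sketch strips "/dev/" including the trailing
slash, so the remove command is handed "vg/lv" rather than "/vg/lv".  I
have not checked whether lvremove accepts the leading-slash form.

```shell
#!/bin/sh

# Turn /dev/<vg>/<lv> into <vg>/<lv> (hypothetical helper).
lv_name_from_path() {
    echo "${1#/dev/}"
}

# Try to remove one stale snapshot.  Returns 0 on success, 1 on failure.
# REMOVE_CMD and LOG_CMD are hypothetical overrides for dry runs; by
# default the real lvremove and logger commands are used.
remove_stale_snapshot() {
    stale="$1"
    if ! ${REMOVE_CMD:-lvremove -f} "$(lv_name_from_path "$stale")" ; then
        # Leave the snapshot in place: it doubles as a lock while another
        # run is still checking it, and the syslog entry tells the admin.
        ${LOG_CMD:-logger -t lvcheck -p user.err --} \
            "Could not delete stale snapshot $stale; skipping this filesystem"
        return 1
    fi
    return 0
}
```

Inside the per-filesystem function, the caller would do something like
"remove_stale_snapshot "$lvtemp" || return 1", so that a failure only
abandons the current LV and the outer loop moves on to the next one.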
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.7 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFHnnnRS5vET1Wea5wRAw0iAJ9wcLyfBSaH5FSIJNH0YakzDCUvjwCgnJEH
> lPScP39vBYIIjOQPiftgDs8=
> =XjFF
> -----END PGP SIGNATURE-----
> #!/bin/sh
> #
> # lvcheck
>
> # Released under the GNU General Public License, either version 2 or
> # (at your option) any later version.
>
> # Overview:
> #
> # Run this from cron periodically (e.g. once per week).  If the
> # machine is on AC power, it will run the checks; otherwise they will
> # all be skipped.  (If the script can't tell whether the machine is
> # on AC power, it will use a setting in the configuration file
> # (/etc/lvcheck.conf) to decide whether to continue with the checks,
> # or abort.)
> #
> # The script will then decide which logical volumes are active, and
> # can therefore be checked via an LVM snapshot.  Each of these LVs
> # will be queried to find its last-check day, and if that was more
> # than $INTERVAL days ago (where INTERVAL is set in the configuration
> # file as well), or if the last-check day can't be determined, then
> # the script will take an LVM snapshot of that LV and run fsck on the
> # snapshot.  The snapshot will be set to use 1/500 the space of the
> # source LV.  After fsck finishes, the snapshot is destroyed.
> # (Snapshots are checked serially.)
> #
> # Any LV that passes fsck should have its last-check time updated (in
> # the real superblock, not the snapshot's superblock); any LV whose
> # fsck fails will send an email notification to a configurable user
> # ($EMAIL).  This $EMAIL setting is optional, but its use is highly
> # recommended, since if any LV fails, it will need to be checked
> # manually, offline.  Relevant messages are also sent to syslog.
>
> # Set default values for configuration params.  Changes to these values
> # will be overwritten on an upgrade!  To change these values, use
> # /etc/lvcheck.conf.
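
The 1/500 sizing rule in the overview interacts with the MINSNAP and
MINFREE settings whose defaults follow just below.  As a rough sketch of
that arithmetic (the function name is hypothetical, and it assumes all
four values have already been reduced to whole megabytes; as far as I know
lvs prints sizes with a fractional part, which shell integer arithmetic
will not accept):

```shell
#!/bin/sh

# snapshot_size SIZE_MB FREE_MB MINSNAP_MB MINFREE_MB
# Prints the snapshot size in MB, or returns 1 if the volume group
# cannot spare even MINSNAP once MINFREE is held back.
# (Hypothetical helper; plain POSIX sh, so the variables are global.)
snapshot_size() {
    size=$1 free=$2 minsnap=$3 minfree=$4

    # space we are allowed to use in the VG
    avail=$((free - minfree))
    if [ "$avail" -le 0 ] || [ "$minsnap" -gt "$avail" ] ; then
        return 1
    fi

    # start from 1/500 of the LV, raise to MINSNAP, cap at AVAIL
    snap=$((size / 500))
    [ "$snap" -lt "$minsnap" ] && snap=$minsnap
    [ "$snap" -gt "$avail" ] && snap=$avail

    echo "$snap"
}
```

For example, "snapshot_size 512000 10000 256 0" gives 1024 (plain 1/500),
while a 51200 MB LV with the same settings is raised to the 256 MB floor.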
> EMAIL='root'
> INTERVAL=30
> AC_UNKNOWN="CONTINUE"
> MINSNAP=256
> MINFREE=0
>
> # send $2 to syslog, with severity $1
> # severities are emerg/alert/crit/err/warning/notice/info/debug
> function log() {
>     local sev="$1"
>     local msg="$2"
>     local arg=
>
>     # log warning-or-higher messages to stderr as well
>     case "$sev" in
>         emerg|alert|crit|err|warning) arg=-s ;;
>     esac
>
>     logger -t lvcheck $arg -p user."$sev" -- "$msg"
> }
>
> # determine whether the machine is on AC power
> function on_ac_power() {
>     local any_known=no
>
>     # try sysfs power class first
>     if [ -d /sys/class/power_supply ] ; then
>         for psu in /sys/class/power_supply/* ; do
>             if [ -r "${psu}/type" ] ; then
>                 type="`cat "${psu}/type"`"
>
>                 # ignore batteries
>                 [ "${type}" = "Battery" ] && continue
>
>                 online="`cat "${psu}/online"`"
>
>                 [ "${online}" = 1 ] && return 0
>                 [ "${online}" = 0 ] && any_known=yes
>             fi
>         done
>
>         [ "${any_known}" = "yes" ] && return 1
>     fi
>
>     # else fall back to AC adapters in /proc
>     if [ -d /proc/acpi/ac_adapter ] ; then
>         for ac in /proc/acpi/ac_adapter/* ; do
>             if [ -r "${ac}/state" ] ; then
>                 grep -q on-line "${ac}/state" && return 0
>                 grep -q off-line "${ac}/state" && any_known=yes
>             elif [ -r "${ac}/status" ] ; then
>                 grep -q on-line "${ac}/status" && return 0
>                 grep -q off-line "${ac}/status" && any_known=yes
>             fi
>         done
>
>         [ "${any_known}" = "yes" ] && return 1
>     fi
>
>     if [ "$AC_UNKNOWN" == "CONTINUE" ] ; then
>         return 0    # assume on AC power
>     elif [ "$AC_UNKNOWN" == "ABORT" ] ; then
>         return 1    # assume on battery
>     else
>         log "err" "Invalid value for AC_UNKNOWN in the config file"
>         exit 1
>     fi
> }
>
> # attempt to force a check of $1 on the next reboot
> function try_force_check() {
>     local dev="$1"
>     local fstype="$2"
>
>     case "$fstype" in
>         ext2|ext3)
>             tune2fs -C 16000 "$dev"
>             ;;
>         *)
>             log "warning" "Don't know how to force a check on $fstype..."
>             ;;
>     esac
> }
>
> # attempt to set the last-check time on $1 to now, and the mount count to 0.
> function try_delay_checks() {
>     local dev="$1"
>     local fstype="$2"
>
>     case "$fstype" in
>         ext2|ext3)
>             tune2fs -C 0 -T now "$dev"
>             ;;
>         *)
>             log "warning" "Don't know how to delay checks on $fstype..."
>             ;;
>     esac
> }
>
> # print the date that $1 was last checked, in a format that date(1) will
> # accept, or "Unknown" if we don't know how to find that date.
> function try_get_check_date() {
>     local dev="$1"
>     local fstype="$2"
>
>     case "$fstype" in
>         ext2|ext3)
>             dumpe2fs -h "$dev" 2>/dev/null | grep 'Last checked:' | \
>                 sed -e 's/Last checked:[[:space:]]*//'
>             ;;
>         *)
>             # TODO: add support for various FSes here
>             echo "Unknown"
>             ;;
>     esac
> }
>
> # check the FS on $1 passively, saving output to $3.
> function perform_check() {
>     local dev="$1"
>     local fstype="$2"
>     local tmpfile="$3"
>
>     case "$fstype" in
>         ext2|ext3)
>             nice logsave -as "${tmpfile}" e2fsck -fn "$dev"
>             return $?
>             ;;
>         reiserfs)
>             echo Yes | nice logsave -as "${tmpfile}" fsck.reiserfs --check "$dev"
>             # apparently can't fail?  let's hope not...
>             return 0
>             ;;
>         xfs)
>             nice logsave -as "${tmpfile}" xfs_check "$dev"
>             return $?
>             ;;
>         jfs)
>             nice logsave -as "${tmpfile}" fsck.jfs -fn "$dev"
>             return $?
>             ;;
>         *)
>             log "warning" "Don't know how to check $fstype filesystems passively: assuming OK."
>             ;;
>     esac
> }
>
> # do everything needed to check and reset dates and counters on /dev/$1/$2.
> function check_fs() {
>     local vg="$1"
>     local lv="$2"
>     local fstype="$3"
>     local snapsize="$4"
>
>     local tmpfile=`mktemp -t lvcheck.log.XXXXXXXXXX`
>     local errlog="/var/log/lvcheck-${vg}@${lv}-`date +'%Y%m%d'`"
>     local snaplvbase="${lv}-lvcheck-temp"
>     local snaplv="${snaplvbase}-`date +'%Y%m%d'`"
>
>     # clean up any left-over snapshot LVs
>     for lvtemp in /dev/${vg}/${snaplvbase}* ; do
>         if [ -e "$lvtemp" ] ; then
>             # Assume the script won't run more than one instance at a time?
>
>             log "warning" "Found stale snapshot $lvtemp: attempting to remove."
>
>             if ! lvremove -f "${lvtemp##/dev}" ; then
>                 log "err" "Could not delete stale snapshot $lvtemp"
>                 return 1
>             fi
>         fi
>     done
>
>     # and create this one
>     lvcreate -s -l "$snapsize" -n "${snaplv}" "${vg}/${lv}"
>
>     if perform_check "/dev/${vg}/${snaplv}" "${fstype}" "${tmpfile}" ; then
>         log "info" "Background scrubbing of /dev/${vg}/${lv} succeeded."
>         try_delay_checks "/dev/${vg}/${lv}" "$fstype"
>     else
>         log "err" "Background scrubbing of /dev/${vg}/${lv} failed: run fsck offline soon!"
>         try_force_check "/dev/${vg}/${lv}" "$fstype"
>
>         if test -n "$EMAIL"; then
>             mail -s "Fsck of /dev/${vg}/${lv} failed!" $EMAIL < $tmpfile
>         fi
>
>         # save the log file in /var/log in case mail is disabled
>         mv "$tmpfile" "$errlog"
>     fi
>
>     rm -f "$tmpfile"
>     lvremove -f "${vg}/${snaplv}"
> }
>
> # pull in configuration -- overwrite the defaults above if the file exists
> [ -r /etc/lvcheck.conf ] && . /etc/lvcheck.conf
>
> # check whether the machine is on AC power: if not, skip fsck
> on_ac_power || exit 0
>
> # parse up lvscan output
> lvscan 2>&1 | grep ACTIVE | awk '{print $2;}' | \
> while read DEV ; do
>     # remove the single quotes around the device name
>     DEV="`echo "$DEV" | tr -d \'`"
>
>     # get the FS type: blkid prints TYPE="blah"
>     eval `blkid -s TYPE "$DEV" | cut -d' ' -f2`
>
>     # get the last-check time
>     check_date=`try_get_check_date "$DEV" "$TYPE"`
>
>     # if the date is unknown, run fsck every time the script runs.  sigh.
>     if [ "$check_date" != "Unknown" ] ; then
>         # add $INTERVAL days, and throw away the time portion
>         check_day=`date --date="$check_date $INTERVAL days" +'%Y%m%d'`
>
>         # get today's date, and skip the check if it's not within the interval
>         today=`date +'%Y%m%d'`
>         [ $check_day -gt $today ] && continue
>     fi
>
>     # get the volume group and logical volume names
>     VG="`lvs --noheadings -o vg_name "$DEV"`"
>     LV="`lvs --noheadings -o lv_name "$DEV"`"
>
>     # get the free space and LV size (in megs), guess at the snapshot
>     # size, and see how much the admin will let us use (keeping MINFREE
>     # available)
>     SPACE="`lvs --noheadings --units M --nosuffix -o vg_free "$DEV"`"
>     SIZE="`lvs --noheadings --units M --nosuffix -o lv_size "$DEV"`"
>     SNAPSIZE="`expr "$SIZE" / 500`"
>     AVAIL="`expr "$SPACE" - "$MINFREE"`"
>
>     # if we don't even have MINSNAP space available, skip the LV
>     if [ "$MINSNAP" -gt "$AVAIL" -o "$AVAIL" -le 0 ] ; then
>         log "warning" "Not enough free space on volume group for ${DEV}; skipping"
>         continue
>     fi
>
>     # make snapshot large enough to handle e.g. journal and other updates
>     [ "$SNAPSIZE" -lt "$MINSNAP" ] && SNAPSIZE="$MINSNAP"
>
>     # limit snapshot to available space (VG space minus min-free)
>     [ "$SNAPSIZE" -gt "$AVAIL" ] && SNAPSIZE="$AVAIL"
>
>     # don't need to check SNAPSIZE again: MINSNAP <= AVAIL, MINSNAP <= SNAPSIZE,
>     # and SNAPSIZE <= AVAIL, combined, means SNAPSIZE must be between MINSNAP
>     # and AVAIL, which is what we need -- assuming AVAIL > 0
>
>     # check it
>     check_fs "$VG" "$LV" "$TYPE" "$SNAPSIZE"
> done
>
> #!/bin/sh
>
> # e2check configuration file

Minor note - "lvscan configuration file".

> # This file follows the pattern of sshd_config: default
> # values are shown here, commented-out.
>
> # EMAIL
> #     Address to send failure notifications to.  If empty,
> #     failure notifications will not be sent.
>
> #EMAIL='root'
>
> # INTERVAL
> #     Days to wait between checks.
> #     All LVs use the same
> #     INTERVAL, but the "days since last check" value can
> #     be different per LV, since that value is stored in
> #     the filesystem superblock.
>
> #INTERVAL=30
>
> # AC_UNKNOWN
> #     Whether to run the e2fsck checks if the script can't
> #     determine whether the machine is on AC power.  Laptop
> #     users will want to set this to ABORT, while server and
> #     desktop users will probably want to set this to
> #     CONTINUE.  Those are the only two valid values.
>
> #AC_UNKNOWN="CONTINUE"
>
> # MINSNAP
> #     Minimum snapshot size to take, in megabytes.  The
> #     default snapshot size is 1/500 the size of the logical
> #     volume, but if that size is less than MINSNAP, the
> #     script will use MINSNAP instead.  This should be large
> #     enough to handle e.g. journal updates, and other disk
> #     changes that require (semi-)constant space.
>
> #MINSNAP=256
>
> # MINFREE
> #     Minimum amount of space (in megabytes) to keep free in
> #     each volume group when creating snapshots.
>
> #MINFREE=0
>

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users