Re: [PATCH 2/4] e2scrub: create online fsck tool of sorts

On Wed, Mar 14, 2018 at 10:03:15PM -0600, Andreas Dilger wrote:
> On Mar 14, 2018, at 12:17 AM, Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
> > 
> > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > 
> > Implement online fsck for ext* filesystems which live on LVM-managed
> > logical volumes.  The basic strategy mirrors that of e2croncheck --
> > create a snapshot, fsck the snapshot, report whatever errors appear,
> > remove snapshot.  Unlike e2croncheck, this utility accepts any LVM
> > device path, knows about snapshots running out of space, and can call
> > fstrim having validated that the fs metadata is ok.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > 
> > diff --git a/scrub/e2scrub.in b/scrub/e2scrub.in
> > new file mode 100644
> > index 0000000..647f0e6
> > --- /dev/null
> > +++ b/scrub/e2scrub.in
> > @@ -0,0 +1,207 @@
> > +#!/bin/bash
> > +
> > +#  Copyright (C) 2018 Oracle.  All Rights Reserved.
> > +#
> > +#  Author: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > +#
> > +#  This program is free software; you can redistribute it and/or
> > +#  modify it under the terms of the GNU General Public License
> > +#  as published by the Free Software Foundation; either version 2
> > +#  of the License, or (at your option) any later version.
> > +#
> > +#  This program is distributed in the hope that it would be useful,
> > +#  but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +#  GNU General Public License for more details.
> > +#
> > +#  You should have received a copy of the GNU General Public License
> > +#  along with this program; if not, write the Free Software Foundation,
> > +#  Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> 
> I think it is preferred to visit http://www.gnu.org/licenses/gpl-2.0.html
> since this snail mail address has changed in the past, and it is unlikely
> that anyone would use it in any case.
> 
> > +# Automatically check a LVM-managed filesystem online.
> > +# We use lvm snapshots to do this, which means that we can only
> > +# check filesystems in VGs that have at least 256mb (or so) of
> 
> s/mb/MB/

Ok.

> > +# Make sure this is an LVM device we can snapshot
> > +lvm_vars="$(lvs --nameprefixes -o name,vgname,lv_role --noheadings "${dev}" 2> /dev/null)"
> > +eval "${lvm_vars}"
> > +if [ -z "${LVM2_VG_NAME}" ] || [ -z "${LVM2_LV_NAME}" ] ||
> > +   echo "${LVM2_LV_ROLE}" | grep -q "snapshot"; then
> > +	echo "${dev}: Not a LVM logical volume."
> > +	print_help
> > +	exit 16
> > +fi
> > +start_time="$(date +'%Y%m%d%H%M%S')"
> > +snap="${LVM2_LV_NAME}.e2scrub"
> > +snap_dev="/dev/${LVM2_VG_NAME}/${snap}"
> > +
> > +teardown() {
> > +	# Remove and wait for removal to succeed.
> > +	${DBG} lvremove -f "${LVM2_VG_NAME}/${snap}" 3>&-
> 
> It isn't clear to me what fd 3 is for in these commands?

For whatever reason, the lvm tools complain about leaked file descriptors
if fd 3 is open when they start, and systemd and cron will sometimes hand
us an open fd 3.  The 3>&- closes it for that one command.
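
As a rough illustration of what 3>&- does (not part of the patch; the
probe command below is made up for the example), this opens fd 3 in the
parent shell and asks a child whether it inherited the descriptor, with
and without the redirection:

```shell
#!/bin/bash
# Illustration only; the probe command is made up for this example.
# Open fd 3 in this shell, standing in for a descriptor inherited
# from systemd or cron.
exec 3< /dev/null

# The child tries to duplicate fd 3, which fails if fd 3 is closed.
probe='{ : >&3; } 2>/dev/null && echo open || echo closed'

# Plain invocation: the child inherits fd 3.
without_close="$(bash -c "${probe}")"
# Same invocation with fd 3 closed for this one command.
with_close="$(bash -c "${probe}" 3>&-)"

echo "child without 3>&-: fd 3 is ${without_close}"
echo "child with 3>&-:    fd 3 is ${with_close}"
```

The redirection only affects the command it is attached to, so the
parent shell keeps its fd 3.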

> > +	while [ -e "${snap_dev}" ] && [ "$?" -eq "5" ]; do
> > +		sleep 0.5
> > +		${DBG} lvremove -f "${LVM2_VG_NAME}/${snap}" 3>&-
> > +	done
> 
> This while loop could be slightly restructured to avoid multiple lvremove
> commands, like:
> 
> teardown() {
> 	# Remove and wait for removal to succeed.
> 	while [ -e "${snap_dev}" ] &&
> 	      [ `${DBG} lvremove -f "${LVM2_VG_NAME}/${snap}" 3>&-` -eq "5" ]; do

But that's not equivalent.  The patch runs lvremove and compares the
return value to 5, whereas this captures the stdout of lvremove and
compares the stdout data to 5.
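
If the goal is a single lvremove call site, something like the sketch
below would keep the comparison on the return value.  This is untested
against real LVM: remove_snapshot is a hypothetical stand-in for the
lvremove line so the loop can be tried anywhere, and it fails with
status 5 twice before succeeding.  The status is saved in a variable
because any later test command overwrites $?.

```shell
#!/bin/bash
# Sketch only: remove_snapshot is a hypothetical stand-in for
#   ${DBG} lvremove -f "${LVM2_VG_NAME}/${snap}" 3>&-
# It fails with status 5 twice, then removes the temporary file that
# plays the role of ${snap_dev}.
snap_dev="$(mktemp)"
attempts=0
remove_snapshot() {
	attempts=$((attempts + 1))
	if [ "${attempts}" -lt 3 ]; then
		echo "  Logical volume in use."
		return 5
	fi
	rm -f "${snap_dev}"
}

teardown() {
	local ret
	# One removal command per iteration; retry only on status 5.
	while [ -e "${snap_dev}" ]; do
		remove_snapshot > /dev/null
		ret=$?
		# Any status other than 5 (success included) ends the loop.
		[ "${ret}" -eq 5 ] || break
		sleep 0.5
	done
}

teardown
echo "attempts=${attempts}"
```

One difference from the patch: this version skips the removal entirely
if the snapshot device node is already gone, whereas the patch always
calls lvremove at least once.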

> 		sleep 0.5
> 	done
> }
> 
> That said, should this fail after some number of retries?  What if there
> is another e2scrub running on this device keeping it busy?  Should that
> be checked separately?

There's a small window in which concurrent e2scrubs can interfere with each
other (one creates the snapshot and goes to e2fsck while the other one
tears it down).  It's a pity there isn't a way to tell lvm to create an
O_TMPFILE-like snapshot, feed it to e2fsck, and have it automatically
disappear when the fd closes.

TBH I was intending this to run as an automatic background systemd
service, which provides the necessary isolation without having to go
figure out pid files for non-systemd systems.  I guess we can talk
tomorrow about this assuming there's an ext4 call...
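
For non-systemd systems, one possible alternative to pid files would be
an flock(1) lock held for the lifetime of the script.  The sketch below
is only an assumption about how that could look and is not in the
patch; the function name, the choice of fd 9, and the lock file
location are all made up:

```shell
#!/bin/bash
# Sketch only, not in the patch.  take_scrub_lock, fd 9 and the lock
# file path are hypothetical choices for illustration.
take_scrub_lock() {
	local lockfile="$1"
	# Keep the lock file open on fd 9 for the rest of the script;
	# the lock goes away when the script exits and fd 9 closes.
	exec 9> "${lockfile}" || return 1
	# -n: fail immediately instead of waiting for the other holder.
	flock -n 9
}

lockfile="$(mktemp)"

if take_scrub_lock "${lockfile}"; then
	first="locked"
else
	first="busy"
fi

# A second attempt through a fresh open of the same file plays the
# role of a concurrent e2scrub on the same device.
if ( exec 8> "${lockfile}" && flock -n 8 ); then
	second="locked"
else
	second="busy"
fi

echo "first=${first} second=${second}"
rm -f "${lockfile}"
```

A real version would need a per-device lock file name and, since lvm
tools complain about leaked descriptors, would probably have to close
the lock fd on each lvm command the same way fd 3 is closed.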

--D

> Cheers, Andreas
> 
> 
> 
> 
> 




