On Fri, Jun 06, 2014 at 09:14:43AM -0400, Brian Foster wrote: > Changes in the XFS logging code have lead to small leaks in the log > grant heads that consume log space slowly over time. Such problems have > gone undetected for an unnecessarily long time due to code complexity > and potential for very subtle problems. Losing only a few bytes per > logged item on a reasonably large enough fs (10s of GB) means only the > most continuously stressful workloads will cause a severe enough failure > (deadlock due to log reservation exhaustion) quickly enough to indicate > something is seriously wrong. > > Recent changes in XFS export the state of the various log heads through > sysfs to aid in userspace/runtime analysis of the log. This test runs a > workload against an XFS filesystem, quiesces the fs and verifies that > the log reserve and write grant heads have not leaked any space with > respect to the current head of the physical log. > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > --- .... > > +# Determine the system device name for a particular block device. The device > +# name is how the block dev is referenced under sysfs. > +_get_device_name() > +{ > + devpath=$1 > + > + # check for a symlink (i.e., device mapper) > + if [ -L $devpath ] > + then > + devpath=`readlink -f $devpath` > + fi > + > + # grab the major minor and convert from hex to decimal > + major=$((0x`stat -c %t $devpath`)) > + minor=$((0x`stat -c %T $devpath`)) > + > + # refer to sysfs by major minor > + basename `readlink /sys/dev/block/$major:$minor` > +} $ basename `readlink -f /dev/mapper/vg0-home` dm-2 $ basename `readlink /sys/dev/block/253:2` dm-2 Why is _short_dev() not sufficient? > +# Use the information exported by XFS to sysfs to determine whether the log has > +# active reservations after a filesystem freeze. > +_check_scratch_log_state() > +{ > + devname=`_get_device_name $SCRATCH_DEV` > + attrpath="/sys/fs/xfs/$devname/log" > + > + # freeze the fs to ensure data is synced and the log is flushed. this > + # means no outstanding transactions, and thus no outstanding log > + # reservations, should exist > + xfs_freeze -f $SCRATCH_MNT > + > + # the log head is exported in basic blocks and the log grant heads in > + # bytes. convert the log head to bytes for precise comparison > + log_head_cycle=`cat $attrpath/log_head_lsn | awk -F : '{ print $1 }'` > + log_head_bytes=`cat $attrpath/log_head_lsn | awk -F : '{ print $2 }'` awk can read files directly: log_head_cycle=`awk -F : '{ print $1 }' $attrpath/log_head_lsn` > + log_head_bytes=$((log_head_bytes * 512)) > + > + for attr in "reserve_grant_head" "write_grant_head" > + do > + cycle=`cat $attrpath/$attr | awk -F : '{ print $1 }'` > + bytes=`cat $attrpath/$attr | awk -F : '{ print $2 }'` > + > + if [ $cycle != $log_head_cycle ] || > + [ $bytes != $log_head_bytes ] > + then > + echo "$attr ($cycle:$bytes) does not match" \ > + "log_head_lsn ($log_head_cycle:$log_head_bytes)," \ > + "possible leak detected." > + else > + echo "$attr matches log_head_lsn" > + fi > + done > + > + xfs_freeze -u $SCRATCH_MNT > +} > + > +# real QA test starts here > +_supported_fs xfs > +_supported_os Linux > + > +_require_scratch > +_require_freeze > + > +if [ ! -e /sys/fs/xfs ] > +then > + _notrun "no kernel support for XFS sysfs attributes" > +fi _requires_xfs_sysfs > + > +rm -f $seqres.full > + > +_scratch_mkfs_xfs | _filter_mkfs 2>> $seqres.full > +_scratch_mount > + > +_check_scratch_log_state > + > +$FSSTRESS_PROG -d $SCRATCH_MNT/fsstress -n 1000 -p 2 -S t \ > + >> $seqres.full 2>&1 > + > +_check_scratch_log_state wouldn't it be better to run fsstress as a background process and do several freeze/check/thaw cycles on a running workload? > + > +umount $SCRATCH_MNT > +_check_scratch_fs > + > +status=0 > +exit > diff --git a/tests/xfs/011.out b/tests/xfs/011.out > new file mode 100644 > index 0000000..a3f3805 > --- /dev/null > +++ b/tests/xfs/011.out > @@ -0,0 +1,11 @@ > +QA output created by 011 > +meta-data=DDEV isize=XXX agcount=N, agsize=XXX blks > +data = bsize=XXX blocks=XXX, imaxpct=PCT > + = sunit=XXX swidth=XXX, unwritten=X > +naming =VERN bsize=XXX > +log =LDEV bsize=XXX blocks=XXX > +realtime =RDEV extsz=XXX blocks=XXX, rtextents=XXX Any particular reason for dumping the filtered mkfs information here? It won't ever cause a test failure unless we break _filter_mkfs... > +reserve_grant_head matches log_head_lsn > +write_grant_head matches log_head_lsn > +reserve_grant_head matches log_head_lsn > +write_grant_head matches log_head_lsn > diff --git a/tests/xfs/group b/tests/xfs/group > index 19fd968..99bf0e1 100644 > --- a/tests/xfs/group > +++ b/tests/xfs/group > @@ -8,6 +8,7 @@ > 008 rw ioctl auto quick > 009 rw ioctl auto prealloc quick > 010 auto quick repair > +011 auto quick freeze log and metadata, too. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs