On Tue, Jun 10, 2014 at 11:21:49AM +1000, Dave Chinner wrote: > On Fri, Jun 06, 2014 at 09:14:43AM -0400, Brian Foster wrote: > > Changes in the XFS logging code have lead to small leaks in the log > > grant heads that consume log space slowly over time. Such problems have > > gone undetected for an unnecessarily long time due to code complexity > > and potential for very subtle problems. Losing only a few bytes per > > logged item on a reasonably large enough fs (10s of GB) means only the > > most continuously stressful workloads will cause a severe enough failure > > (deadlock due to log reservation exhaustion) quickly enough to indicate > > something is seriously wrong. > > > > Recent changes in XFS export the state of the various log heads through > > sysfs to aid in userspace/runtime analysis of the log. This test runs a > > workload against an XFS filesystem, quiesces the fs and verifies that > > the log reserve and write grant heads have not leaked any space with > > respect to the current head of the physical log. > > > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > > --- > .... > > > > +# Determine the system device name for a particular block device. The device > > +# name is how the block dev is referenced under sysfs. > > +_get_device_name() > > +{ > > + devpath=$1 > > + > > + # check for a symlink (i.e., device mapper) > > + if [ -L $devpath ] > > + then > > + devpath=`readlink -f $devpath` > > + fi > > + > > + # grab the major minor and convert from hex to decimal > > + major=$((0x`stat -c %t $devpath`)) > > + minor=$((0x`stat -c %T $devpath`)) > > + > > + # refer to sysfs by major minor > > + basename `readlink /sys/dev/block/$major:$minor` > > +} > > $ basename `readlink -f /dev/mapper/vg0-home` > dm-2 > $ basename `readlink /sys/dev/block/253:2` > dm-2 > > Why is _short_dev() not sufficient? > That should work... I guess I missed it. > > +# Use the information exported by XFS to sysfs to determine whether the log has > > +# active reservations after a filesystem freeze. > > +_check_scratch_log_state() > > +{ > > + devname=`_get_device_name $SCRATCH_DEV` > > + attrpath="/sys/fs/xfs/$devname/log" > > + > > + # freeze the fs to ensure data is synced and the log is flushed. this > > + # means no outstanding transactions, and thus no outstanding log > > + # reservations, should exist > > + xfs_freeze -f $SCRATCH_MNT > > + > > + # the log head is exported in basic blocks and the log grant heads in > > + # bytes. convert the log head to bytes for precise comparison > > + log_head_cycle=`cat $attrpath/log_head_lsn | awk -F : '{ print $1 }'` > > + log_head_bytes=`cat $attrpath/log_head_lsn | awk -F : '{ print $2 }'` > > awk can read files directly: > > log_head_cycle=`awk -F : '{ print $1 }' $attrpath/log_head_lsn` > > > + log_head_bytes=$((log_head_bytes * 512)) > > + > > + for attr in "reserve_grant_head" "write_grant_head" > > + do > > + cycle=`cat $attrpath/$attr | awk -F : '{ print $1 }'` > > + bytes=`cat $attrpath/$attr | awk -F : '{ print $2 }'` > > + > > + if [ $cycle != $log_head_cycle ] || > > + [ $bytes != $log_head_bytes ] > > + then > > + echo "$attr ($cycle:$bytes) does not match" \ > > + "log_head_lsn ($log_head_cycle:$log_head_bytes)," \ > > + "possible leak detected." > > + else > > + echo "$attr matches log_head_lsn" > > + fi > > + done > > + > > + xfs_freeze -u $SCRATCH_MNT > > +} > > + > > +# real QA test starts here > > +_supported_fs xfs > > +_supported_os Linux > > + > > +_require_scratch > > +_require_freeze > > + > > +if [ ! -e /sys/fs/xfs ] > > +then > > + _notrun "no kernel support for XFS sysfs attributes" > > +fi > > _requires_xfs_sysfs > I was mulling this over as I think we'll probably end up in a situation where a test that depends on sysfs bits will need to check for a specific attribute file. E.g., some new test comes along using a new attribute file. Checking for /sys/fs/xfs is not sufficient for that test once we release a version that so far only exports the log bits. I think we could handle that by supporting a parameter to _requires_xfs_sysfs that specifies the sub-attribute that must exist (similar to what we have for xfs_io commands). We don't need that at the moment, but that's good enough for me to create the requires func. > > + > > +rm -f $seqres.full > > + > > +_scratch_mkfs_xfs | _filter_mkfs 2>> $seqres.full > > +_scratch_mount > > + > > +_check_scratch_log_state > > + > > +$FSSTRESS_PROG -d $SCRATCH_MNT/fsstress -n 1000 -p 2 -S t \ > > + >> $seqres.full 2>&1 > > + > > +_check_scratch_log_state > > wouldn't it be better to run fsstress as a background process and do > several freeze/check/thaw cycles on a running workload? > Good idea, that might be a more effective test for catching future issues. As it is, this test runs in a few seconds so I think we could beef it up a little without losing the 'quick' categorization. I'll play around with it. > > + > > +umount $SCRATCH_MNT > > +_check_scratch_fs > > + > > +status=0 > > +exit > > diff --git a/tests/xfs/011.out b/tests/xfs/011.out > > new file mode 100644 > > index 0000000..a3f3805 > > --- /dev/null > > +++ b/tests/xfs/011.out > > @@ -0,0 +1,11 @@ > > +QA output created by 011 > > +meta-data=DDEV isize=XXX agcount=N, agsize=XXX blks > > +data = bsize=XXX blocks=XXX, imaxpct=PCT > > + = sunit=XXX swidth=XXX, unwritten=X > > +naming =VERN bsize=XXX > > +log =LDEV bsize=XXX blocks=XXX > > +realtime =RDEV extsz=XXX blocks=XXX, rtextents=XXX > > Any particular reason for dumping the filtered mkfs information > here? It won't ever cause a test failure unless we break > _filter_mkfs... > Not really, I'll clean that up. > > +reserve_grant_head matches log_head_lsn > > +write_grant_head matches log_head_lsn > > +reserve_grant_head matches log_head_lsn > > +write_grant_head matches log_head_lsn > > diff --git a/tests/xfs/group b/tests/xfs/group > > index 19fd968..99bf0e1 100644 > > --- a/tests/xfs/group > > +++ b/tests/xfs/group > > @@ -8,6 +8,7 @@ > > 008 rw ioctl auto quick > > 009 rw ioctl auto prealloc quick > > 010 auto quick repair > > +011 auto quick freeze > > log and metadata, too. > Ok. Thanks for the review. Brian > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs