On Wed, Sep 23, 2015 at 09:18:31AM -0400, Brian Foster wrote:
> On Wed, Sep 23, 2015 at 01:44:06PM +1000, Dave Chinner wrote:
> > On Fri, Sep 11, 2015 at 02:55:32PM -0400, Brian Foster wrote:
> > > +
> > > +	pthread_mutex_lock(&libxfs_max_lsn_lock);
> > > +
> > > +	max_cycle = CYCLE_LSN(libxfs_max_lsn);
> > > +	max_block = BLOCK_LSN(libxfs_max_lsn);
> > > +
> > > +	if ((cycle > max_cycle) ||
> > > +	    (cycle == max_cycle && block > max_block))
> > > +		libxfs_max_lsn = lsn;

Actually, we have XFS_LSN_CMP(lsn1, lsn2) for this. i.e.

	if (XFS_LSN_CMP(lsn, libxfs_max_lsn) > 0)
		libxfs_max_lsn = lsn;

> > > +
> > > +	pthread_mutex_unlock(&libxfs_max_lsn_lock);
> >
> > This will have the same lock contention problems that the kernel
> > code would have had - my repair scalability tests regularly reach
> > over 1GB/s of metadata being prefetched through tens of threads,
> > so this is going to have a significant impact on performance in
> > those tests....
> >
>
> Hmm, Ok. I initially didn't have locking here (by accident) but
> reproduced some breakage for which the exact details escape me. I
> suspect it was a test-and-set race.
>
> I suppose we could try doing something like what we plan to do on
> the kernel side in terms of a racy check followed by the locked
> check, so the lock contention goes away once we find the max lsn.
> The context here is different, however, in that we're using a
> 64-bit LSN rather than independently updated cycle and block
> fields. It could also take a while to stabilize the max lsn
> depending on the fs. (There are some details on the kernel side
> I'd like to get worked out first as well.)

It still has to work on 32 bit machines, where 64 bit operations
are not atomic....

> Perhaps we could try to make this smarter... in the case where
> either the log has been zeroed or we've identified an LSN beyond
> the current log state, we only really need to track the max cycle
> value since a log format is imminent at that point. In the case
> where the fs is consistent, we could seed the starting value with
> the current log lsn, or track per-ag max LSNs without locking and
> compare them at the end, etc.

Yes, we should seed the initial "max lsn" with whatever we find in
the log, regardless of whether the fs is consistent or not...

Also, per-ag max lsn tracking would break the lock up - that will
bring the scope down to the point where contention won't be a
problem...
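
For illustration, a rough sketch of the "racy check first, then take
the lock" approach using XFS_LSN_CMP, as discussed above - untested,
and the function name is invented; libxfs_max_lsn and
libxfs_max_lsn_lock are the variables from the patch:

	#include <pthread.h>
	#include "libxfs.h"	/* xfs_lsn_t, XFS_LSN_CMP */

	extern pthread_mutex_t	libxfs_max_lsn_lock;
	extern xfs_lsn_t	libxfs_max_lsn;

	static void
	track_max_lsn(xfs_lsn_t lsn)
	{
		/*
		 * Racy unlocked pre-check: once the max lsn has
		 * stabilised, almost every caller bails out here
		 * without touching the lock. Note that this unlocked
		 * read is only safe where a 64-bit load is atomic,
		 * which is exactly the 32-bit problem noted above.
		 */
		if (XFS_LSN_CMP(lsn, libxfs_max_lsn) <= 0)
			return;

		pthread_mutex_lock(&libxfs_max_lsn_lock);
		/* recheck under the lock; another thread may have won */
		if (XFS_LSN_CMP(lsn, libxfs_max_lsn) > 0)
			libxfs_max_lsn = lsn;
		pthread_mutex_unlock(&libxfs_max_lsn_lock);
	}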
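
Similarly, a rough sketch of breaking the lock up into per-ag state
- again untested, all names invented - where each AG tracks its own
max lsn under its own lock (so threads working on different AGs
never contend), every slot is seeded from the lsn found in the log,
and a single-threaded pass folds the per-ag results together at the
end:

	#include <pthread.h>
	#include <stdlib.h>
	#include "libxfs.h"	/* xfs_lsn_t, xfs_agnumber_t, XFS_LSN_CMP */

	struct ag_maxlsn {
		pthread_mutex_t	lock;
		xfs_lsn_t	max_lsn;
	};
	static struct ag_maxlsn	*ag_maxlsns;	/* one entry per AG */

	/* seed each AG slot with the max lsn found in the log
	 * (allocation error handling omitted in this sketch) */
	static void
	init_ag_maxlsns(xfs_agnumber_t agcount, xfs_lsn_t log_max_lsn)
	{
		xfs_agnumber_t	agno;

		ag_maxlsns = calloc(agcount, sizeof(*ag_maxlsns));
		for (agno = 0; agno < agcount; agno++) {
			pthread_mutex_init(&ag_maxlsns[agno].lock, NULL);
			ag_maxlsns[agno].max_lsn = log_max_lsn;
		}
	}

	/* per-ag lock: contention only between threads in the same AG */
	static void
	ag_track_max_lsn(xfs_agnumber_t agno, xfs_lsn_t lsn)
	{
		struct ag_maxlsn	*aml = &ag_maxlsns[agno];

		pthread_mutex_lock(&aml->lock);
		if (XFS_LSN_CMP(lsn, aml->max_lsn) > 0)
			aml->max_lsn = lsn;
		pthread_mutex_unlock(&aml->lock);
	}

	/* single-threaded, after all AGs have been processed */
	static xfs_lsn_t
	ag_max_lsn_reduce(xfs_agnumber_t agcount)
	{
		xfs_lsn_t	max = ag_maxlsns[0].max_lsn;
		xfs_agnumber_t	agno;

		for (agno = 1; agno < agcount; agno++)
			if (XFS_LSN_CMP(ag_maxlsns[agno].max_lsn, max) > 0)
				max = ag_maxlsns[agno].max_lsn;
		return max;
	}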
> I'll have to think about this some more and see what's effective.
> I'd also like to quantify the effect the current locking has on
> performance if possible. Can you provide a brief description of
> your typical repair test that you would expect this to hurt? E.g.,
> a large fs, many AGs, populated with fs_mark and repaired with
> many threads..? Any special storage configuration? Thanks.

Just my usual 500TB fs_mark test...

$ cat ~/tests/fsmark-50-test-xfs.sh
#!/bin/bash

QUOTA=
MKFSOPTS=
NFILES=100000
DEV=/dev/vdc
LOGBSIZE=256k

while [ $# -gt 0 ]; do
	case "$1" in
	-q)	QUOTA="uquota,gquota,pquota" ;;
	-N)	NFILES=$2 ; shift ;;
	-d)	DEV=$2 ; shift ;;
	-l)	LOGBSIZE=$2 ; shift ;;
	--)	shift ; break ;;
	esac
	shift
done
MKFSOPTS="$MKFSOPTS $*"

echo QUOTA=$QUOTA
echo MKFSOPTS=$MKFSOPTS
echo DEV=$DEV

sudo umount /mnt/scratch > /dev/null 2>&1
sudo mkfs.xfs -f $MKFSOPTS $DEV
sudo mount -o nobarrier,logbsize=$LOGBSIZE,$QUOTA $DEV /mnt/scratch
sudo chmod 777 /mnt/scratch
cd /home/dave/src/fs_mark-3.3/
sudo sh -c "echo 1 > /proc/sys/fs/xfs/stats_clear"
time ./fs_mark -D 10000 -S0 -n $NFILES -s 0 -L 32 \
	-d /mnt/scratch/0 -d /mnt/scratch/1 \
	-d /mnt/scratch/2 -d /mnt/scratch/3 \
	-d /mnt/scratch/4 -d /mnt/scratch/5 \
	-d /mnt/scratch/6 -d /mnt/scratch/7 \
	-d /mnt/scratch/8 -d /mnt/scratch/9 \
	-d /mnt/scratch/10 -d /mnt/scratch/11 \
	-d /mnt/scratch/12 -d /mnt/scratch/13 \
	-d /mnt/scratch/14 -d /mnt/scratch/15 \
	| tee >(stats --trim-outliers | tail -1 1>&2)
sync

echo Repair
sudo umount /mnt/scratch
time sudo xfs_repair -o bhash=100101 -v -v -t 1 $DEV
time sudo mount -o nobarrier,logbsize=$LOGBSIZE,$QUOTA $DEV /mnt/scratch

echo bulkstat files
time (sudo ~/src/xfstests-dev/src/bstat -q /mnt/scratch 1024 | wc -l)

echo walking files
~/tests/walk-scratch.sh

echo removing files
for f in /mnt/scratch/* ; do time rm -rf $f & done
wait

sudo umount /mnt/scratch
$

-- 
Dave Chinner
david@xxxxxxxxxxxxx