On Wed, Sep 23, 2015 at 01:44:06PM +1000, Dave Chinner wrote:
> On Fri, Sep 11, 2015 at 02:55:32PM -0400, Brian Foster wrote:
> > The LSN validation helper is called in the I/O verifier codepath for
> > metadata that embed a last-modification LSN. While the codepath exists,
> > this is not used in userspace as in the kernel because the former
> > doesn't have an active log.
> >
> > xfs_repair does need to check the validity of the LSN metadata with
> > respect to the on-disk log, however. Use the LSN validation mechanism to
> > track the largest LSN that has been seen. Export the value so repair can
> > use it once it has processed the entire filesystem. Note that the helper
> > continues to always return true to preserve existing behavior.
> >
> > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ....
> > +xfs_lsn_t libxfs_max_lsn = 0;
> > +pthread_mutex_t libxfs_max_lsn_lock = PTHREAD_MUTEX_INITIALIZER;
> > +
> >  void
> >  xfs_detect_invalid_lsn(
> >  	struct xfs_mount	*mp,
> >  	xfs_lsn_t		lsn)
> >  {
> > +	int	cycle = CYCLE_LSN(lsn);
> > +	int	block = BLOCK_LSN(lsn);
> > +	int	max_cycle;
> > +	int	max_block;
> > +
> > +	if (lsn == NULLCOMMITLSN)
> > +		return;
> > +
> > +	pthread_mutex_lock(&libxfs_max_lsn_lock);
> > +
> > +	max_cycle = CYCLE_LSN(libxfs_max_lsn);
> > +	max_block = BLOCK_LSN(libxfs_max_lsn);
> > +
> > +	if ((cycle > max_cycle) ||
> > +	    (cycle == max_cycle && block > max_block))
> > +		libxfs_max_lsn = lsn;
> > +
> > +	pthread_mutex_unlock(&libxfs_max_lsn_lock);
>
> This will have the same lock contention problems that the kernel
> code would have had - my repair scalability tests regularly reach
> over 1GB/s of metadata being prefetched through tens of threads, so
> this is going to have a significant impact on performance in those
> tests....
>

Hmm, Ok. I initially didn't have locking here (by accident), but reproduced
some breakage for which the exact details escape me. I suspect it was a
test-and-set race.
I suppose we could try doing something like what we plan to do on the kernel
side in terms of a racy check followed by the locked check, so the lock
contention goes away once we find the max LSN. The context here is different,
however, in that we're using a 64-bit LSN rather than independently updated
cycle and block fields. It could also take a while to stabilize the max LSN
depending on the fs. (There are some details on the kernel side I'd like to
get worked out first as well.)

Perhaps we could try to make this smarter... in the case where either the
log has been zeroed or we've identified an LSN beyond the current log state,
we only really need to track the max cycle value, since a log format is
imminent at that point. In the case where the fs is consistent, we could
seed the starting value with the current log LSN, or track per-AG max LSNs
without locking and compare them at the end, etc.

I'll have to think about this some more and see what's effective. I'd also
like to quantify the effect the current locking has on performance if
possible. Can you provide a brief description of your typical repair test
that you would expect this to hurt? E.g., a large fs with many AGs,
populated with fs_mark and repaired with many threads..? Any special
storage configuration?

Thanks.

Brian

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs