On Fri, Jun 14, 2013 at 02:44:53PM -0500, Ben Myers wrote:
> Unfortunately log replay is broken.  The verifier has detected this and stopped
> replay.  Ideally the solution would be to fix log replay, but that is going to
> take some time.  So, in the near term we're just going to disable the verifier
> to allow replay to complete.
>
> I'm suggesting that this disabling be done conditionally on CONFIG_XFS_DEBUG so
> that developers still have a chance at hitting the log replay problem, and a
> comment should be added explaining that we've disabled the verifier due to a
> specific bug as a temporary workaround and we'll re-enable the verifier once
> it's fixed.  I'll update the patch and repost.
>
> Are you guys arguing that the log replay bug should not be fixed?

I'm not arguing that it should not be fixed, I'm *stating* that it
*can't be fixed* for non-CRC filesystems. That is, the best we can do
is work around this deficiency in log recovery with some kind of
self-defusing warning....

To fix it properly, you need to know the age of the object being
overwritten relative to the age of the overwrite data. For non-CRC
filesystems we don't have that information in the metadata object
being overwritten. We can't even correctly identify the object being
overwritten.

So, it's simply not fixable in log recovery for non-CRC filesystems,
and the LSN stamped in every piece of metadata at writeback time for
CRC enabled filesystems is designed precisely to avoid this problem.

Indeed, the LSN stamp is a far more effective method than what used
to be in the inode core to handle unlogged inode size updates, i.e.
to try to prevent log recovery from replaying inode core updates over
more recently written inodes - the di_flushiter field. Note the
comment in xfs_iflush_int():

	/*
	 * bump the flush iteration count, used to detect flushes which
	 * postdate a log record during recovery. This is redundant as we now
	 * log every change and hence this can't happen. Still, it doesn't hurt.
	 */
	ip->i_d.di_flushiter++;

And this in xlog_recover_inode_pass2():

	/* Skip replay when the on disk inode is newer than the log one */
	if (dicp->di_flushiter < be16_to_cpu(dip->di_flushiter)) {

This recovery problem has been around forever, and we cannot fix log
recovery without age information in the metadata on disk that
recovery is going to overwrite. CRC enabled filesystems have that
information on disk, existing filesystems don't. Therefore, we can
only solve the recovery problem for CRC enabled filesystems...

We could probably also avoid the problem by modifying the way we do
writeback from the AIL to limit it only to objects at the tail LSN,
but that has a horrific performance penalty associated with it for
many common workloads because of the way relogging works. And for a
problem that I've suspected has occurred maybe 5 times in 10 years,
modifying metadata writeback to avoid this problem is a bad tradeoff
to make for just about everyone...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
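
For illustration, the age check argued for above (replay a logged change
only when the on-disk object is older than the change) can be sketched in
C as below. This is a minimal sketch, not the actual recovery code: every
sketch_* name is hypothetical, and the LSN layout used here (log cycle in
the upper 32 bits, block number in the lower 32 bits) is an assumption
made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical LSN type: cycle in the high 32 bits, block in the low 32. */
typedef int64_t sketch_lsn_t;

/* Build an LSN from a cycle and block number (assumed layout). */
static sketch_lsn_t sketch_make_lsn(uint32_t cycle, uint32_t block)
{
	return (sketch_lsn_t)(((uint64_t)cycle << 32) | block);
}

/* Compare two LSNs: cycle first, then block. Returns <0, 0 or >0. */
static int sketch_lsn_cmp(sketch_lsn_t a, sketch_lsn_t b)
{
	uint32_t cycle_a = (uint32_t)((uint64_t)a >> 32);
	uint32_t cycle_b = (uint32_t)((uint64_t)b >> 32);
	uint32_t block_a = (uint32_t)((uint64_t)a & 0xffffffffu);
	uint32_t block_b = (uint32_t)((uint64_t)b & 0xffffffffu);

	if (cycle_a != cycle_b)
		return cycle_a < cycle_b ? -1 : 1;
	if (block_a != block_b)
		return block_a < block_b ? -1 : 1;
	return 0;
}

/*
 * Decide whether recovery should replay a logged change over an on-disk
 * object. Without an LSN stamp in the object (the non-CRC case) there is
 * no age information, so the only option is to replay blindly.
 */
static bool sketch_should_replay(bool disk_has_lsn, sketch_lsn_t disk_lsn,
				 sketch_lsn_t item_lsn)
{
	if (!disk_has_lsn)
		return true;
	/* Skip replay when the on-disk object is as new or newer. */
	return sketch_lsn_cmp(disk_lsn, item_lsn) < 0;
}
```

The disk_has_lsn == false branch is the whole problem in miniature: with
nothing to compare against, recovery cannot tell a stale object from a
newer one.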