On Tue, Mar 07, 2017 at 08:47:55AM +1100, Dave Chinner wrote: > On Mon, Mar 06, 2017 at 01:18:20PM -0500, Brian Foster wrote: > > On Mon, Mar 06, 2017 at 09:22:05AM -0800, Darrick J. Wong wrote: > > > On Mon, Mar 06, 2017 at 08:51:44AM -0500, Brian Foster wrote: > > > > On Fri, Mar 03, 2017 at 04:05:29PM -0800, Darrick J. Wong wrote: > > > > > In commit ef388e2054 ("don't allow di_size with high bit set") and > > > > > commit 3c6f46eacd87 ("sanity check directory inode di_size") we > > > > > erroneously fail the inode verification checks for symlinks or > > > > > directories with di_size == 0. > > > > > > > > > > This is proven incorrect by generic/388 because the unlink process uses > > > > > multiple unconnected transactions to truncate the remote symlink / > > > > > directory and to unlink and free the inode. If the fs goes down after > > > > > the truncate commit but before the unlink happens, we are left with a > > > > > zero-length inode. Worse yet, if the unlink /does/ commit, log recovery > > > > > will now break after the first transaction because we've zeroed di_size. Brian had a question in an earlier reply about whether or not g/388 actually produced a filesystem with a zero-length symlink. Looking back at the metadumps I kept around from Friday I don't see any indication of that happening. However, the atomicity problem still exists. > > > > > > Ugh, I misspoke slightly. Truncate and *ifree* are separate > > > transactions that can lose each other in the mist after a crash. > > > > > > So that separates this discussion into two classes of problem -- the > > > first is the case of "truncate, crash, forget to ifree", which at least > > > isn't a user-visible problem because the inode has been unlinked from > > > the directory tree at that point. Log recovery breaks if iget -> > > > dinode_verify stumbles over the zero-length inode. > > > > > > > Ok, that change seems reasonable. > > Not to me, though - it's indicative of an atomicity problem, and not > something we should work around by considering the intermediate > "corrupt" state on disk valid. Remember that one of the main reasons > for checks like these in the verifiers is that they tell us where > our transactional change atomicity is broken. Gutting the verifier > because it's indicated we have a change atomicity problem is not the > solution. > > i.e. I think we should be looking at changing xfs_inactive() to use > the deferred ops state machine for the multiple {truncate-data, > truncata-attr, free inode} transactions. Then we don't need to care > about temporary incomplete state on disk - if we get the first > transaction completed, log recovery will have an intent one disk for > the next transaction, and we can complete the inode freeing > process rather than leaving a dangling chad. I argued for this earlier as a longer-term fix, but vger has not been transmitting emails reliably for a week now, so I don't know if you saw that part. There are various parts of XFS that have atomicity problems just like this one. Auditing the codebase to find all these places and add the necessary log functionality could take a while, though it ought to be done now that we have the ability to chain transactions. (Also I think there's a minor bug with the defer_ops where if we call defer_trans_roll, the transaction commits successfully, but then something else fails, we end up double-freeing the intent items.) > Making an exception in a verifier for a special log recovery state > is fine - we have to handle all sorts of interesting intermediate > states in log recovery - but as a general rule if it's an invalid > state the verifier should always be catching it.... Ah, see, there's something I didn't know -- that verifiers can be contextually sensitive. In this case we could probably get away with a new iget flag so that log replay could say "Hey, I don't care if the size is zero". > [snip] > > > Essentially what I'm saying is that if it is not possible to have a > > zero-sized symlink in a directory except for a bug or external > > modification, or there is otherwise no path to recover outside of > > xfs_repair (i.e., the current verify failure behavior) then EFSCORRUPTED > > makes sense to me. > > *nod* > > > If it is possible (even if it only occurs as an > > artifact of log recovery) and everything works except for reading the > > actual link returns an error (i.e., it can be renamed, unlinked), then > > I'm not sure EFSCORRUPTED is the right error. > > We shouldn't ever get that far on a zero-length symlink/directory > because the inode is clearly in an invalid state.... <nod> Ok, I'll leave the verifier as it is for now. --D > > -Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html