On Mon, Mar 06, 2017 at 01:18:20PM -0500, Brian Foster wrote: > On Mon, Mar 06, 2017 at 09:22:05AM -0800, Darrick J. Wong wrote: > > On Mon, Mar 06, 2017 at 08:51:44AM -0500, Brian Foster wrote: > > > On Fri, Mar 03, 2017 at 04:05:29PM -0800, Darrick J. Wong wrote: > > > > In commit ef388e2054 ("don't allow di_size with high bit set") and > > > > commit 3c6f46eacd87 ("sanity check directory inode di_size") we > > > > erroneously fail the inode verification checks for symlinks or > > > > directories with di_size == 0. > > > > > > > > This is proven incorrect by generic/388 because the unlink process uses > > > > multiple unconnected transactions to truncate the remote symlink / > > > > directory and to unlink and free the inode. If the fs goes down after > > > > the truncate commit but before the unlink happens, we are left with a > > > > zero-length inode. Worse yet, if the unlink /does/ commit, log recovery > > > > will now break after the first transaction because we've zeroed di_size. > > > > Ugh, I misspoke slightly. Truncate and *ifree* are separate > > transactions that can lose each other in the mist after a crash. > > > > So that separates this discussion into two classes of problem -- the > > first is the case of "truncate, crash, forget to ifree", which at least > > isn't a user-visible problem because the inode has been unlinked from > > the directory tree at that point. Log recovery breaks if iget -> > > dinode_verify stumbles over the zero-length inode. > > > > Ok, that change seems reasonable. Not to me, though - it's indicative of an atomicity problem, and not something we should work around by considering the intermediate "corrupt" state on disk valid. Remember that one of the main reasons for checks like these in the verifiers is that they tell us where our transactional change atomicity is broken. Gutting the verifier because it's indicated we have a change atomicity problem is not the solution. i.e. I think we should be looking at changing xfs_inactive() to use the deferred ops state machine for the multiple {truncate-data, truncata-attr, free inode} transactions. Then we don't need to care about temporary incomplete state on disk - if we get the first transaction completed, log recovery will have an intent one disk for the next transaction, and we can complete the inode freeing process rather than leaving a dangling chad. Making an exception in a verifier for a special log recovery state is fine - we have to handle all sorts of interesting intermediate states in log recovery - but as a general rule if it's an invalid state the verifier should always be catching it.... [snip] > Essentially what I'm saying is that if it is not possible to have a > zero-sized symlink in a directory except for a bug or external > modification, or there is otherwise no path to recover outside of > xfs_repair (i.e., the current verify failure behavior) then EFSCORRUPTED > makes sense to me. *nod* > If it is possible (even if it only occurs as an > artifact of log recovery) and everything works except for reading the > actual link returns an error (i.e., it can be renamed, unlinked), then > I'm not sure EFSCORRUPTED is the right error. We shouldn't ever get that far on a zero-length symlink/directory because the inode is clearly in an invalid state.... -Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html