> On Mon, Sep 18, 2023 at 11:44:53AM +0800, cheng.lin130@xxxxxxxxxx wrote: > > > On Fri, Sep 15, 2023 at 05:50:56PM +0800, cheng.lin130@xxxxxxxxxx wrote: > > > > > On Wed, Sep 13, 2023 at 05:44:45PM +0800, cheng.lin130@xxxxxxxxxx wrote: > > > > > > From: Cheng Lin <cheng.lin130@xxxxxxxxxx> > > > > > > > > > > > > When abnormal drop_nlink are detected on the inode, > > > > > > shutdown filesystem, to avoid corruption propagation. > > > > > > > > > > > > Signed-off-by: Cheng Lin <cheng.lin130@xxxxxxxxxx> > > > > > > --- > > > > > > fs/xfs/xfs_inode.c | 9 +++++++++ > > > > > > 1 file changed, 9 insertions(+) > > > > > > > > > > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c > > > > > > index 9e62cc500..40cc106ae 100644 > > > > > > --- a/fs/xfs/xfs_inode.c > > > > > > +++ b/fs/xfs/xfs_inode.c > > > > > > @@ -919,6 +919,15 @@ xfs_droplink( > > > > > > xfs_trans_t *tp, > > > > > > xfs_inode_t *ip) > > > > > > { > > > > > > + > > > > > > + if (VFS_I(ip)->i_nlink == 0) { > > > > > > + xfs_alert(ip->i_mount, > > > > > > + "%s: Deleting inode %llu with no links.", > > > > > > + __func__, ip->i_ino); > > > > > > + tp->t_flags |= XFS_TRANS_DIRTY; > > > > > Marking the transaction dirty is not necessary. > > > > > Otherwise this seems fine. > > > > Another strategy: > > > > Set nlink to an invalid value(like XFS_NLINK_PINNED), and > > > > Complete this transaction before shutdown fs. To make sure > > > > nlink not be zero. If the nlink of a directory are zero, it may > > > > be cleaned up. > > > > Is that appropriate? > > > No, all I'm asking you to do is drop dirtying of the transaction > > > from this patch because it is a) unnecessary and b) a layering > > > violation. > > > It is unnecessary because the transaction will almost always be > > > dirty before we get to xfs_droplink(). In the cases where it isn't > > > dirty (e.g. xfs_remove() on a directory) we explicitly check that > > > nlink == 2 before proceeding to call xfs_droplink(). Hence we can't > > > actually get to xfs_droplink() with a clean transaction, and so > > If the corrupted inode is a parent directory, when remove its > > subdirectory, the parent's nlink will be decreased to 0. But the > > transaction of subdirectory removing is not dirty (There are not > > check about the parent directory). In this situation, the transaction > > will be failed and the filesystem will be alive. > Yes, and that's perfectly fine. The transaction cancelling code has > worked this way for the past 20 years or so... > Indeed, you said your customer wants the system to stay alive if possible, > right? Well, so do we. > If the parent directory has a bogus nlink count, and that prevents > us from removing items from the directory, then as long as we > haven't dirtied anything and we can return a -EFSCORRUPTED error to > userspace to say the unlink failed and we don't have to shut the > filesystem down. All we now have is a directory that has objects in > it that can't be removed.... > For a higher level perspective, we only need to shut the filesystem > down if we cannot safely back out of the modification operation that > was requested. Whilst the transaction is clean, we can safely return > errors to userspace and continue operation because everything in > memory and on disk is still consistent, even if we have found a > corruption in non-crtical the metadata. Just returning an error to > userspace can't make the problem any worse. > This also is how we treat corruption that is found during read > operations - we return -EFSCORRUPTED to userspace because something in > the directory or inode we were trying to read from was corrupted. We > do not need to shut down the filesystem because there is > no risk of making things worse or the in-memory filesystem state > getting out of sync with the on-disk state. > It is only when we are trying to modify something that corruption > becomes a problem with fatal consequences. Once we've made a > modification, the in-memory state is different to the on-disk state > and whilst we are in that state any corruption we discover becomes > fatal. That is because there is no way to reconcile the changes > we've already made in memory with what is on-disk - we don't know > that the in-memory changes are good because we tripped over > corruption, and so we must not propagate bad in-memory state and > metadata to disk over the top of what may be still be uncorrupted > metadata on disk. > This "in memory matches on disk" state is effectively what the dirty > flag in the transaction tracks, and it's done as part of the normal > running of a transaction as items are tagged for logging. Marking a > transaction dirty that has nothign tagged for logging is actually an > incorrect state; we may handle it correctly, but it should never > actually occur and we should definitely not be open coding dirtying > of transactions to create this state. > IOWs, the transaction modification error handling paths already do > the right thing according to the state carried by the transaction at > the time the error was encountered. Thanks for your detailed explainations, I get it. > -Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx