On Fri 08-01-21 23:58:52, Eric Biggers wrote: > From: Eric Biggers <ebiggers@xxxxxxxxxx> > > When lazytime is enabled and an inode is being written due to its > in-memory updated timestamps having expired, either due to a sync() or > syncfs() system call or due to dirtytime_expire_interval having elapsed, > the VFS needs to inform the filesystem so that the filesystem can copy > the inode's timestamps out to the on-disk data structures. > > This is done by __writeback_single_inode() calling > mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC). > > However, this occurs after __writeback_single_inode() has already > cleared the dirty flags from ->i_state. This causes two bugs: > > - mark_inode_dirty_sync() redirties the inode, causing it to remain > dirty. This wastefully causes the inode to be written twice. But > more importantly, it breaks cases where sync_filesystem() is expected > to clean dirty inodes. This includes the FS_IOC_REMOVE_ENCRYPTION_KEY > ioctl (as reported at > https://lore.kernel.org/r/20200306004555.GB225345@xxxxxxxxx), as well > as possibly filesystem freezing (freeze_super()). > > - Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is > called from __writeback_single_inode() for lazytime expiration, > xfs_fs_dirty_inode() ignores the notification. (XFS only cares about > lazytime expirations, and it assumes that I_DIRTY_TIME will contain > i_state during those.) Therefore, lazy timestamps aren't persisted by > sync(), syncfs(), or dirtytime_expire_interval on XFS. > > Fix this by moving the call to mark_inode_dirty_sync() to earlier in > __writeback_single_inode(), before the dirty flags are cleared from > i_state. This makes filesystems be properly notified of the timestamp > expiration, and it avoids incorrectly redirtying the inode. > > This fixes xfstest generic/580 (which tests > FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime > enabled. It also fixes the new lazytime xfstest I've proposed, which > reproduces the above-mentioned XFS bug > (https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@xxxxxxxxxx). > > Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly. But > due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the > right thing to do because mark_inode_dirty_sync() now knows not to move > the inode to a writeback list if it is currently queued for sync. > > Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option") > Cc: stable@xxxxxxxxxxxxxxx > Depends-on: 5afced3bf281 ("writeback: Avoid skipping inode writeback") > Suggested-by: Jan Kara <jack@xxxxxxx> > Signed-off-by: Eric Biggers <ebiggers@xxxxxxxxxx> Thanks for writing this fix! It looks good to me. You can add: Reviewed-by: Jan Kara <jack@xxxxxxx> Honza > --- > fs/fs-writeback.c | 24 +++++++++++++----------- > 1 file changed, 13 insertions(+), 11 deletions(-) > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > index acfb55834af23..c41cb887eb7d3 100644 > --- a/fs/fs-writeback.c > +++ b/fs/fs-writeback.c > @@ -1474,21 +1474,25 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc) > } > > /* > - * Some filesystems may redirty the inode during the writeback > - * due to delalloc, clear dirty metadata flags right before > - * write_inode() > + * If the inode has dirty timestamps and we need to write them, call > + * mark_inode_dirty_sync() to notify the filesystem about it and to > + * change I_DIRTY_TIME into I_DIRTY_SYNC. > */ > - spin_lock(&inode->i_lock); > - > - dirty = inode->i_state & I_DIRTY; > if ((inode->i_state & I_DIRTY_TIME) && > - ((dirty & I_DIRTY_INODE) || > - wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync || > + (wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync || > time_after(jiffies, inode->dirtied_time_when + > dirtytime_expire_interval * HZ))) { > - dirty |= I_DIRTY_TIME; > trace_writeback_lazytime(inode); > + mark_inode_dirty_sync(inode); > } > + > + /* > + * Some filesystems may redirty the inode during the writeback > + * due to delalloc, clear dirty metadata flags right before > + * write_inode() > + */ > + spin_lock(&inode->i_lock); > + dirty = inode->i_state & I_DIRTY; > inode->i_state &= ~dirty; > > /* > @@ -1509,8 +1513,6 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc) > > spin_unlock(&inode->i_lock); > > - if (dirty & I_DIRTY_TIME) > - mark_inode_dirty_sync(inode); > /* Don't write the inode if only I_DIRTY_PAGES was set */ > if (dirty & ~I_DIRTY_PAGES) { > int err = write_inode(inode, wbc); > -- > 2.30.0 > -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR