On 9/22/16 7:11 PM, Dave Chinner wrote: > From: Dave Chinner <dchinner@xxxxxxxxxx> > > Recently we've had a number of reports where log recovery on a v5 > filesystem has reported corruptions that looked to be caused by > recovery being re-run over the top of an already-recovered > metadata. This has uncovered a bug in recovery (fixed elsewhere) > but the vector that caused this was largely unknown. > > A kdump test started tripping over this problem - the system > would be crashed, the kdump kernel and environment would boot and > dump the kernel core image, and then the system would reboot. After > reboot, the root filesystem was triggering log recovery and > corruptions were being detected. The metadumps indicated the above > log recovery issue. > > What is happening is that the kdump kernel and environment is > mounting the root device read-only to find the binaries needed to do > it's work. The result of this is that it is running log recovery. > However, because there were unlinked files and EFIs to be processed > by recovery, the completion of phase 1 of log recovery could not > mark the log clean. And because it's a read-only mount, the unmount > process does not write records to the log to mark it clean, either. > Hence on the next mount of the filesystem, log recovery was run > again across all the metadata that had already been recovered and > this is what triggered corruption warnings. > > To avoid this problem, we need to ensure that a read-only mount > always updates the log when it completes the second phase of > recovery. We already handle this sort of issue with rw->ro remount > transitions, so the solution is as simple as quiescing the > filesystem at the appropriate time during the mount process. This > results in the log being marked clean so the mount behaviour > recorded in the logs on repeated RO mounts will change (i.e. log > recovery will no longer be run on every mount until a RW mount is > done). This is a user visible change in behaviour, but it is > harmless. Excellent idea :) Reviewed-by: Eric Sandeen <sandeen@xxxxxxxxxx> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > --- > fs/xfs/xfs_mount.c | 14 ++++++++++++++ > fs/xfs/xfs_super.c | 2 +- > fs/xfs/xfs_super.h | 1 + > 3 files changed, 16 insertions(+), 1 deletion(-) > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c > index faeead6..56e85a6 100644 > --- a/fs/xfs/xfs_mount.c > +++ b/fs/xfs/xfs_mount.c > @@ -934,6 +934,20 @@ xfs_mountfs( > } > > /* > + * Now the log is fully replayed, we can transition to full read-only > + * mode for read-only mounts. This will sync all the metadata and clean > + * the log so that the recovery we just performed does not have to be > + * replayed again on the next mount. > + * > + * We use the same quiesce mechanism as the rw->ro remount, as they are > + * semantically identical operations. > + */ > + if ((mp->m_flags & (XFS_MOUNT_RDONLY|XFS_MOUNT_NORECOVERY)) == > + XFS_MOUNT_RDONLY) { > + xfs_quiesce_attr(mp); > + } > + > + /* > * Complete the quota initialisation, post-log-replay component. > */ > if (quotamount) { > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > index 3409753..2d092f9 100644 > --- a/fs/xfs/xfs_super.c > +++ b/fs/xfs/xfs_super.c > @@ -1137,7 +1137,7 @@ xfs_restore_resvblks(struct xfs_mount *mp) > * Note: xfs_log_quiesce() stops background log work - the callers must ensure > * it is started again when appropriate. > */ > -static void > +void > xfs_quiesce_attr( > struct xfs_mount *mp) > { > diff --git a/fs/xfs/xfs_super.h b/fs/xfs/xfs_super.h > index 529bce9..b6418ab 100644 > --- a/fs/xfs/xfs_super.h > +++ b/fs/xfs/xfs_super.h > @@ -61,6 +61,7 @@ struct xfs_mount; > struct xfs_buftarg; > struct block_device; > > +extern void xfs_quiesce_attr(struct xfs_mount *mp); > extern void xfs_flush_inodes(struct xfs_mount *mp); > extern void xfs_blkdev_issue_flush(struct xfs_buftarg *); > extern xfs_agnumber_t xfs_set_inode_alloc(struct xfs_mount *, > _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs