Re: [PATCH] xfs: introduce "metasync" api to sync metadata to fsblock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/14/19 4:43 AM, Jan Kara wrote:
On Mon 14-10-19 16:33:15, Pingfan Liu wrote:
On Sun, Oct 13, 2019 at 09:34:17AM -0700, Darrick J. Wong wrote:
On Sun, Oct 13, 2019 at 10:37:00PM +0800, Pingfan Liu wrote:
When using fadump (fireware assist dump) mode on powerpc, a mismatch
between grub xfs driver and kernel xfs driver has been obsevered.  Note:
fadump boots up in the following sequence: fireware -> grub reads kernel
and initramfs -> kernel boots.

The process to reproduce this mismatch:
   - On powerpc, boot kernel with fadump=on and edit /etc/kdump.conf.
   - Replacing "path /var/crash" with "path /var/crashnew", then, "kdumpctl
     restart" to rebuild the initramfs. Detail about the rebuilding looks
     like: mkdumprd /boot/initramfs-`uname -r`.img.tmp;
           mv /boot/initramfs-`uname -r`.img.tmp /boot/initramfs-`uname -r`.img
           sync
   - "echo c >/proc/sysrq-trigger".

The result:
The dump image will not be saved under /var/crashnew/* as expected, but
still saved under /var/crash.

The root cause:
As Eric pointed out that on xfs, 'sync' ensures the consistency by writing
back metadata to xlog, but not necessary to fsblock. This raises issue if
grub can not replay the xlog before accessing the xfs files. Since the
above dir entry of initramfs should be saved as inline data with xfs_inode,
so xfs_fs_sync_fs() does not guarantee it written to fsblock.

umount can be used to write metadata fsblock, but the filesystem can not be
umounted if still in use.

There are two ways to fix this mismatch, either grub or xfs. It may be
easier to do this in xfs side by introducing an interface to flush metadata
to fsblock explicitly.

With this patch, metadata can be written to fsblock by:
   # update AIL
   sync
   # new introduced interface to flush metadata to fsblock
   mount -o remount,metasync mountpoint

I think this ought to be an ioctl or some sort of generic call since the
jbd2 filesystems (ext3, ext4, ocfs2) suffer from the same "$BOOTLOADER
is too dumb to recover logs but still wants to write to the fs"
checkpointing problem.
Yes, a syscall sounds more reasonable.

(Or maybe we should just put all that stuff in a vfat filesystem, I
don't know...)
I think it is unavoidable to involve in each fs' implementation. What
about introducing an interface sync_to_fsblock(struct super_block *sb) in
the struct super_operations, then let each fs manage its own case?

Well, we already have a way to achieve what you need: fsfreeze.
Traditionally, that is guaranteed to put fs into a "clean" state very much
equivalent to the fs being unmounted and that seems to be what the
bootloader wants so that it can access the filesystem without worrying
about some recovery details. So do you see any problem with replacing
'sync' in your example above with 'fsfreeze /boot && fsfreeze -u /boot'?

								Honza

The problem with fsfreeze is that if the device you want to quiesce is, say,
the root fs, freeze isn't really a good option.

But the other thing I want to highlight about this approach is that it does not
solve the root problem: something is trying to read the block device without
first replaying the log.

A call such as the proposal here is only going to leave consistent metadata at
the time the call returns; at any time after that, all guarantees are off again,
so the problem hasn't been solved.

-Eric




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux