On Mon, Dec 20, 2021 at 10:48:31AM -0800, Chanchal Mathew wrote:
> I am starting an EC2 instance from an image file with an XFS file
> system. The file system is cleanly unmounted, and the log print
> output I shared before was run on it before creating a snapshot and
> starting an instance from it.
>
> The time is measured from dmesg & journalctl logs. Here is a sample
> output. Kernel version 5.10.x
>
> ----
> [    1.984392] XFS (nvme0n1p1): Mounting V5 Filesystem
> [    3.538829] XFS (nvme0n1p1): Ending clean mount
> ----
>
> I then edited the XFS code to print more log messages for this code
> path, and found it spends nearly 1s in xlog_find_tail. Here is the
> debug dmesg output from another image with a high log number. If I
> reset the log to 0, the mount completes in 200-300ms on average. For
> a log head number of 17983, for example, it takes approx. 1s.
>
> ----
> [    2.387962] XFS (nvme0n1p1): starting xlog_find_tail
> [    2.392108] XFS (nvme0n1p1): Find previous log record
> [    2.402898] XFS (nvme0n1p1): Assert head_blk < INT_MAX: 17983
> [    2.407355] XFS (nvme0n1p1): Allocate buffer
> [    2.411108] XFS (nvme0n1p1): Search backwards through the log
> [    2.416299] XFS (nvme0n1p1): tail_blk: 17979
> [    2.420045] XFS (nvme0n1p1): Set the log state based on the current head record
> [    2.426834] XFS (nvme0n1p1): Look for an unmount record at the head of the log
> [    2.433763] XFS (nvme0n1p1): Verify the log head of the log is not clean
> [    2.438599] XFS (nvme0n1p1): Note that the unmount was clean
> [    2.442950] XFS (nvme0n1p1): Make sure that there are no blocks in front of the head
> [    3.439000] XFS (nvme0n1p1): finished xlog_find_tail

So there's a delay while recovery writes up to 2MB of zeroed blocks
beyond the head so that future recoveries won't get confused by stale,
unrecovered partial log writes that this recovery ignored.

> Here is the output of xfs_info on the mounted image:
>
> ----
> $ xfs_info /dev/mapper/loop2p1
> meta-data=/dev/mapper/loop2p1    isize=512    agcount=4, agsize=524159 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1 spinodes=0

V5 format - "zeroing" the log with xfs_repair is not actually zeroing
the log sequence numbers - it's re-initialising the log, not "resetting
it to zero".

> data     =                       bsize=4096   blocks=2096635, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal               bsize=4096   blocks=2560, version=2

And so you have a 10MB log (2560 blocks * 4096 bytes), which means the
log block zeroing beyond the head is wrapping around the end of the log
and having to do two IOs instead of one to clear the 2MB region. That
is, head_blk = 17983, log->l_logBBsize = 20480, and zeroing needs to
run to 17983 + 4096 = 22079, which means it wraps the log and has to
zero 17984 -> 20479, then 0 -> 1599. When the head_blk is smaller, this
is just a single 2MB sized IO. When the head_blk is within 2MB of the
end of the log, it is two sub-2MB write IOs issued sequentially.

If this is taking a long time, then I'd be looking at why the different
IO patterns in xlog_clear_stale_blocks() result in such a long delay.
Smells like a storage problem, not an XFS issue.
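To make the wrap case concrete, here's a standalone sketch of the range
arithmetic described above (userspace C for illustration only, not the
kernel implementation; all numbers are 512-byte basic blocks, so the
4096 BB span is the 2MB region):

#include <stdio.h>

int main(void)
{
	int head_blk = 17983;		/* from the debug output above */
	int logBBsize = 20480;		/* 10MB log in 512-byte BBs */
	int last = head_blk + 4096;	/* last BB to zero: 22079 */

	if (last < logBBsize) {
		/* head far from the end of the log: one contiguous write */
		printf("zero %d -> %d (one IO)\n", head_blk + 1, last);
	} else {
		/* head within 2MB of the end of the log: wrap, two writes */
		printf("zero %d -> %d, then 0 -> %d (two IOs)\n",
			head_blk + 1, logBBsize - 1, last - logBBsize);
	}
	return 0;
}

For the numbers above it prints "zero 17984 -> 20479, then 0 -> 1599
(two IOs)", i.e. the wrapped, two-IO case.

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx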