On Mon, Dec 20, 2021 at 10:48:31AM -0800, Chanchal Mathew wrote:
> I am starting an EC2 instance from an image file with an XFS file
> system. The file system is cleanly unmounted, and the log print
> output I shared before was run on it before creating a snapshot and
> starting an instance from it.
>
> The time is measured from dmesg & journalctl logs. Here is a sample
> output. Kernel version 5.10.x
>
> ----
> [    1.984392] XFS (nvme0n1p1): Mounting V5 Filesystem
> [    3.538829] XFS (nvme0n1p1): Ending clean mount
> ----
>
> I then edited the XFS code to print more log messages for this code
> path, and found it spends nearly 1s in xlog_find_tail. Here is the
> debug dmesg output from another image with a high log number. If I
> reset the log to 0, the mount completes in 200-300ms on average. For
> a log head number of 17983, for example, it takes approx. 1s.
>
> ----
> [    2.387962] XFS (nvme0n1p1): starting xlog_find_tail
> [    2.392108] XFS (nvme0n1p1): Find previous log record
> [    2.402898] XFS (nvme0n1p1): Assert head_blk < INT_MAX: 17983
> [    2.407355] XFS (nvme0n1p1): Allocate buffer
> [    2.411108] XFS (nvme0n1p1): Search backwards through the log
> [    2.416299] XFS (nvme0n1p1): tail_blk: 17979
> [    2.420045] XFS (nvme0n1p1): Set the log state based on the current head record
> [    2.426834] XFS (nvme0n1p1): Look for an unmount record at the head of the log
> [    2.433763] XFS (nvme0n1p1): Verify the log head of the log is not clean
> [    2.438599] XFS (nvme0n1p1): Note that the unmount was clean
> [    2.442950] XFS (nvme0n1p1): Make sure that there are no blocks in front of the head
> [    3.439000] XFS (nvme0n1p1): finished xlog_find_tail

So there's a delay while recovery writes up to 2MB of zeroed blocks
beyond the head so that future recoveries won't get confused by stale,
unrecovered partial log writes that this recovery ignored.

> Here is the output of xfs_info on the mounted image:
>
> ----
> $ xfs_info /dev/mapper/loop2p1
> meta-data=/dev/mapper/loop2p1    isize=512    agcount=4, agsize=524159 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1 spinodes=0

V5 format - "zeroing" the log with xfs_repair is not actually zeroing
the log sequence numbers - it's re-initialising the log, not "resetting
it to zero".

> data     =                       bsize=4096   blocks=2096635, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal               bsize=4096   blocks=2560, version=2

And so you have a 10MB log (2560 blocks * 4096 bytes), which means the
log block zeroing beyond the head is wrapping around the end of the log
and having to do two IOs instead of one to clear the 2MB region. That
is, head_blk = 17983, log->l_logBBsize = 20480, and zeroing needs to
run to 17983 + 4096 = 22079, which means it wraps the log and has to
zero 17984 -> 20479, then 0 -> 1599. When the head_blk is smaller, this
is just a single 2MB sized IO. When the head_blk is within 2MB of the
end of the log, it is two sub-2MB write IOs issued sequentially.

If this is taking a long time, then I'd be looking at why the different
IO patterns in xlog_clear_stale_blocks() result in such a long delay.
Smells like a storage problem, not an XFS issue.
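To make the wrap case concrete, here's a standalone sketch of the range
arithmetic described above (userspace C for illustration only, not the
kernel implementation; all numbers are 512-byte basic blocks, so the
4096 BB span is the 2MB region):

#include <stdio.h>

int main(void)
{
	int head_blk = 17983;		/* from the debug output above */
	int logBBsize = 20480;		/* 10MB log in 512-byte BBs */
	int last = head_blk + 4096;	/* last BB to zero: 22079 */

	if (last < logBBsize) {
		/* head far from the end of the log: one contiguous write */
		printf("zero %d -> %d (one IO)\n", head_blk + 1, last);
	} else {
		/* head within 2MB of the end of the log: wrap, two writes */
		printf("zero %d -> %d, then 0 -> %d (two IOs)\n",
			head_blk + 1, logBBsize - 1, last - logBBsize);
	}
	return 0;
}

For the numbers above it prints "zero 17984 -> 20479, then 0 -> 1599
(two IOs)", i.e. the wrapped, two-IO case.

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx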