Re: very long log recovery at mount

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 23 Oct 2015 18:05:32 +1100

On Wed, Oct 21, 2015 at 11:27:52AM +0200, Arkadiusz Miśkiewicz wrote:
> 
> Hi.
> 
> I got such situation, fresh boot, 4.1.10 kernel, init scripts start mounting 
> filesystems. One fs wasn't very lucky:
> 
> [   15.979538] XFS (md3): Mounting V4 Filesystem
> [   16.256316] XFS (md3): Ending clean mount
> [   28.343346] XFS (md4): Mounting V4 Filesystem
> [   28.629918] XFS (md4): Ending clean mount
> [   28.662125] XFS (md5): Mounting V4 Filesystem
> [   28.980142] XFS (md5): Ending clean mount
> [   29.049421] XFS (md6): Mounting V4 Filesystem
> [   29.447725] XFS (md6): Starting recovery (logdev: internal)
> [ 4517.327332] XFS (md6): Ending recovery (logdev: internal)
> 
> It took over 1h to mount md6 filesystem.
> 
> Questions:
> - is it possible to log how much data is needed to be recovered
> from log?

Yes.

> Some data that would give a hint on how big this is (and thus
> rough estimate on how long it will take). Not sure if that's known
> at time when this message is being printed.

It's not known at that point, and can't be known until recovery has
parsed the log and read all the objects it needs to recover from disk.

> - now such long mount time is almost insane, so I wonder why could
> be the reason. Is the process multithreaded, single threaded? cpus
> were idle

What kernel? We now have readahead which minimises the IO latency of
pulling objects into the kernel for recovery, but if you are
recovering a couple of million individual inode changes (e.g. from a
'chproj -R /path/with/millions/of/files') then it takes a long time
to read in all the inodes and write them all back out. A single
inode in the log like this only consumes about 200 bytes of log
space, so there can easily be 5000 inodes to recover per megabyte
of log space you have. And if you have a 2GB log, then that could
contain 10 million inode core changes that need to be recovered....
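
As a back-of-the-envelope check of those numbers, here is a minimal
Python sketch. It assumes the rough figure of about 200 bytes of log
space per logged inode change quoted above; real log items vary in
size, and the function name is purely illustrative, not an XFS
interface.

```python
# Rough upper bound on how many logged inode core changes fit in a log.
# Assumption: ~200 bytes of log space per inode change (approximate
# figure from the text above); actual log item sizes vary.
BYTES_PER_LOGGED_INODE = 200
MIB = 1024 * 1024


def max_inodes_in_log(log_bytes, bytes_per_inode=BYTES_PER_LOGGED_INODE):
    """Return how many inode changes of the given size fit in log_bytes."""
    return log_bytes // bytes_per_inode


per_mib = max_inodes_in_log(MIB)           # roughly 5000 per megabyte
per_2gib = max_inodes_in_log(2048 * MIB)   # roughly 10 million in a 2GB log
print(f"{per_mib} inodes per MiB, {per_2gib} inodes in a 2 GiB log")
```

So "5000 per megabyte" and "10 million in 2GB" are both the right
order of magnitude under that assumption.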

And if you have a slow SATA RAID, the small random writes for inode
writeback might only retire a hundred inodes a second, and log
recovery will block until all those inodes are completely written
back (i.e. the same problem that can lead to unmount taking hours when
there are hundreds of thousands of dirty inodes in memory).
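
To put rough numbers on that, here is a small Python sketch. The rate
of 100 inodes per second is the illustrative figure from the paragraph
above, not a measurement, and the second calculation assumes
(hypothetically) that the whole recovery time in the quoted dmesg
output was spent in inode writeback at that rate.

```python
# Rough recovery-time estimates from an assumed inode writeback rate.
# Assumption: 100 inodes/s, the illustrative slow-RAID figure above.
ASSUMED_INODES_PER_SECOND = 100


def recovery_seconds(inodes, rate=ASSUMED_INODES_PER_SECOND):
    """Time to write back `inodes` inodes at `rate` inodes per second."""
    return inodes / rate


def inodes_recovered(elapsed_seconds, rate=ASSUMED_INODES_PER_SECOND):
    """Inodes that could be written back in `elapsed_seconds` at `rate`."""
    return int(elapsed_seconds * rate)


# Worst case from the text: 10 million inode changes in a 2GB log.
worst_case_hours = recovery_seconds(10_000_000) / 3600

# Timestamps from the quoted dmesg: start and end of recovery on md6.
elapsed = 4517.327332 - 29.447725

print(f"10M inodes: about {worst_case_hours:.1f} hours")
print(f"{elapsed:.0f} s of recovery: about {inodes_recovered(elapsed)} inodes")
```

At that assumed rate a full 2GB log would take more than a day to
recover, and the roughly 75 minutes seen here would correspond to a
few hundred thousand inodes.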

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
