On Fri, Aug 19, 2011 at 12:37:05PM -0400, Joe Landman wrote:
> (If you prefer we file this on a bug reporting system, please let me
> know where and I'll do this).
>
> Scenario: xfs_repair being run against a roughly 17TB volume,
> containing one large sparse file.  Logical size of 7 PB, actual size
> a few hundred GB.
>
> Metadata: Kernel = 2.6.32.41, 2.6.39.4, and others.  Xfstools 3.1.5.
> Hardware RAID ~17TB LUN.  Base OS: CentOS 5.6 + updates + updated
> xfs tools + our kernels.  Using an external journal on a different
> device.
>
> What we observe:
>
> Running xfs_repair
>
> 	xfs_repair -l /dev/md2 -vv /dev/sdd2

Can you post the actual output of xfs_repair?

> the system gets to stage 3 and the first ag.  Then it appears to
> stop.  After an hour or so, we strace it, and we see
>
> 	pread(...) = 4096

And the same for the strace, along with syscall completion times?
(i.e. strace -ttt -T .....)  That will tell us if the time is spent
doing IO or in the repair binary.  What is the CPU usage when this
happens?

How much memory do you have? Is the machine swapping while it is
slowing down? A couple of minutes of 'vmstat 5' output when it is
in this state would be handy.

> occurring about 2-3 per second.  An hour later, it's down to 1 per
> second.  An hour after that, it's once every 2 seconds.
>
> Also, somewhere on this disk, someone has created an unfortunately
> large file
>
> [root@jr4-2 ~]# ls -alF /data/brick-sdd2/dht/scratch/xyzpdq
> total 4652823496
> d---------   2 1232 1000               86 Jun 27 20:31 ./
> drwx------ 104 1232 1000            65536 Aug 17 23:53 ../
> -rw-------   1 1232 1000               21 Jun 27 09:57 Default.Route
> -rw-------   1 1232 1000              250 Jun 27 09:57 Gau-00000.inp
> -rw-------   1 1232 1000                0 Jun 27 09:57 Gau-00000.d2e
> -rw-------   1 1232 1000 7800416534233088 Jun 27 20:18 Gau-00000.rwf
>
> [root@jr4-2 ~]# ls -ahlF /data/brick-sdd2/dht/scratch/xyzpdq
> total 4.4T
> d---------   2 1232 1000   86 Jun 27 20:31 ./
> drwx------ 104 1232 1000  64K Aug 17 23:53 ../
> -rw-------   1 1232 1000   21 Jun 27 09:57 Default.Route
> -rw-------   1 1232 1000  250 Jun 27 09:57 Gau-00000.inp
> -rw-------   1 1232 1000    0 Jun 27 09:57 Gau-00000.d2e
> -rw-------   1 1232 1000 7.0P Jun 27 20:18 Gau-00000.rwf
>
> This isn't a 7PB file system, it's a 100TB file system across 3
> machines, roughly 17TB per brick or OSS.  The Gau-00000.rwf is
> obviously a sparse file, as could be seen with an ls -alsF

What does du tell you about it? xfs_io -f -c "stat" <large file>?
xfs_bmap -vp <large file>?

> Upon removing that file, the xfs_repair completes within ~10
> minutes.  Leaving that file on there, the xfs_repair does not
> terminate, it just gets asymptotically slower.

That could simply be the memory footprint causing more swapping per
operation to occur. Or it could be that something is simply getting
too large for the index type being used.

If the machine is not swapping, can you point 'perf top -p <pid of
xfs_repair>' at it so we might see where that CPU time is being
spent? (You might need to use a non-stripped version of the binary
to get any useful information.)

> Please let me know if you need more information, or if you would
> like me to file this somewhere else for official reportage.

This is the right place to let us know about problems.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
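
For reference, the information requested above could be collected along
these lines while xfs_repair is stuck. This is a minimal sketch; the
pidof-based attach, the log-file name, and the vmstat sample count are
illustrative choices rather than anything specified in the thread, so
substitute the real PID and paths as needed:

	vmstat 5 24 > vmstat.log                 # ~2 minutes of memory/swap activity
	strace -ttt -T -p $(pidof xfs_repair)    # syscall timestamps plus time spent in each call
	perf top -p $(pidof xfs_repair)          # where CPU time goes, if the machine is not swapping

	du -sh /data/brick-sdd2/dht/scratch/xyzpdq/Gau-00000.rwf
	xfs_io -f -c "stat" /data/brick-sdd2/dht/scratch/xyzpdq/Gau-00000.rwf
	xfs_bmap -vp /data/brick-sdd2/dht/scratch/xyzpdq/Gau-00000.rwf

The du/xfs_io/xfs_bmap output shows how much of the 7 PB logical size is
actually allocated and how many extents back the sparse file.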