Re: xfsrestore performance

Dear Dave:

Thanks very much for your explanations.

2016-05-30 1:20 GMT+02:00 Dave Chinner <david@xxxxxxxxxxxxx>:
....
> Oh, dear. There's a massive red flag. I'll come back to it...

 
> > 5: xfsdump the temporary xfs fs to /dev/null. took 20 hours

> Nothing to slow down xfsdump reading from disk. Benchmarks lie.
>
> dump is fast - restore is the slow point because it has to recreate
> everything. That's what limits the speed of dump - the pipe has a
> bound limit on data in flight, so dump is throttled to restore
> speed when you run this.
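
For anyone following along, the two cases being compared look roughly
like this (mountpoints are placeholders and the options are only a
sketch of the idea, not the exact commands I ran):

  # Benchmark only: level-0 dump of the whole fs, stream thrown away.
  # Nothing on the receiving end can slow this down.
  xfsdump -l 0 - /mnt/temp > /dev/null

  # Real copy: dump piped into restore. Now dump can only proceed as
  # fast as restore can recreate inodes, directories and hard links.
  xfsdump -J -l 0 - /mnt/temp | xfsrestore -J - /mnt/new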

> And, as I said I'll come back to, restore is slow because:
>
> The filesystem is not exactly as you described.  Did you notice that
> xfs_restore realises that it has to restore 20 million directories
> and *274 million* directory entries? i.e. for those 7 million inodes
> containing data, there are roughly 40 hard links pointing to each
> inode. There are also 3 directory inodes for every regular file.
> This is not a "data mostly" filesystem - it has vastly more metadata
> than it has data, even though the data takes up more space.
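
For anyone who wants to sanity-check numbers like these on their own
tree, a rough sketch with GNU find would be (/mnt/backup is a
placeholder, and a walk over this many entries takes hours):

  # directories, file directory entries, and distinct file inodes
  find /mnt/backup -xdev -type d | wc -l
  find /mnt/backup -xdev -type f | wc -l
  find /mnt/backup -xdev -type f -printf '%i\n' | sort -u | wc -l
  # (file entries) / (distinct inodes) ~= average hard links per file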

Our backup server holds 46 versions of our home directories and 158
versions of our mail server, so if a file has not been changed for more
than a year it exists once on the backup server together with 45 or
157 hard links pointing to it (home directories and mail server,
respectively).

I'm astonished myself, both by the numbers and by the fact that our
backup strategy works as well as it does.

rsync also did a very good job: it copied all these hard links
in 6 days from a 16TB ext3 filesystem on a RAID10 volume to a
15TB xfs filesystem on a RAID5 volume.
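
Something of roughly this shape is what I mean (paths are placeholders;
-H is the switch that makes rsync preserve the hard links, at the cost
of keeping a large inode table in memory):

  # -a archive mode, -x stay on the one source filesystem,
  # -H preserve hard links, --numeric-ids keep uid/gid numbers as-is
  rsync -aHx --numeric-ids /mnt/ext3/ /mnt/xfs/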

Right now 4 rsync processes are copying the 15TB xfs filesystem
back to a 20TB xfs filesystem, and it looks as if this will finish
today (after only 3 days). Very nice.
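
By "4 rsync processes" I mean a split along the top-level directories,
roughly like the sketch below (directory names are made up). One thing
to keep in mind: -H only preserves hard links among files handled by
the same rsync run, so each group of linked versions has to stay
within one process.

  # one rsync per group of version directories, run in parallel
  rsync -aH /mnt/src/home-versions-a* /mnt/dst/ &
  rsync -aH /mnt/src/home-versions-b* /mnt/dst/ &
  rsync -aH /mnt/src/mail-versions-a* /mnt/dst/ &
  rsync -aH /mnt/src/mail-versions-b* /mnt/dst/ &
  wait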

> Keep in mind that it took dump the best part of 7 hours just to read
> all the inodes and the directory structure to build the dump
> inventory. This matches with the final ext3 rsync pass of 10 hours
> which should have copied very little data.  Creating 270 million
> hard links in 20 million directories from scratch takes a long time,
> and xfs_restore will be no faster at that than rsync....

That was my misunderstanding. I believed/hoped that a tool built
for a specific filesystem would outperform a generic tool like
rsync. I thought xfsdump would write all used filesystem blocks
into a data stream and xfsrestore would just read those blocks
from stdin and write them back to the destination filesystem,
much like a dd process that knows about the device contents and
can skip unused blocks.
 
> > Seems like 2 days was a little optimistic

> Just a little. :/

It would have taken approximately 1000 hours.
 
> Personally, I would have copied the data using rsync to the
> temporary XFS filesystem of the same size and shape of the final
> destination (via mkfs parameters to ensure stripe unit/width match
> final destination) and then used xfs_copy to do a block level copy
> of the temporary filesystem back to the final destination. xfs_copy
> will run *much* faster than xfsdump/restore....
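
If I understand the suggestion correctly, it would look roughly like
this (device names are placeholders, and the su/sw values are only an
example for a stripe of 10 data disks with a 512K chunk):

  # 1) temporary fs with the same geometry as the final destination
  mkfs.xfs -d su=512k,sw=10 /dev/md_temp
  mount /dev/md_temp /mnt/temp

  # 2) file-level copy, preserving hard links
  rsync -aHx /mnt/old/ /mnt/temp/

  # 3) block-level copy of the unmounted temporary fs to the final
  #    device; xfs_copy skips unallocated space, unlike a plain dd
  umount /mnt/temp
  xfs_copy /dev/md_temp /dev/md_final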

Next time I will do it as you suggest, with one minor change: instead
of xfs_copy I would use dd, which makes sense if the filesystem is
almost full. Or do you believe that xfs_copy is faster than dd?
Or will xfs_growfs create any problems?

I used dd on Saturday to copy the 15TB xfs filesystem back
onto the 20TB RAID10 volume and enlarged the filesystem with
xfs_growfs. The result was an xfs filesystem whose layout parameters
matched the temporary RAID5 volume built from 16 1TB disks with a
256K chunk size, whereas the new RAID10 volume consists of 20 2TB
disks using a chunk size of 512K. Growing the filesystem also raised
the allocation group count from 32 to 45.
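
For the record, the sequence was essentially of this shape (device
names and the dd block size are placeholders):

  # raw block copy of the smaller RAID5 device onto the larger RAID10
  dd if=/dev/md_raid5 of=/dev/md_raid10 bs=16M

  # grow the data section into the extra space, then check geometry
  mount /dev/md_raid10 /mnt/backup
  xfs_growfs -d /mnt/backup
  xfs_info /mnt/backup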

So I reformatted the 20TB volume with a fresh xfs filesystem and
let mkfs.xfs decide on the layout.

Does that give me an optimal layout? I will enlarge the filesystem
in the future, which will increase the allocation group count again.
Is that a problem I should have avoided in advance by reducing
the agcount?
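
If specifying the layout by hand turns out to be better, I assume it
would look something like this (the su/sw values assume the 20-disk
RAID10 presents 10 data-bearing stripe units of 512K; the agcount is
only an illustration, not a recommendation):

  # explicit geometry instead of autodetection
  mkfs.xfs -d su=512k,sw=10,agcount=32 /dev/md_raid10

  # mount and verify what mkfs actually chose
  mount /dev/md_raid10 /mnt/backup
  xfs_info /mnt/backup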

Kind regards, and thanks very much for the useful info.

Peter Koch

--
Peter Koch
Passauer Strasse 32, 47249 Duisburg
Tel.: 0172 2470263
_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
