Re: potentially lost largeish raid5 array..

David Brown <david.brown@xxxxxxxxxxxx> · Sun, 25 Sep 2011 17:18:21 +0200

On 25/09/11 16:39, Stan Hoeppner wrote:
On 9/25/2011 8:03 AM, David Brown wrote:
On 24/09/2011 18:38, Stan Hoeppner wrote:
On 9/24/2011 10:16 AM, David Brown wrote:
On 24/09/2011 14:17, Stan Hoeppner wrote:
On 9/23/2011 7:11 PM, Thomas Fjellstrom wrote:
On September 23, 2011, Stan Hoeppner wrote:

When properly configured, XFS will achieve near spindle throughput.
Recent versions of mkfs.xfs read the mdraid configuration and configure
the filesystem automatically for stripe unit and stripe width (sunit,
swidth), number of allocation groups, etc. Thus you should get max
performance out of the gate.

What happens when you add a drive and reshape? Is it enough just to
tweak the mount options?

When you change the number of effective spindles with a reshape, and
thus the stripe width and stripe size, you definitely should add the
appropriate XFS mount options and values to reflect this. Performance
will be less than optimal if you don't.
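
As a rough sketch of what those values might look like: the XFS sunit and swidth mount options are given in 512-byte units, so they can be derived from the md chunk size and the number of data-bearing disks. The helper name and the 512 KiB chunk size below are assumptions for illustration, not details from this thread.

```python
SECTOR = 512  # sunit/swidth mount options are expressed in 512-byte units


def xfs_stripe_mount_opts(chunk_kib, total_disks, parity_disks=1):
    """Return (sunit, swidth) in 512-byte units for a striped md array.

    Hypothetical helper: parity_disks=1 models RAID5, 2 models RAID6.
    """
    data_disks = total_disks - parity_disks
    if data_disks < 1:
        raise ValueError("array needs at least one data disk")
    sunit = chunk_kib * 1024 // SECTOR   # one chunk, in 512-byte units
    swidth = sunit * data_disks          # one full stripe of data
    return sunit, swidth


# A 4-disk RAID5 reshaped to 5 disks, assuming a 512 KiB chunk.
before = xfs_stripe_mount_opts(512, 4)
after = xfs_stripe_mount_opts(512, 5)
print("before reshape: sunit=%d,swidth=%d" % before)
print("after reshape:  sunit=%d,swidth=%d" % after)
```

Only swidth changes here, because a reshape that adds a drive leaves the chunk size alone and adds one data disk to each stripe.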

If you use a linear concat under XFS you never have to worry about the
above situation. It has many other advantages over a striped array and
better performance for many workloads, especially multi-user general
file serving and maildir storage--workloads with lots of concurrent IO.
If you 'need' maximum single stream performance for large files, a
striped array is obviously better. Most applications however don't need
large single stream performance.

If you use a linear concatenation of drives for XFS, is it not correct
that you want one allocation group per drive (or per raid set, if you
are concatenating a bunch of raid sets)?

Yes. Normally with a linear concat you would make X number of RAID1
mirrors via mdraid or hardware RAID, then concat them with mdadm
--level=linear or LVM. Then mkfs.xfs -d agcount=X ...

Currently XFS has a 1TB limit for allocation groups. If you use 2TB
drives you'll get 2 AGs per effective spindle instead of one. With some
'borderline' workloads this may hinder performance. It depends on how
many top level directories you have in the filesystem and your
concurrency to them.
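
The arithmetic behind that can be sketched as follows. This is a minimal illustration that takes the limit as exactly 1 TiB and uses marketing (decimal) drive sizes; the function name is hypothetical.

```python
TIB = 1024 ** 4
MAX_AG_BYTES = 1 * TIB  # the 1TB allocation group limit mentioned above


def min_ags_per_spindle(spindle_bytes, max_ag_bytes=MAX_AG_BYTES):
    """Smallest number of AGs needed to cover one effective spindle."""
    # Integer ceiling division, avoiding floating point rounding.
    return -(-spindle_bytes // max_ag_bytes)


one_tb = 1 * 10 ** 12   # a "1TB" drive as sold
two_tb = 2 * 10 ** 12   # a "2TB" drive as sold

print(min_ags_per_spindle(one_tb))
print(min_ags_per_spindle(two_tb))
```

A RAID1 pair counts as one effective spindle here, so a concat of four 2TB mirrors would need at least eight AGs rather than four.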

If you then add another drive
or raid set, can you grow XFS with another allocation group?

XFS creates more allocation groups automatically as part of the grow
operation. If you have a linear concat setup you'll obviously want to
control this manually to maintain the same number of AGs per effective
spindle.

Always remember that the key to linear concat performance with XFS is
directory level parallelism. If you have lots of top level directories
in your filesystem and high concurrent access (home dirs, maildir, etc)
it will typically work better than a striped array. If you have few
directories and low concurrency, are streaming large files, etc, stick
with a striped array.

I understand the point about linear concat and allocation groups being a
good solution when you have multiple parallel accesses to different
files, rather than streamed access to a few large files.

Not just different files, but files in different top level directories.

But you seem to be suggesting here that accesses to different files
within the same top-level directory will be put in the same allocation
group - is that correct?

When you create a top level directory on an XFS filesystem it is
physically created in one of the on disk allocation groups. When you
create another directory it is physically created in the next allocation
group, and so on, until it wraps back to the first AG. This is why XFS
can derive parallelism from a linear concat and no other filesystem can.
Performance is rarely perfectly symmetrical, as the workload dictates
the file, and thus physical IO, access patterns.
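
The rotation described above can be pictured with a toy model. This is only a sketch of the behaviour as stated in this message, not the real XFS allocator; the function name and directory names are made up, and AGs are numbered from 0.

```python
def place_top_level_dirs(dir_names, ag_count):
    """Assign each new top-level directory to the next AG, wrapping around."""
    return {name: i % ag_count for i, name in enumerate(dir_names)}


# Five top-level directories created in order on a filesystem with 4 AGs.
placement = place_top_level_dirs(["home", "mail", "srv", "www", "tmp"], 4)
for name, ag in placement.items():
    print("/mnt/%s -> AG %d" % (name, ag))
```

With one AG per effective spindle in the concat, each of the first four directories lands on a different spindle, and the fifth wraps back to share the first.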

But, with maildir and similar workloads, the odds are very high that
you'll achieve good directory level parallelism because each mailbox is
in a different directory. I've previously discussed the many other
reasons why XFS on a linear concat beats the stuffing out of anything on
a striped array for a maildir workload so I won't repeat all that here.

That strikes me as very limiting - it is far
from uncommon for most accesses to be under one or two top-level
directories.

By design or ignorance? What application workload? What are the IOPS and
bandwidth needs of this workload you describe? Again, read the paragraph
below, which you apparently skipped the first time.

Perhaps I am not expressing myself very clearly.  I don't mean to sound 
patronising by spelling it out like this - I just want to be sure I'm 
getting an answer to the question in my mind (assuming, of course, you 
have time and inclination to help me - you've certainly been very 
helpful and informative so far!).

Suppose you have an xfs filesystem with 10 allocation groups, mounted on 
/mnt.  You make a directory /mnt/a.  That gets created in allocation 
group 1.  You make a second directory /mnt/b.  That gets created in 
allocation group 2.  Any files you put in /mnt/a go in allocation group 
1, and any files in /mnt/b go in allocation group 2.  Am I right so far?

Then you create directories /mnt/a/a1 and /mnt/a/a2.  Do these also go 
in allocation group 1, or do they go in groups 3 and 4?  Similarly, do 
files inside them go in group 1 or in groups 3 and 4?

To take an example that is quite relevant to me, consider a mail server 
handling two domains.  You have (for example) /var/mail/domain1 and 
/var/mail/domain2, with each user having a directory within either 
domain1 or domain2.  What I would like to know, is if the xfs filesystem 
is mounted on /var/mail, then are the user directories spread across the 
allocation groups, or are all of domain1 users in one group and all of 
domain2 users in another group?  If it is the former, then xfs on a 
linear concat would scale beautifully - if it is the latter, then it 
would be pretty terrible scaling.
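
To make the two possibilities concrete, here is a toy comparison. It only models the two hypotheses in the question and does not claim either is what XFS actually does; the function name, the user count, and the creation order are assumptions.

```python
def ags_used(domains, users_per_domain, ag_count, rotate_subdirs):
    """Count distinct AGs holding user directories under a toy model.

    rotate_subdirs=True:  every new directory, at any depth, goes to the
                          next AG in rotation.
    rotate_subdirs=False: only top-level directories rotate; everything
                          below stays in its top-level directory's AG.
    """
    next_ag = 0
    used = set()
    for _ in range(domains):
        domain_ag = next_ag % ag_count   # AG of the domain directory itself
        next_ag += 1
        for _ in range(users_per_domain):
            if rotate_subdirs:
                user_ag = next_ag % ag_count
                next_ag += 1
            else:
                user_ag = domain_ag
            used.add(user_ag)
    return len(used)


# Two mail domains, 100 users each, 10 allocation groups.
spread = ags_used(2, 100, 10, rotate_subdirs=True)
pinned = ags_used(2, 100, 10, rotate_subdirs=False)
print("subdirectories rotate:", spread, "AGs in use")
print("subdirectories pinned:", pinned, "AGs in use")
```

Under the first hypothesis all ten groups carry mail traffic; under the second only two do, however many drives sit in the concat.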

Also note that a linear concat will only give increased performance with
XFS, again for appropriate workloads. Using a linear concat with EXT3/4
will give you the performance of a single spindle regardless of the
total number of disks used. So one should stick with striped arrays for
EXT3/4.

I understand this, which is why I didn't comment earlier.  I am aware 
that only XFS can utilise the parts of a linear concat to improve 
performance - my questions were about the circumstances in which XFS can 
utilise the multiple allocation groups.
