Re: make filesystem failed while the capacity of raid5 is bigger than 16TB

On 18/09/2012 23:38, Stan Hoeppner wrote:
On 9/18/2012 5:22 AM, David Brown wrote:
On 18/09/2012 11:35, GuoZhong Han wrote:
Hi Stan:
          Thanks for your advice. In your last mail, you mentioned the XFS
file system. According to your suggestion, I changed the file system
from raid5 (4*2T, chunksize: 128K, stripe_cache_size: 2048) to XFS. Then
I did a write performance test on XFS.
The test was as follows:
          My program used 4 threads to write to 30 files in parallel,
at a writing speed of 1 MB/s per file. Each thread was bound to a
single core. The expected total speed should be stable at 30 MB/s. I
recorded the total writing speed every second during the test. Compared
with ext4, XFS performance did indeed hold up better as the array
approached full. Creating the XFS file system also took much less time
than creating ext4. However, I found that the total speed wasn't
steady. Although it reached 30 MB/s most of the time, in rare cases it
fell to only about 10 MB/s. Writing to 30 files in parallel was
supposed to be easy. Why did this happen?
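
(For reference, a rough shell approximation of the described test. The
paths and file names are illustrative, and it uses 30 background writers
rather than 4 threads, so it only mimics the aggregate I/O pattern, not
the actual test program:)

    # Append ~1 MB/s to each of 30 files; bind writer i to core i%4.
    # (Stop the writers afterwards with: kill $(jobs -p))
    mkdir -p /mnt/test
    for i in $(seq 0 29); do
        taskset -c $((i % 4)) sh -c "
            while true; do
                dd if=/dev/zero of=/mnt/test/file$i bs=1M count=1 oflag=append conv=notrunc 2>/dev/null
                sleep 1
            done" &
    done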



Two questions - what is the XFS built on? 4 x 2TB in a linear
concatenation, or something else?

According to the above it's a 4 drive RAID5.

He wrote "I changed the file system from raid5 (4 x 2T) to XFS", so I am looking for clarification here.


Secondly, are all your files in the same directory, or in different
directories?  XFS scales by using multiple threads for different
allocation groups,

This is partially correct if he's using the inode64 allocator.  Do note
multiple XFS write threads can target the same AG and get parallel
performance.
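
(For completeness: inode64 is a mount option, not a mkfs option, and was
not yet the default on most kernels of the time. Device and mount point
below are illustrative:)

    mount -o inode64 /dev/md0 /mnt/data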

I didn't know that - there is always something new to learn!

However, I don't think that should make a huge difference - after all, the work done by these threads is going to be fairly small until you actually get to writing out the data to the AG. Latency for the application might be reduced a little, but disk throughput will not benefit much.

What you are referring to above is writing to multiple AGs
in parallel, where each AG resides on a different member device of a
concatenation.

Yes, although I know that each AG does not necessarily reside on a different member device.

As far as I see it now, there are three stages (a setup sketch follows the list) -

1. write threads (can be several per AG)
2. AGs (can be several per disk)
3. Disks (members of a linear concat)
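
A sketch of how stage 3 might be set up, assuming four equal-sized
drives with illustrative device names:

    # Linear concatenation of 4 drives, then XFS with agcount=4,
    # which with equal-sized members puts roughly one AG per drive
    mdadm --create /dev/md0 --level=linear --raid-devices=4 /dev/sd[bcde]
    mkfs.xfs -d agcount=4 /dev/md0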


Writing to, say, 16 AGs in parallel where all reside on the same disk
array will actually decrease performance compared to 16 writes to one AG
on that array.  The reason is that the latter causes far less head travel
between writes.


Yes.

and putting these groups in different places on the
underlying disk or disks - but files in the same directory go in the
same allocation group.  So 30 files in 30 directories will give much
more parallelism than 30 files in 1 directory.

Actually, no.  The level of parallelism is the same--30 concurrent
writes.  As noted above, the increase in performance comes from locating
each of the AGs on a different disk, or array.  This decreases the
number of seeks required per write, especially with parity arrays.
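
(One way to verify where a given file's extents actually landed:
xfs_bmap -v prints the AG for each extent. The path is illustrative:)

    xfs_bmap -v /mnt/data/dir0/file0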


OK, so you get 30 parallel logical writes, but unless that translates into multiple parallel physical writes across multiple member disks, the gains are small.


