Re: high throughput storage server?

Keld Jørn Simonsen <keld@xxxxxxxxxx> · Tue, 22 Mar 2011 11:14:03 +0100

On Tue, Mar 22, 2011 at 09:46:58AM +0000, Robin Hill wrote:
> On Mon Mar 21, 2011 at 11:13:04 +0100, Keld Jørn Simonsen wrote:
> 
> > On Mon, Mar 21, 2011 at 09:18:57AM -0500, Stan Hoeppner wrote:
> > > 
> > > > Anyway, with 384 spindles and only 50 users, each user will have on
> > > > average 7 spindles for himself. I think much of the time this would mean
> > > > no random IO, as most users are doing large sequential reading. 
> > > > Thus on average you can expect quite close to striping speed if you
> > > > are running RAID capable of striping. 
> > > 
> > > This is not how large scale shared RAID storage works under a
> > > multi-stream workload.  I thought I explained this in sufficient detail.
> > >  Maybe not.
> > 
> > Given that the whole array system is only lightly loaded, this is how I
> > expect it to function. Maybe you can explain why it would not be so, if
> > you think otherwise.
> > 
> If you have more than one system accessing the array simultaneously then
> your sequential IO immediately becomes random (as it'll interleave the
> requests from the multiple systems). The more systems accessing
> simultaneously, the more random the IO becomes. Of course, there will
> still be an opportunity for some readahead, so it's not entirely random
> IO.

Of course the IO will be randomized if there are more users, but the
read IO will tend to be quite sequential if the reading of each process
is sequential. So if a user reads a big file sequentially, and the
system is lightly loaded, the IO schedulers will tend to order all IO
for the process so that it is served in one series of operations,
given that the big file is laid out contiguously on the file system.
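
To make the argument concrete, here is a toy model in Python. It is only
a sketch under stated assumptions: three processes each read a contiguous
file, their requests arrive round-robin, and a hypothetical scheduler
sorts each window of queued requests by block number before dispatch.
The function names and the window size are made up for illustration, and
this is not how any real Linux IO scheduler is implemented.

```python
from itertools import zip_longest

def sequential_stream(start, length):
    """Block numbers for one process reading a contiguous file."""
    return list(range(start, start + length))

def interleave(streams):
    """Round-robin arrival order: one request from each stream in turn."""
    order = []
    for group in zip_longest(*streams):
        order.extend(b for b in group if b is not None)
    return order

def count_seeks(blocks):
    """Count transitions where the next block does not follow the previous one."""
    return sum(1 for prev, cur in zip(blocks, blocks[1:]) if cur != prev + 1)

def batch_dispatch(arrivals, window):
    """Hypothetical scheduler: sort each window of queued requests by block."""
    out = []
    for i in range(0, len(arrivals), window):
        out.extend(sorted(arrivals[i:i + window]))
    return out

# Three sequential readers, 64 blocks each, far apart on the device.
streams = [sequential_stream(s, 64) for s in (0, 10_000, 20_000)]
arrivals = interleave(streams)
dispatched = batch_dispatch(arrivals, 48)

print("seeks in arrival order: ", count_seeks(arrivals))    # 191
print("seeks after batching:   ", count_seeks(dispatched))  # 11
```

In this model the raw arrival order needs a seek on every request, while
sorting within a 48-request window leaves only a handful of seeks. How
close a real system gets to this depends on queue depth, readahead and
how contiguous the files actually are.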

> > it is probably not the concurrency of XFS that makes the parallelism of
> > the IO. It is more likely the IO system, and that would also work for
> > other file system types, like ext4. I do not see anything in the XFS allocation
> > blocks with any knowledge of the underlying disk structure. 
> > What the file system does is only to administer the scheduling of the
> > IO, in combination with the rest of the kernel.

> XFS allows for splitting the single filesystem into multiple allocation
> groups. It can then allocate blocks from each group simultaneously
> without worrying about collisions. If the allocation groups are on
> separate physical spindles then (apart from the initial mapping of a
> request to an allocation group, which should be a very quick operation),
> the entire write process is parallelised.  Most filesystems have only a
> single allocation group, so the block allocation is single threaded and
> can easily become a bottleneck. It's only once the blocks are allocated
> (assuming the filesystem knows about the physical layout) that the
> writes can be parallelised. I've not looked into the details of ext4
> though, so I don't know whether it makes any moves towards parallelising
> block allocation.

Block allocation is only done when writing. The system at hand was
specified as a mostly-read system, where a block allocation
bottleneck is not so dominant.
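
For reference, the allocation group idea quoted above can be sketched
roughly as follows. This is a toy model, not XFS code: the class names,
the writer-to-group mapping and the sizes are all assumptions made for
illustration. Each group has its own lock and its own free-space
pointer, so writers mapped to different groups never contend with each
other.

```python
import threading

class AllocationGroup:
    """One independent region of the device with its own lock."""
    def __init__(self, first_block, size):
        self.lock = threading.Lock()
        self.next_free = first_block
        self.end = first_block + size

    def allocate(self, count):
        # Only writers using this group serialize here.
        with self.lock:
            if self.next_free + count > self.end:
                raise RuntimeError("allocation group full")
            start = self.next_free
            self.next_free += count
            return start

class ToyFilesystem:
    """Hypothetical filesystem split into equally sized allocation groups."""
    def __init__(self, groups, blocks_per_group):
        self.groups = [
            AllocationGroup(i * blocks_per_group, blocks_per_group)
            for i in range(groups)
        ]

    def allocate(self, writer_id, count):
        # Quick mapping of a request to a group (assumed: simple modulo).
        group = self.groups[writer_id % len(self.groups)]
        return group.allocate(count)

fs = ToyFilesystem(groups=4, blocks_per_group=1000)
extents = []
extents_lock = threading.Lock()

def writer(writer_id):
    for _ in range(10):
        start = fs.allocate(writer_id, 8)
        with extents_lock:
            extents.append((start, start + 8))

threads = [threading.Thread(target=writer, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(extents), "extents allocated without overlap")
```

None of this code runs on the read path, which is why it matters little
for a mostly-read workload.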

Best regards
keld