Re: high throughput storage server?

On Mon, Mar 21, 2011 at 11:13:04 +0100, Keld Jørn Simonsen wrote:

> On Mon, Mar 21, 2011 at 09:18:57AM -0500, Stan Hoeppner wrote:
> > 
> > > Anyway, with 384 spindles and only 50 users, each user will have on
> > > average 7 spindles to himself. I think much of the time this would mean 
> > > no random IO, as most users are doing large sequential reading. 
> > > Thus on average you can expect quite close to striping speed if you
> > > are running RAID capable of striping. 
> > 
> > This is not how large scale shared RAID storage works under a
> > multi-stream workload.  I thought I explained this in sufficient detail.
> >  Maybe not.
> 
> Given that the whole array system is only lightly loaded, this is how I
> expect it to function. Maybe you can explain why it would not be so, if
> you think otherwise.
> 
If you have more than one system accessing the array simultaneously then
your sequential IO immediately becomes random (the array ends up servicing
an interleaving of the requests from the multiple systems). The more
systems accessing it simultaneously, the more random the IO becomes. Of
course, there will still be an opportunity for some readahead, so it's not
entirely random IO.
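
As a rough illustration (not anything from the actual kernel or array
firmware, just a toy model with made-up request sizes and a round-robin
scheduler), here's how several per-client sequential streams look once
they're merged at the array:

    # Toy model: each client reads its own file sequentially, but the array
    # sees the merged request stream.  Offsets, request sizes and the
    # round-robin servicing order are all invented for illustration.

    CLIENTS = 4            # hypothetical number of simultaneous readers
    REQUEST_BLOCKS = 8     # blocks per read request
    REQUESTS_EACH = 5      # requests issued per client

    def client_stream(client_id):
        """Sequential block addresses for one client's file."""
        base = client_id * 1_000_000          # pretend the files live far apart
        for i in range(REQUESTS_EACH):
            yield base + i * REQUEST_BLOCKS   # strictly increasing per client

    # Round-robin merge: the order the array actually services requests in.
    streams = [client_stream(c) for c in range(CLIENTS)]
    merged = []
    for _ in range(REQUESTS_EACH):
        for s in streams:
            merged.append(next(s))

    print(merged)
    # Each client's offsets are sequential on their own, but the merged list
    # jumps back and forth across the disks -- effectively random seeks,
    # moderated only by whatever readahead can still be done per stream.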

> it is probably not the concurrency of XFS that makes the parallelism of
> the IO. It is more likely the IO system, and that would also work for
> other file system types, like ext4. I do not see anything in the XFS allocation
> blocks with any knowledge of the underlying disk structure. 
> What the file system does is only to administer the scheduling of the
> IO, in combination with the rest of the kernel.
> 
XFS allows a single filesystem to be split into multiple allocation
groups. It can then allocate blocks from each group simultaneously
without worrying about collisions. If the allocation groups are on
separate physical spindles then, apart from the initial mapping of a
request to an allocation group (which should be a very quick operation),
the entire write process is parallelised.  Most filesystems have only a
single allocation group, so block allocation is single-threaded and can
easily become a bottleneck. It's only once the blocks are allocated
(assuming the filesystem knows about the physical layout) that the
writes can be parallelised. I've not looked into the details of ext4
though, so I don't know whether it makes any moves towards parallelising
block allocation.
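
To make the allocation-group point concrete, here's a very simplified
sketch (my own toy model, not XFS's real allocator -- the AG count, the
lock-per-AG idea, the modulo placement and the free-space cursors are all
assumptions for illustration):

    import threading

    # Toy allocator: one free-space cursor and one lock per allocation group.
    # Writers that land in different AGs never contend on the same lock, so
    # block allocation can proceed in parallel.  With AG_COUNT = 1 (a single
    # allocation group) every writer serialises on the same lock.

    AG_COUNT = 4                    # hypothetical number of allocation groups
    AG_SIZE = 1_000_000             # blocks per AG (made-up figure)

    ag_locks = [threading.Lock() for _ in range(AG_COUNT)]
    ag_next_free = [0] * AG_COUNT   # next free block within each AG

    def pick_ag(inode):
        """Map a file to an AG (XFS uses smarter placement; modulo is a stand-in)."""
        return inode % AG_COUNT

    def allocate(inode, nblocks):
        """Allocate nblocks contiguously from the file's AG, return (start, end)."""
        ag = pick_ag(inode)
        with ag_locks[ag]:                       # only this AG is locked
            start = ag * AG_SIZE + ag_next_free[ag]
            ag_next_free[ag] += nblocks
        return start, start + nblocks

    # Two writers working on files in different AGs allocate concurrently,
    # each getting contiguous extents within its own group.
    def writer(inode):
        for _ in range(3):
            print(f"inode {inode}: extent {allocate(inode, 16)}")

    threads = [threading.Thread(target=writer, args=(i,)) for i in (1, 2)]
    for t in threads: t.start()
    for t in threads: t.join()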

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |
