So it's something like "partitioning"? I don't know XFS very well, but... if 99% of the IO hits AG16 and only 1% hits AG1-15, you should use RAID0 with striping (for a better read/write rate); linear wouldn't help the way striping does. Am I right? (See the sketch at the end of this mail for what I mean.)

A question: this example was about directories, but how are files (metadata) stored? How is file content stored? And journaling? I picture a filesystem as something like: read/write journaling (metadata/files), read/write metadata, read/write file content, check/repair filesystem, plus features (backup, snapshots, garbage collection, raid1, growing/shrinking the filesystem, others).

Read and write speed will be a function of how you design the filesystem to use the device layer (it's something like virtual memory utilization: one big memory, many programs using small parts of it, and occasionally one needing a big part).

2011/3/23 Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>:
> Keld Jørn Simonsen put forth on 3/22/2011 5:14 AM:
>
>> Of course the IO will be randomized, if there are more users, but the
>> read IO will tend to be quite sequential, if the reading of each
>> process is sequential. So if a user reads a big file sequentially, and
>> the system is lightly loaded, IO schedulers will tend to order all IO
>> for the process so that it is served in one series of operations,
>> given that the big file is laid out contiguously on the file system.
>
> With the way I've architected this hypothetical system, the read load
> on each allocation group (each 12-spindle array) should be relatively
> low, about 3 streams on 14 AGs, 4 streams on the remaining two AGs,
> _assuming_ the files being read are spread out evenly across at least
> 16 directories. As you all read in the docs for which I provided
> links, XFS AG parallelism functions at the directory and file level.
> For example, if we create 32 directories on a virgin XFS filesystem of
> 16 allocation groups, the following layout would result:
>
> AG1:  /general requirements    AG1:  /alabama
> AG2:  /site construction       AG2:  /alaska
> AG3:  /concrete                AG3:  /arizona
> ..
> ..
> AG14: /conveying systems       AG14: /indiana
> AG15: /mechanical              AG15: /iowa
> AG16: /electrical              AG16: /kansas
>
> AIUI, the first 16 directories get created in consecutive AGs until we
> hit the last AG. The 17th directory is then created in the first AG
> and we start the cycle over. This is how XFS allocation group
> parallelism works. It doesn't provide linear IO scaling for all
> workloads, and it's not magic, but it works especially well for
> multiuser fileservers, and typically better than multiple nested
> stripe levels or extremely wide arrays.
>
> Imagine you have a 5000-seat company. You'd mount this XFS filesystem
> on /home. Each user home directory created would fall in a consecutive
> AG, resulting in about 312 user dirs per AG. In this type of
> environment XFS AG parallelism will work marvelously, as you'll
> achieve fairly balanced IO across all AGs and thus all 16 arrays.
>
> In the case where you have many clients reading files from only one
> directory, hence the same AG, IO parallelism is limited to the 12
> spindles of that one array. When this happens, we end up with a highly
> random workload at the disk head, resulting in high seek rates and low
> throughput. This is one of the reasons I built some "excess" capacity
> into the disk subsystem. Using XFS AGs for parallelism doesn't
> guarantee even distribution of IO across all the 192 spindles of the
> 16 arrays.
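
Let me check my understanding of the round-robin placement with a toy model (pure 1-based round-robin; I assume the real XFS allocator is more complex than this):

from collections import Counter

AG_COUNT = 16  # allocation groups in the hypothetical filesystem

def ag_for_dir(n):
    # n is the 1-based creation order of a top-level directory;
    # toy rule: new directories rotate over the AGs round-robin
    return (n - 1) % AG_COUNT + 1

print(ag_for_dir(1))    # 1  -> first directory lands in AG1
print(ag_for_dir(16))   # 16 -> sixteenth lands in AG16
print(ag_for_dir(17))   # 1  -> seventeenth wraps back to AG1

# 5000 home directories spread this way:
per_ag = Counter(ag_for_dir(n) for n in range(1, 5001))
print(per_ag[1], per_ag[16])  # 313 312

5000/16 = 312.5, so half the AGs end up with 313 home dirs and half with 312, close to the ~312 figure above. Is that the idea?
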
> It gives good parallelism if clients are accessing different files in
> different directories concurrently, but not in the opposite case.
>
>> The block allocation is only done when writing. The system at hand
>> was specified as a mostly reading system, where such a bottleneck of
>> block allocation is not so dominant.
>
> This system would excel at massive parallel writes as well, again, as
> long as we have many writers into multiple directories concurrently,
> which spreads the write load across all AGs, and thus all arrays.
>
> XFS is legendary for multiple large file parallel write throughput,
> thanks to delayed allocation and some other tricks.
>
> --
> Stan

--
Roberto Spadim
Spadim Technology / SPAEmpresarial
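
PS: about my stripe vs. linear question above, this toy model is what I had in mind (a minimal sketch; the member count, member size and chunk numbers are made up, and real md chunk geometry is more involved):

DISKS = 4                # hypothetical raid0/linear member count
CHUNKS_PER_DISK = 1000   # hypothetical member size in chunks

def disk_striped(chunk):
    # raid0: consecutive chunks rotate across all members
    return chunk % DISKS

def disk_linear(chunk):
    # linear concat: fill member 0 end to end, then member 1, ...
    return chunk // CHUNKS_PER_DISK

# a "hot" region of 1000 chunks, like my 99%-of-IO-in-one-AG case:
hot = range(3000, 4000)
print(sorted({disk_striped(c) for c in hot}))  # [0, 1, 2, 3] -> all spindles busy
print(sorted({disk_linear(c) for c in hot}))   # [3] -> one member takes everything

If the hot AG occupies one contiguous region of the logical device, striping spreads its IO over every spindle while linear pins it to a single member, which is why I think striping is the right choice for that skewed case.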