Keld Jørn Simonsen put forth on 3/22/2011 5:14 AM:
> Of course the IO will be randomized, if there is more users, but the
> read IO will tend to be quite sequential, if the reading of each process
> is sequential. So if a user reads a big file sequentially, and the
> system is lightly loaded, IO schedulers will tend to order all IO
> for the process so that it is served in one series of operations,
> given that the big file is laid out consequently on the file system.

With the way I've architected this hypothetical system, the read load on
each allocation group (each 12-spindle array) should be relatively low:
about 3 streams on 14 of the AGs and 4 streams on the remaining two,
_assuming_ the files being read are spread evenly across at least 16
directories.

As you all read in the docs I linked, XFS AG parallelism operates at the
directory and file level. For example, if we create 32 directories on a
virgin XFS filesystem with 16 allocation groups, the following layout
results:

AG1:  /general requirements    AG1:  /alabama
AG2:  /site construction       AG2:  /alaska
AG3:  /concrete                AG3:  /arizona
..                             ..
AG14: /conveying systems       AG14: /indiana
AG15: /mechanical              AG15: /iowa
AG16: /electrical              AG16: /kansas

AIUI, the first 16 directories are created in consecutive AGs until we hit
the last AG; the 17th directory is then created in the first AG and the
cycle starts over. This is how XFS allocation group parallelism works.
(A rough sketch of this round-robin behavior, and of the per-AG stream
counts above, is appended at the end of this message.) It doesn't provide
linear IO scaling for all workloads, and it's not magic, but it works
especially well for multiuser fileservers, and typically better than
multiple nested stripe levels or extremely wide arrays.

Imagine you have a 5000 seat company and you mount this XFS filesystem on
/home. Each user home directory created falls in a consecutive AG,
resulting in about 312 user dirs per AG. In this type of environment XFS
AG parallelism works marvelously, as you achieve fairly balanced IO across
all AGs and thus all 16 arrays.

In the case where many clients read files from only one directory, and
hence the same AG, IO parallelism is limited to the 12 spindles of that
one array. When that happens we end up with a highly random workload at
the disk heads, resulting in high seek rates and low throughput. This is
one of the reasons I built some "excess" capacity into the disk subsystem.
Using XFS AGs for parallelism doesn't guarantee even distribution of IO
across all 192 spindles of the 16 arrays. It gives good parallelism when
clients access different files in different directories concurrently, but
not when they all hit the same directory.

> The block allocation is only done when writing. The system at hand was
> specified as a mostly reading system, where such a bottleneck of block
> allocating is not so dominant.

This system would excel at massively parallel writes as well, again
provided we have many writers into multiple directories concurrently,
which spreads the write load across all AGs, and thus all arrays. XFS is
legendary for parallel large-file write throughput, thanks to delayed
allocation and some other tricks.

-- 
Stan
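
To make the round-robin placement above concrete, here's a minimal Python
sketch of mine, not something from the XFS docs: it models directory-to-AG
assignment as a simple modulo in creation order. The ag_for_directory()
helper and the loop counts are purely illustrative; the real XFS inode
allocator is more involved and depends on things like the inode64 mount
option.

# Hypothetical model only: XFS's actual directory inode placement is more
# complex, but the round-robin behavior described above reduces to
# "directory N lands in AG (N mod agcount)".

AG_COUNT = 16   # the 16 allocation groups / 16 arrays in this design

def ag_for_directory(dir_index, ag_count=AG_COUNT):
    """AG a newly created top-level directory would land in, assuming
    plain round-robin assignment in creation order (0-based index)."""
    return dir_index % ag_count

# The 32-directory example: dirs 1-16 fill AG1-AG16, dirs 17-32 wrap around.
for n in range(32):
    print(f"dir {n + 1:2d} -> AG{ag_for_directory(n) + 1}")

# The 5000-seat /home example: home dirs spread ~312-313 per AG.
homes_per_ag = [0] * AG_COUNT
for user in range(5000):
    homes_per_ag[ag_for_directory(user)] += 1
print(homes_per_ag)   # eight AGs end up with 313 dirs, eight with 312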
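
And the per-AG read-stream arithmetic from the top of my reply, spelled
out the same way. The 50-stream total is my own inference from the "3
streams on 14 AGs, 4 on the remaining two" figures (3*14 + 4*2 = 50); the
exact count doesn't matter, the point is how evenly spread readers divide
across 16 AGs.

AG_COUNT = 16
STREAMS = 3 * 14 + 4 * 2   # = 50, inferred from the per-AG figures above

# Evenly spreading STREAMS readers over AG_COUNT allocation groups:
base, extra = divmod(STREAMS, AG_COUNT)
per_ag = [base + 1 if ag < extra else base for ag in range(AG_COUNT)]

print(per_ag)   # [4, 4, 3, 3, ..., 3]
print(f"{per_ag.count(base)} AGs with {base} streams, "
      f"{per_ag.count(base + 1)} AGs with {base + 1}")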