So it's something like "partitioning"? I don't know XFS very well, but... if 99% of the IO hits AG16 and only 1% hits AG1-15, you should use RAID0 with striping (for a better read/write rate); linear wouldn't help the way striping does. Am I right? (See the sketch at the end of this mail for what I mean.)

A question: this example was about directories, but how are files (metadata) stored? How is file content stored? And journaling? I picture a filesystem as something like: read/write journaling (metadata/files), read/write metadata, read/write file content, check/repair filesystem, plus features (backup, snapshots, garbage collection, raid1, growing/shrinking the filesystem, others).

Read and write speed will be a function of how you design the filesystem to use the device layer (it's something like virtual memory utilization: one big memory, many programs using small parts of it, and occasionally one needing a big part).

2011/3/23 Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>:
> Keld Jørn Simonsen put forth on 3/22/2011 5:14 AM:
>
>> Of course the IO will be randomized, if there are more users, but the
>> read IO will tend to be quite sequential, if the reading of each
>> process is sequential. So if a user reads a big file sequentially, and
>> the system is lightly loaded, IO schedulers will tend to order all IO
>> for the process so that it is served in one series of operations,
>> given that the big file is laid out contiguously on the file system.
>
> With the way I've architected this hypothetical system, the read load
> on each allocation group (each 12-spindle array) should be relatively
> low, about 3 streams on 14 AGs, 4 streams on the remaining two AGs,
> _assuming_ the files being read are spread out evenly across at least
> 16 directories. As you all read in the docs for which I provided
> links, XFS AG parallelism functions at the directory and file level.
> For example, if we create 32 directories on a virgin XFS filesystem of
> 16 allocation groups, the following layout would result:
>
> AG1:  /general requirements    AG1:  /alabama
> AG2:  /site construction       AG2:  /alaska
> AG3:  /concrete                AG3:  /arizona
> ..
> ..
> AG14: /conveying systems       AG14: /indiana
> AG15: /mechanical              AG15: /iowa
> AG16: /electrical              AG16: /kansas
>
> AIUI, the first 16 directories get created in consecutive AGs until we
> hit the last AG. The 17th directory is then created in the first AG
> and we start the cycle over. This is how XFS allocation group
> parallelism works. It doesn't provide linear IO scaling for all
> workloads, and it's not magic, but it works especially well for
> multiuser fileservers, and typically better than multiple nested
> stripe levels or extremely wide arrays.
>
> Imagine you have a 5000-seat company. You'd mount this XFS filesystem
> on /home. Each user home directory created would fall in a consecutive
> AG, resulting in about 312 user dirs per AG. In this type of
> environment XFS AG parallelism will work marvelously, as you'll
> achieve fairly balanced IO across all AGs and thus all 16 arrays.
>
> In the case where you have many clients reading files from only one
> directory, hence the same AG, IO parallelism is limited to the 12
> spindles of that one array. When this happens, we end up with a highly
> random workload at the disk head, resulting in high seek rates and low
> throughput. This is one of the reasons I built some "excess" capacity
> into the disk subsystem. Using XFS AGs for parallelism doesn't
> guarantee even distribution of IO across all the 192 spindles of the
> 16 arrays.
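
Let me check my understanding of the round-robin placement with a toy model (pure 1-based round-robin; I assume the real XFS allocator is more complex than this):

from collections import Counter

AG_COUNT = 16  # allocation groups in the hypothetical filesystem

def ag_for_dir(n):
    # n is the 1-based creation order of a top-level directory;
    # toy rule: new directories rotate over the AGs round-robin
    return (n - 1) % AG_COUNT + 1

print(ag_for_dir(1))    # 1  -> first directory lands in AG1
print(ag_for_dir(16))   # 16 -> sixteenth lands in AG16
print(ag_for_dir(17))   # 1  -> seventeenth wraps back to AG1

# 5000 home directories spread this way:
per_ag = Counter(ag_for_dir(n) for n in range(1, 5001))
print(per_ag[1], per_ag[16])  # 313 312

5000/16 = 312.5, so half the AGs end up with 313 home dirs and half with 312, close to the ~312 figure above. Is that the idea?
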
> It gives good parallelism if clients are accessing different files in
> different directories concurrently, but not in the opposite case.
>
>> The block allocation is only done when writing. The system at hand
>> was specified as a mostly reading system, where such a bottleneck of
>> block allocation is not so dominant.
>
> This system would excel at massive parallel writes as well, again, as
> long as we have many writers into multiple directories concurrently,
> which spreads the write load across all AGs, and thus all arrays.
>
> XFS is legendary for multiple large file parallel write throughput,
> thanks to delayed allocation and some other tricks.
>
> --
> Stan

--
Roberto Spadim
Spadim Technology / SPAEmpresarial
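
PS: about my stripe vs. linear question above, this toy model is what I had in mind (a minimal sketch; the member count, member size and chunk numbers are made up, and real md chunk geometry is more involved):

DISKS = 4                # hypothetical raid0/linear member count
CHUNKS_PER_DISK = 1000   # hypothetical member size in chunks

def disk_striped(chunk):
    # raid0: consecutive chunks rotate across all members
    return chunk % DISKS

def disk_linear(chunk):
    # linear concat: fill member 0 end to end, then member 1, ...
    return chunk // CHUNKS_PER_DISK

# a "hot" region of 1000 chunks, like my 99%-of-IO-in-one-AG case:
hot = range(3000, 4000)
print(sorted({disk_striped(c) for c in hot}))  # [0, 1, 2, 3] -> all spindles busy
print(sorted({disk_linear(c) for c in hot}))   # [3] -> one member takes everything

If the hot AG occupies one contiguous region of the logical device, striping spreads its IO over every spindle while linear pins it to a single member, which is why I think striping is the right choice for that skewed case.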