>> [ ... ] supposed to hold the object storage layer of a BeeFS
>> highly parallel filesystem, and therefore will likely have
>> mostly-random accesses.

> Where do you get the assumption from that FhGFS/BeeGFS is
> going to do random reads/writes or the application of top of
> it is going to do that?

In this specific case it is not an assumption, thanks to the
prominent fact that the original poster was testing (locally I
guess) and complaining about concurrent reads/writes, which result
in random-like arm movement even if each of the read and write
streams is entirely sequential. I even pointed this out, probably
not explicitly enough:

>> when doing only reading / only writing , the speed is very
>> fast(~1.5G), but when do both the speed is very slow
>> (100M), and high r_await(160) and w_await(200000).

BTW the 100MB/s aggregate over 31 drives means around 3MB/s per
drive, which seems pretty good for a RW workload with
mostly-random accesses and high RMW correlation.

Also, if this testing was appropriate, that is because the
intended workload was indeed concurrent reads and writes to the
object store.

It is not a mere assumption in the general case either; it is
both commonly observed and a simple deduction from the nature of
distributed filesystems, in particular parallel HPC ones like
Lustre or BeeGFS, but also AFS and even NFS ones:

* Clients have caches. Therefore most of the locality in the
  (read) access patterns will hopefully be filtered out by the
  client cache. This applies (ideally) to any distributed
  filesystem.

* HPC/parallel servers tend to have many clients (e.g. for an
  HPC cluster it could be 10,000 clients and 500 object storage
  servers), hopefully each client works on a different subset of
  the data tree, and the distribution of data objects onto
  servers is hopefully random. Therefore it is likely that many
  clients will concurrently read and write many different files
  on the same server, resulting in many random "hotspots" in
  each server's load.

Note that each client could be doing entirely sequential IO to
each file it accesses, but the concurrent accesses to possibly
widely scattered files will turn that into random IO at the
server level (a tiny sketch at the end of this message
illustrates this). Just about the only case where sequential
client workloads don't become random workloads at the server is
when the client workload is such that only one file is "hot" per
server.

There is an additional issue favouring random access patterns:

* Typically large fileservers are set up with a lot of storage
  because of anticipated lifetime usage, so they start mostly
  empty.

* Most filesystems then allocate new data in regular patterns,
  often starting from the beginning of available storage, usually
  in an attempt to minimize arm travel time (XFS uses various
  heuristics, which are somewhat different depending on whether
  the option 'inode64' is specified or not).

* Unfortunately, as the filetree becomes larger, new allocations
  have to be made farther away, resulting in longer travel times
  and more apparent randomness at the storage server level.

* Eventually, if the object server reaches a steady state where
  roughly as much data is deleted as is created, the free storage
  areas will become widely scattered, leading to essentially
  random allocation; the more capacity is used, the more random
  the allocation.

Leaving a significant percentage of capacity free, like at least
10% and more like 20%, greatly increases the chance of finding
free space in which to put new data near existing "related" data
(the toy simulation below illustrates the effect).
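To make that last point a bit more concrete, here is a toy
simulation in plain Python. It is NOT how XFS allocates; the
block counts, object sizes and the first-fit policy are all
invented for illustration. It only shows how, with create/delete
churn, the free space ends up in more and smaller pieces the
closer one runs to full:

# Toy model, NOT how XFS allocates: a disk as a block bitmap with
# first-fit allocation, random-sized "objects" created and deleted
# around a target utilization.  All numbers are invented.
import random

BLOCKS = 10_000                  # size of the toy disk, in blocks

def first_fit(free, nblocks):
    """Return the start of the first run of >= nblocks free blocks."""
    start, run = None, 0
    for b in range(BLOCKS):
        if free[b]:
            if start is None:
                start = b
            run += 1
            if run == nblocks:
                return start
        else:
            start, run = None, 0
    return None

def free_extents(free):
    """Lengths of all contiguous free extents."""
    lengths, run = [], 0
    for f in free:
        if f:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths

def churn(target_used, steps=2500):
    random.seed(0)
    free = [True] * BLOCKS
    objects, used = [], 0
    for _ in range(steps):
        if used / BLOCKS < target_used:      # grow towards the target
            size = random.randint(4, 64)
            start = first_fit(free, size)
            if start is None:
                continue
            for b in range(start, start + size):
                free[b] = False
            objects.append((start, size))
            used += size
        else:                                # steady state: delete something
            start, size = objects.pop(random.randrange(len(objects)))
            for b in range(start, start + size):
                free[b] = True
            used -= size
    return free_extents(free)

for target in (0.5, 0.8, 0.95):
    ext = churn(target)
    print(f"~{target:.0%} full: {len(ext)} free extents, "
          f"largest {max(ext)} blocks, average {sum(ext)/len(ext):.1f}")

The largest remaining free extent shrinks sharply as the target
utilization rises, which is the "essentially random allocation"
regime described above.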
Keeping that free-space margin increases locality, but only at
the single-stream level; therefore it usually does not help
widely shared distributed servers that much, and in particular
it does not apply that much to object stores, because they
usually obscure which data object is related to which.

The above issues are pretty much "network and distributed
filesystems for beginners" notes, but in significant part they
also apply to widely shared non-network, non-distributed servers
on which XFS is often used, so they may be usefully mentioned on
this list.
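PS: a tiny, purely illustrative Python sketch of the earlier
point that per-client sequential streams become random IO at the
server. The numbers (50 clients, 1000-block files, a 10M-block
device, round-robin service) are invented, and real servers queue
and reorder requests, but the effect on seek distances is the
point:

# Each client reads its own file strictly sequentially; the server
# merely interleaves the requests.  Made-up numbers throughout.
import random

random.seed(0)
DEVICE_BLOCKS = 10_000_000
FILE_BLOCKS   = 1_000
NCLIENTS      = 50

# Each client's file starts at a random place on the device.
starts = [random.randrange(DEVICE_BLOCKS - FILE_BLOCKS)
          for _ in range(NCLIENTS)]

def avg_seek(request_stream):
    """Average distance (in blocks) between consecutive requests."""
    jumps = [abs(b - a) for a, b in zip(request_stream, request_stream[1:])]
    return sum(jumps) / len(jumps)

# One client alone: the server sees its blocks in order.
single = [starts[0] + i for i in range(FILE_BLOCKS)]

# Fifty clients interleaved round-robin: each stream is still
# sequential, but consecutive requests at the server jump between
# widely scattered files.
mixed = []
for i in range(FILE_BLOCKS):
    for c in range(NCLIENTS):
        mixed.append(starts[c] + i)

print(f"1 client : avg seek distance {avg_seek(single):.0f} blocks")
print(f"{NCLIENTS} clients: avg seek distance {avg_seek(mixed):,.0f} blocks")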