> -----Original Message-----
> From: fio-owner@xxxxxxxxxxxxxxx [mailto:fio-owner@xxxxxxxxxxxxxxx] On
> Behalf Of Jens Axboe
> Sent: Wednesday, 07 January, 2015 5:33 PM
> To: fio@xxxxxxxxxxxxxxx
> Subject: Non-uniform randomness with drifting
>
> Hi,
>
> If you boil it down, fio can basically do two types of random
> distributions (random_distribution=):
>
> - Uniform, meaning we scatter evenly across the IO range.
> - Or zipf/pareto, meaning that we have some notion of hotness of
>   offsets that are hit more often than others.
>
> zipf/pareto are often used to simulate real world access patterns,
> where, eg, 5% of the dataset is hit 95% of the time, and having a long
> tail of rarely accessed data.
>

There are two dimensions of locality: spatial and temporal.

random_distribution= provides some spatial locality control. The zipf
distribution is not as "bursty" as many application workloads.

There is no direct temporal locality control. rate= sets a maximum
rate, which essentially inserts some idle time (but the same for the
entire job).

A 2002 paper by Mengzhi Wang, et al. at CMU discussed the importance
of spatio-temporal correlation, at least in a storage trace provided
by HP Labs. They found "the intuition that requests arriving closely
in time tend to access nearby objects" held true and proposed a model,
called PQRS, that incorporates that concept.

2002 slides:
http://www-cgi.cs.cmu.edu/~mzwang/research/pub/pqrs_slides.pdf
2002 paper:
http://pdl.cmu.edu/PDL-FTP/Workload/performance02.pdf
2005 analysis slides:
http://www.research.ibm.com/haifa/Workshops/systems-and-storage2005/papers/eitan_bachmat_pqrs.pdf
2005 analysis paper:
http://www.cs.bgu.ac.il/~ebachmat/pqrs-14-4-2012.pdf

The new random_drift feature changes spatial locality over time
(where time means % of data that has been accessed), so it provides
some correlation control, but doesn't provide full temporal locality
control.
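
To make the drift idea concrete, here is a minimal Python sketch. It is
not fio's implementation; the function name, the shift_pct/every_pct
parameters, and the use of "I/Os issued" as a stand-in for "% of data
accessed" are all assumptions made for illustration. It draws block
numbers from a zipf-like distribution and moves the hot spot by a fixed
share of the range at regular intervals.

```python
import itertools
import random


def drifting_offsets(nr_blocks, nr_ios, theta=1.2,
                     shift_pct=10, every_pct=5, seed=0):
    """Yield block numbers from a zipf-like distribution whose hot spot drifts.

    Hypothetical illustration only; names and defaults are not fio options.
    """
    rng = random.Random(seed)
    # Zipf-like weights: rank r (1-based) gets weight 1 / r**theta.
    cum_weights = list(itertools.accumulate(
        1.0 / (r ** theta) for r in range(1, nr_blocks + 1)))
    ranks = range(nr_blocks)
    # How far the hot spot moves at each shift, in blocks.
    shift_blocks = nr_blocks * shift_pct // 100
    # Assumption: "percent of the data accessed" is approximated by the
    # number of I/Os issued, counting one block per I/O.
    ios_per_shift = max(1, nr_blocks * every_pct // 100)
    base = 0
    for i in range(nr_ios):
        if i > 0 and i % ios_per_shift == 0:
            base = (base + shift_blocks) % nr_blocks
        # Rank 0 is the hottest block; base relocates it within the range.
        rank = rng.choices(ranks, cum_weights=cum_weights)[0]
        yield (base + rank) % nr_blocks
```

With the defaults, list(drifting_offsets(1000, 200)) moves the hot spot
by 100 blocks after every 50 I/Os. Note that this only relocates the
spatial hot spot; it does nothing about arrival times, so it has the
same temporal limitation described above.
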
Maybe it'd be worth trying to implement the PQRS model and see if that
works well for both dimensions?

> I'm thinking that random_drift_percentage should be split in two, so
> that we could say "shift X percent every time Y percent of the data has
> been accessed". But apart from that, any input on this? I'm open to

If running on huge devices with time_based=1, it can take a long time
to reach a meaningful % of the total size. % of time might be better.

---
Rob Elliott    HP Server Storage