Re: Requesting help with random I/O distribution

On 6/8/23 21:11, Surbhi Palande wrote:
Hi All,

I am trying to performance-test my device mapper for zoned devices; I
am trying to replicate the 80/20 principle for I/O. I understand that
this can be done in the following three ways using fio.

a) zoned, the simplest of the three:
zoned:80/20:20/80
However, this confines 80% of the I/O to the first 20% of the space, and
the remaining 20% of the I/O to the other 80%. The good thing, though, is
that a zoned distribution can be used to achieve a Russian doll effect.
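As an illustration of nesting, here is a minimal Python sketch that builds a zoned spec by repeating the same access/size split inside every zone. The helper name `nested_zoned` is hypothetical, and the sketch assumes, based on the `zoned:80/20:20/80` example, that each `A/S` pair means "A% of accesses go to the next S% of the space", that zones are laid out in order from the start of the file, and that fio wants whole-number percentages.

```python
from fractions import Fraction
from itertools import product


def _pct(value: Fraction) -> str:
    """Render a percentage, refusing anything that is not a whole number."""
    if value.denominator != 1:
        raise ValueError(f"non-integer percentage {float(value)}")
    return str(value.numerator)


def nested_zoned(hot_access: int = 80, hot_size: int = 20, depth: int = 2) -> str:
    """Hypothetical helper (not part of fio): build a 'zoned:' spec that
    repeats the hot_access/hot_size split inside every zone, depth levels deep."""
    a = Fraction(hot_access, 100)  # share of accesses sent to the hot part
    z = Fraction(hot_size, 100)    # share of the space that is hot
    zones = []
    # Each zone is one hot/cold choice per level, outermost level first.
    # product() yields (True, True), (True, False), ... so the hot part of
    # every level sits at the start of its parent region.
    for path in product((True, False), repeat=depth):
        access, size = Fraction(1), Fraction(1)
        for hot in path:
            access *= a if hot else 1 - a
            size *= z if hot else 1 - z
        zones.append(f"{_pct(access * 100)}/{_pct(size * 100)}")
    return "zoned:" + ":".join(zones)


print(nested_zoned(depth=1))  # zoned:80/20:20/80
print(nested_zoned(depth=2))  # zoned:64/4:16/16:16/16:4/64
```

With 80/20, a third level already needs fractional percentages (51.2/0.8 for the innermost zone), so the sketch raises `ValueError` instead of guessing how fio would round them.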

b) zipf:1.2

I used fio-genzipf to visualize the random I/O pattern:
fio-genzipf -t zipf -i 1.2 -b 4096 -g 100GiB
Generating Zipf distribution with 1.200000 input and 100 GiB size and
4096 block_size.

    Rows           Hits %         Sum %           # Hits          Size
-----------------------------------------------------------------------
Top   5.00%        93.31%        93.31%         24459924        93.31G
|->  10.00%         1.34%        94.65%           352314         1.34G
|->  15.00%         0.77%        95.42%           201010       785.20M
|->  20.00%         0.51%        95.92%           132667       518.23M
|->  25.00%         0.47%        96.39%           122386       478.07M
|->  30.00%         0.34%        96.73%            89402       349.23M
|->  35.00%         0.23%        96.97%            61193       239.04M
|->  40.00%         0.23%        97.20%            61193       239.04M
|->  45.00%         0.23%        97.43%            61193       239.04M
|->  50.00%         0.23%        97.67%            61193       239.04M
|->  55.00%         0.23%        97.90%            61193       239.04M
|->  60.00%         0.23%        98.13%            61193       239.04M
|->  65.00%         0.23%        98.37%            61193       239.04M
|->  70.00%         0.23%        98.60%            61193       239.04M
|->  75.00%         0.23%        98.83%            61193       239.04M
|->  80.00%         0.23%        99.07%            61193       239.04M
|->  85.00%         0.23%        99.30%            61193       239.04M
|->  90.00%         0.23%        99.53%            61193       239.04M
|->  95.00%         0.23%        99.77%            61193       239.04M
|-> 100.00%         0.23%       100.00%            61188       239.02M
-----------------------------------------------------------------------
Total 26214400

I need help interpreting this. Does this mean that 5% of the LBAs get
93.31% of the hits, the next 5% get 1.34%, and so on? It seems that way
to me.

I haven't thoroughly digested the source code, but all indications suggest that this is the correct interpretation.
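It can help to read the table as a ranking of blocks rather than a map of the disk. The following Python sketch assumes (it is not taken from fio-genzipf's source) that the tool counts hits per block, sorts the blocks from most to least hit, and then reports each 5% slice of that ranked list; `summarize_hits` is a hypothetical name.

```python
def summarize_hits(hits, nrows=20):
    """Hypothetical sketch of a genzipf-style summary: rank blocks by hit
    count (hottest first) and report, for each 1/nrows slice of the ranked
    blocks, (row %, hits % in that slice, cumulative hits %)."""
    ranked = sorted(hits, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no hits to summarize")
    n = len(ranked)
    rows = []
    running = 0
    for i in range(nrows):
        # Integer slice bounds so every block lands in exactly one row.
        start, end = i * n // nrows, (i + 1) * n // nrows
        slice_hits = sum(ranked[start:end])
        running += slice_hits
        rows.append((100 * (i + 1) / nrows,
                     100 * slice_hits / total,
                     100 * running / total))
    return rows
```

With two hot blocks at opposite ends of a ten-block device, `summarize_hits([5, 0, 0, 0, 0, 0, 0, 0, 0, 5], nrows=5)[0]` is `(20.0, 100.0, 100.0)`: the top row covers the hottest 20% of blocks even though those blocks are not adjacent.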

However, this does not have the Russian doll effect, i.e., the top 5%
of the remaining 95% does not get ~93% of the remaining I/O.

Add "-o 40" or "-o 100" to the fio-genzipf command line to increase the number of rows. That way you can see the distribution within each 5% band. The results suggest to me that the distribution is skewed even within each band. If I understand correctly what you mean by "Russian doll effect," this distribution does follow that pattern to some extent, although the distribution is essentially flat in its tail, which is inconsistent with that pattern.
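One way to quantify "to some extent" with the zipf table above is to divide each row's Hits % by the hits not already taken by the hotter rows. Under a strict Russian doll, every row would take roughly the same share (about 93%) of what is left. A small Python sketch that uses only the numbers printed in that table; `nested_shares` is a hypothetical helper:

```python
# "Hits %" column of the zipf:1.2 fio-genzipf table (20 rows of 5% each).
HITS_PCT = [93.31, 1.34, 0.77, 0.51, 0.47, 0.34] + [0.23] * 14


def nested_shares(hits_pct):
    """Hypothetical helper: for each row, the fraction of the hits not yet
    claimed by hotter rows that this row receives."""
    shares = []
    claimed = 0.0
    for pct in hits_pct:
        shares.append(pct / (100.0 - claimed))
        claimed += pct
    return shares


for row, share in enumerate(nested_shares(HITS_PCT)[:4], start=1):
    print(f"row {row}: {share:.1%} of the remaining hits")
```

This gives about 93.3%, 20.0%, 14.4% and 11.1% for the first four rows: the next 5% still receives several times its proportional share, but far less than 93%. Because the table is rounded to two decimals, the ratios for the last few rows are not meaningful.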

Look at the Zipf probability mass function on Wikipedia and imagine its shape after you have removed the most frequent values. Even with a new normalizing constant, it won't have the same shape as the original distribution, because the relative difference between, for example, 1/2 and 1/3 is not the same as that between 1/200 and 1/201.
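The shape argument can be checked numerically. The normalizing constant cancels in a ratio, so comparing consecutive probabilities shows exactly what renormalization cannot fix. A minimal Python sketch, assuming the textbook Zipf form where p(k) is proportional to 1/k**s with s = 1.2 (fio's internal generator may differ in detail):

```python
def zipf_ratio(k: int, s: float) -> float:
    """p(k + 1) / p(k) for a Zipf distribution with p(k) proportional to k**-s.

    The normalizing constant cancels, so dropping the most frequent values
    and renormalizing leaves this ratio unchanged.
    """
    return (k / (k + 1)) ** s


s = 1.2
head = zipf_ratio(1, s)    # rank 2 vs rank 1 in the full distribution
tail = zipf_ratio(200, s)  # rank 201 vs rank 200: the new "top two" once
                           # ranks 1..199 have been removed
print(f"head: {head:.3f}  tail: {tail:.3f}")
```

The head ratio is about 0.44 while the tail ratio is about 0.99, so the truncated distribution is nearly flat at its top, where the original falls off steeply.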

Is the 5% range scattered over the disk, or is it contiguous, so that a
single region gets 93% of the I/O? In the latter case, this would be
similar to the zoned distribution, right?

Try running fio and examining the offsets it produces, then use standard text utilities to extract and count them:

$ fio --name=test --ioengine=null --filesize=10240 --bs=512 --rw=randread --randrepeat=0 --random_distribution=zipf:1.2 --debug=io | grep complete: | cut -d ':' -f3 | cut -d ',' -f1 | cut -d '=' -f2 | sort | uniq -c | sort -r
      7 0x1200
      5 0x0
      3 0xc00
      2 0x1c00
      1 0x600
      1 0x2200
      1 0x1600

In each row, the first number is the count and the second is the offset. As you can see, random_distribution=zipf produces offsets all over the file, which is different from zoned.
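For larger runs, the same counting can be scripted so that each offset is also mapped to the 5% band of the file it falls in. This Python sketch assumes the debug line shape implied by the cut pipeline above (`... complete: io_u ...: off=0x...,len=...`); `count_offsets` and `band_of` are hypothetical helpers, not part of fio.

```python
import re
from collections import Counter

# Assumed shape of a fio --debug=io completion line, inferred from the
# cut pipeline: "... complete: io_u 0x...: off=0x1200,len=0x200,..."
OFF_RE = re.compile(r"\boff=(0x[0-9a-fA-F]+)")


def count_offsets(lines):
    """Hypothetical helper: count completions logged for each byte offset."""
    counts = Counter()
    for line in lines:
        if "complete:" not in line:
            continue
        match = OFF_RE.search(line)
        if match:
            counts[int(match.group(1), 16)] += 1
    return counts


def band_of(offset, filesize, nbands=20):
    """Hypothetical helper: 0-based index of the 1/nbands slice holding offset."""
    return offset * nbands // filesize


# Offsets from the sample run above (filesize=10240, bs=512, so each
# 5% band is exactly one 512-byte block).
observed = [0x1200, 0x0, 0xC00, 0x1C00, 0x600, 0x2200, 0x1600]
print(sorted(band_of(off, 10240) for off in observed))
```

This prints `[0, 3, 6, 9, 11, 14, 17]`: the hits land in seven separate bands spread across the file rather than in one contiguous region. To process a real run, the fio output could be piped into the script and passed to `count_offsets(sys.stdin)`.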

c) pareto
fio-genzipf -t pareto -i 0.04 -b 4096 -g 100GiB
Generating Pareto distribution with 0.040000 input and 100 GiB size
and 4096 block_size.

    Rows           Hits %         Sum %           # Hits          Size
-----------------------------------------------------------------------
Top   5.00%        93.04%        93.04%         24388831        93.04G
|->  10.00%         0.99%        94.02%           259143      1012.28M
|->  15.00%         0.60%        94.63%           158285       618.30M
|->  20.00%         0.59%        95.22%           154826       604.79M
|->  25.00%         0.35%        95.57%            92138       359.91M
|->  30.00%         0.30%        95.87%            77413       302.39M
|->  35.00%         0.30%        96.16%            77413       302.39M
|->  40.00%         0.30%        96.46%            77413       302.39M
|->  45.00%         0.30%        96.75%            77413       302.39M
|->  50.00%         0.30%        97.05%            77413       302.39M
|->  55.00%         0.30%        97.34%            77413       302.39M
|->  60.00%         0.30%        97.64%            77413       302.39M
|->  65.00%         0.30%        97.93%            77413       302.39M
|->  70.00%         0.30%        98.23%            77413       302.39M
|->  75.00%         0.30%        98.52%            77413       302.39M
|->  80.00%         0.30%        98.82%            77413       302.39M
|->  85.00%         0.30%        99.11%            77413       302.39M
|->  90.00%         0.30%        99.41%            77413       302.39M
|->  95.00%         0.30%        99.70%            77413       302.39M
|-> 100.00%         0.30%       100.00%            77395       302.32M
-----------------------------------------------------------------------
Total 26214400

To me, this looks very similar to the zipf distribution above.
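To quantify how close the two tables are, their Sum % columns can be compared row by row. A small Python sketch using only the values printed in the two fio-genzipf tables; `max_gap` is a hypothetical helper:

```python
# "Sum %" columns of the zipf:1.2 and pareto:0.04 tables (20 rows of 5%).
ZIPF_SUM = [93.31, 94.65, 95.42, 95.92, 96.39, 96.73, 96.97, 97.20, 97.43, 97.67,
            97.90, 98.13, 98.37, 98.60, 98.83, 99.07, 99.30, 99.53, 99.77, 100.00]
PARETO_SUM = [93.04, 94.02, 94.63, 95.22, 95.57, 95.87, 96.16, 96.46, 96.75, 97.05,
              97.34, 97.64, 97.93, 98.23, 98.52, 98.82, 99.11, 99.41, 99.70, 100.00]


def max_gap(a, b):
    """Hypothetical helper: largest absolute difference between two
    cumulative curves, returned as (row index, gap)."""
    gaps = [abs(x - y) for x, y in zip(a, b)]
    worst = max(range(len(gaps)), key=gaps.__getitem__)
    return worst, gaps[worst]


row, gap = max_gap(ZIPF_SUM, PARETO_SUM)
print(f"largest gap: {gap:.2f} points at the {5 * (row + 1)}% row")
```

This reports a largest gap of 0.86 percentage points at the 30% row, so for these particular parameters the two cumulative curves stay within one point of each other.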

Is this understanding correct, or am I missing something?

There are plenty of discussions online about the relationship between the Zipf and Pareto distributions. I don't have any particular expertise to add to what you can already find easily.

Vincent


