On 6/8/23 21:11, Surbhi Palande wrote:
Hi All,
I am trying to performance-test my device mapper for zoned devices, and
I am trying to replicate the 80/20 principle for I/O. I understand that
this can be done in the following three ways using fio.
a) zoned - simplest of the three :
zoned:80/20:20/80
However, this restricts the first 20% of the space to get 80% of the
I/O and vice versa. The good thing, though, is that the zoned
distribution can be used to achieve a Russian doll effect.
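If "Russian doll" is taken to mean that the same 80/20 split applies
again inside the hot zone, the nested zone list can be derived
mechanically. The sketch below is only an illustration: the helper names
nested_zones and to_fio_spec are hypothetical, and it assumes the
access%/range% pair order used in zoned:80/20:20/80, with zones laid out
back to back starting at the beginning of the device.

```python
def nested_zones(access_pct, range_pct, levels):
    """Build (access %, range %) pairs, hottest zone first, by applying the
    same split repeatedly to the hot zone (hypothetical helper)."""
    zones = []
    acc, rng = 100.0, 100.0
    for _ in range(levels):
        hot_acc = acc * access_pct / 100.0
        hot_rng = rng * range_pct / 100.0
        # The cold remainder at this nesting level becomes its own zone.
        zones.append((acc - hot_acc, rng - hot_rng))
        acc, rng = hot_acc, hot_rng
    zones.append((acc, rng))  # innermost hot zone
    zones.reverse()
    return zones


def to_fio_spec(zones):
    """Format the zone list as a random_distribution=zoned value."""
    return "zoned:" + ":".join(f"{a:g}/{r:g}" for a, r in zones)


one_level = to_fio_spec(nested_zones(80, 20, 1))
two_levels = to_fio_spec(nested_zones(80, 20, 2))
print(one_level)   # zoned:80/20:20/80
print(two_levels)  # zoned:64/4:16/16:20/80
```

Deeper nesting of 80/20 produces fractional percentages (51.2/0.8 at
three levels); whether fio accepts those is not something this sketch
checks.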
b) zipf:1.2
I used fio-genzipf to visualize the random I/O pattern:
fio-genzipf -t zipf -i 1.2 -b 4096 -g 100GiB
Generating Zipf distribution with 1.200000 input and 100 GiB size and
4096 block_size.
       Rows   Hits %    Sum %     # Hits       Size
---------------------------------------------------
Top   5.00%   93.31%   93.31%   24459924     93.31G
|->  10.00%    1.34%   94.65%     352314      1.34G
|->  15.00%    0.77%   95.42%     201010    785.20M
|->  20.00%    0.51%   95.92%     132667    518.23M
|->  25.00%    0.47%   96.39%     122386    478.07M
|->  30.00%    0.34%   96.73%      89402    349.23M
|->  35.00%    0.23%   96.97%      61193    239.04M
|->  40.00%    0.23%   97.20%      61193    239.04M
|->  45.00%    0.23%   97.43%      61193    239.04M
|->  50.00%    0.23%   97.67%      61193    239.04M
|->  55.00%    0.23%   97.90%      61193    239.04M
|->  60.00%    0.23%   98.13%      61193    239.04M
|->  65.00%    0.23%   98.37%      61193    239.04M
|->  70.00%    0.23%   98.60%      61193    239.04M
|->  75.00%    0.23%   98.83%      61193    239.04M
|->  80.00%    0.23%   99.07%      61193    239.04M
|->  85.00%    0.23%   99.30%      61193    239.04M
|->  90.00%    0.23%   99.53%      61193    239.04M
|->  95.00%    0.23%   99.77%      61193    239.04M
|-> 100.00%    0.23%  100.00%      61188    239.02M
---------------------------------------------------
Total                           26214400
I need help with this interpretation. Does this mean that 5% of the
LBAs get 93.31% of the hits, the next 5% get 1.34%, etc.? It seems that
way to me.
I haven't thoroughly digested the source code but all indications
suggest that this is the correct interpretation.
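The arithmetic behind that reading can be checked against the table
itself. The sketch below recomputes "Hits %", "Sum %" and "Size" from
the "# Hits" column for the first three rows. It assumes that the Total
of 26214400 is the number of generated I/Os (it also equals 100 GiB /
4096) and that Size is simply hits times block size; the function name
summarize is hypothetical.

```python
BLOCK_SIZE = 4096
TOTAL_HITS = 26214400  # the "Total" line; also 100 GiB / 4096

# (upper edge of the band in %, "# Hits") copied from the zipf:1.2 table
rows = [
    (5.0, 24459924),
    (10.0, 352314),
    (15.0, 201010),
]


def summarize(rows, total_hits, block_size):
    """Return (band, hits %, running sum %, size in GiB) for each row."""
    out = []
    running = 0
    for band, hits in rows:
        running += hits
        out.append((
            band,
            round(100.0 * hits / total_hits, 2),
            round(100.0 * running / total_hits, 2),
            round(hits * block_size / 2**30, 2),
        ))
    return out


summary = summarize(rows, TOTAL_HITS, BLOCK_SIZE)
for row in summary:
    print(row)
# (5.0, 93.31, 93.31, 93.31)
# (10.0, 1.34, 94.65, 1.34)
# (15.0, 0.77, 95.42, 0.77)
```

The recomputed percentages match the printed ones, which is consistent
with each row describing the share of I/O that lands in one 5% band of
the ranked blocks and "Sum %" being the running total.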
However, this does not have the Russian doll effect, i.e. the top 5% of
the remaining 95% does not get ~93% of the remaining I/O.
Add "-o 40" or "-o 100" to increase the number of rows. That way you can
see the distribution within each 5% band. The results suggest to me that
the distribution is skewed even within each band. If I understand
correctly what you mean by "Russian doll effect," this distribution does
follow that pattern to some extent, although the distribution is
essentially flat in its tail which is inconsistent with that pattern.
Look at the Zipf probability mass function listed on Wikipedia and
imagine its shape after you have removed the most frequent values. Even
if you use a new normalizing constant, it won't have the same shape as
the original distribution because the distance between, for example, 1/2
and 1/3 will not be the same as the distance between 1/200 and 1/201.
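A small numerical sketch of that point, using the textbook Zipf weights
1/k^s rather than fio's own generator: it compares the share held by the
top 5% of all ranks with the share held by the top 5% of what remains
once that head is removed and the rest is renormalized. N is an assumed,
much smaller rank count than the 26214400 blocks above, and the helper
names are hypothetical.

```python
def zipf_weights(n, s):
    """Unnormalized Zipf weights 1/k**s for ranks k = 1..n."""
    return [1.0 / k**s for k in range(1, n + 1)]


def top_share(weights, fraction):
    """Share of the total weight held by the first `fraction` of the ranks."""
    cut = max(1, int(len(weights) * fraction))
    return sum(weights[:cut]) / sum(weights)


N = 100_000  # assumed number of ranks, chosen only to keep the run short
S = 1.2      # same exponent as zipf:1.2

w = zipf_weights(N, S)
head = top_share(w, 0.05)             # top 5% of all ranks
rest = w[int(N * 0.05):]              # drop the top 5% of ranks
head_of_rest = top_share(rest, 0.05)  # top 5% of the remainder, renormalized

print(f"top 5% of all ranks:     {head:.3f}")
print(f"top 5% of remaining 95%: {head_of_rest:.3f}")
print(f"1/2 - 1/3     = {1/2 - 1/3:.6f}")
print(f"1/200 - 1/201 = {1/200 - 1/201:.6f}")
```

The second share comes out far below the first, so the truncated
distribution does not repeat the shape of the original one.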
Is the 5% range scattered over the disk, or does a contiguous space get
the 93% of the I/O? In the latter case, this would be similar to the
zoned distribution, right?
Try running fio and examining the offsets it produces. Then use some
utilities to extract the offsets and analyze them:
$ fio --name=test --ioengine=null --filesize=10240 --bs=512 \
    --rw=randread --randrepeat=0 --random_distribution=zipf:1.2 --debug=io |
    grep complete: | cut -d ':' -f3 | cut -d ',' -f1 | cut -d '=' -f2 | sort |
    uniq -c | sort -r
7 0x1200
5 0x0
3 0xc00
2 0x1c00
1 0x600
1 0x2200
1 0x1600
In each row, the first number is the count and the second number is the
offset. Thus you can see that random_distribution=zipf scatters the hot
offsets all over the map, which is different from zoned.
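To make the scatter easier to see, the counted offsets can be converted
to block indices. The sketch below reuses the seven rows shown above
together with the --bs=512 and --filesize=10240 values from the command;
the function name parse_counts is hypothetical.

```python
BLOCK_SIZE = 512   # --bs=512 from the command above
FILE_SIZE = 10240  # --filesize=10240 from the command above

# "count offset" rows copied from the output above
UNIQ_OUTPUT = """\
7 0x1200
5 0x0
3 0xc00
2 0x1c00
1 0x600
1 0x2200
1 0x1600
"""


def parse_counts(text):
    """Turn `uniq -c` style rows into (block_index, count) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        count, offset = line.split()
        pairs.append((int(offset, 16) // BLOCK_SIZE, int(count)))
    return pairs


pairs = parse_counts(UNIQ_OUTPUT)
total_blocks = FILE_SIZE // BLOCK_SIZE
total_ios = sum(count for _, count in pairs)
hottest_first = [block for block, _ in pairs]  # input is already sorted by count

print(f"{total_ios} I/Os over {total_blocks} blocks")
print("blocks, hottest first:", hottest_first)
# 20 I/Os over 20 blocks
# blocks, hottest first: [9, 0, 6, 14, 3, 17, 11]
```

The hottest blocks (9, 0, 6, 14) are not neighbours, whereas a zoned
distribution would keep the hot blocks inside one contiguous range.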
c) pareto -
fio-genzipf -t pareto -i 0.04 -b 4096 -g 100GiB
Generating Pareto distribution with 0.040000 input and 100 GiB size
and 4096 block_size.
       Rows   Hits %    Sum %     # Hits       Size
---------------------------------------------------
Top   5.00%   93.04%   93.04%   24388831     93.04G
|->  10.00%    0.99%   94.02%     259143   1012.28M
|->  15.00%    0.60%   94.63%     158285    618.30M
|->  20.00%    0.59%   95.22%     154826    604.79M
|->  25.00%    0.35%   95.57%      92138    359.91M
|->  30.00%    0.30%   95.87%      77413    302.39M
|->  35.00%    0.30%   96.16%      77413    302.39M
|->  40.00%    0.30%   96.46%      77413    302.39M
|->  45.00%    0.30%   96.75%      77413    302.39M
|->  50.00%    0.30%   97.05%      77413    302.39M
|->  55.00%    0.30%   97.34%      77413    302.39M
|->  60.00%    0.30%   97.64%      77413    302.39M
|->  65.00%    0.30%   97.93%      77413    302.39M
|->  70.00%    0.30%   98.23%      77413    302.39M
|->  75.00%    0.30%   98.52%      77413    302.39M
|->  80.00%    0.30%   98.82%      77413    302.39M
|->  85.00%    0.30%   99.11%      77413    302.39M
|->  90.00%    0.30%   99.41%      77413    302.39M
|->  95.00%    0.30%   99.70%      77413    302.39M
|-> 100.00%    0.30%  100.00%      77395    302.32M
---------------------------------------------------
Total                           26214400
To me, this looks very similar to the zipf distribution above.
Is this understanding correct? Or am I missing something here?
There are plenty of discussions online about the relationship between
the zipf and pareto distributions. I don't have any particular expertise
to add to what you can already easily find.
Vincent