Hello everybody,
Before anything else, I want to say thanks to Jens and everybody who has
put their time into this great project :)
I recently started using fio to benchmark hard disks and possible storage
setups, and to learn more about the workings of the layers involved. Now I
have run into a question.
I'm sorry if I'm missing something here - my knowledge of the topic is
certainly lacking... However, any help is appreciated.
On an ext4 file system, two files cannot share the same block, so reading
or writing a lot of small files introduces considerable overhead relative
to the actual amount of useful data retrieved from disk.
I want to take this into account when modelling workloads with fio, but
issuing a command like
fio --name=test --rw=randread --blocksize=4096 --size=1m --nrfiles=100
--directory=path/on/ext4-formatted/hard-disk/
will make fio read only 800 KiB, or 200 blocks. Each file occupies roughly
2.5 blocks, so the partially filled last block of each file is simply
ignored.
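To spell out the arithmetic (assuming fio divides --size evenly across
--nrfiles, which is its default behaviour as far as I can tell):

1 MiB / 100 files ~= 10,486 bytes per file ~= 2.56 blocks of 4,096 bytes
only the 2 full blocks of each file are read:
2 blocks * 100 files * 4 KiB = 800 KiB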
Looking into a possible solution, I figured I could issue the parameter

bssplit=2048/n:4096/(1-n)

with

n = 1 / ceil((file size) / (block size))

(n expressed as a percentage; 2048 is the mean payload of a partial block)
to simulate the overhead for reads/writes of partially occupied blocks and
get a meaningful throughput number for this kind of scenario (assuming
that in reality, (file size) mod (block size) is uniformly distributed).
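To make the idea concrete, here is a small Python sketch that derives the
split for the example job above (the variable names are mine; please treat
it as an illustration of the calculation rather than a tested recipe):

import math

block_size = 4096
file_size = (1 * 1024 * 1024) // 100        # ~10,485 bytes per file, as above

# Each file ends in one partial block, so the fraction n of per-file
# blocks that are partial is 1 / ceil(file_size / block_size).
blocks_per_file = math.ceil(file_size / block_size)   # 3
n = 1.0 / blocks_per_file                             # 1/3

# bssplit takes blocksize/percentage pairs separated by colons.
# block_size / 2 = 2048 is the mean payload of a partial block if
# (file size) mod (block size) is uniformly distributed.
partial_pct = round(n * 100)                          # 33
full_pct = 100 - partial_pct                          # 67
print("--bssplit=%d/%d:%d/%d"
      % (block_size // 2, partial_pct, block_size, full_pct))

For this job it prints --bssplit=2048/33:4096/67.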
Does this make sense? Are there maybe better solutions?
Thanks
gw