Accounting for partially filled blocks when modeling workloads

Hello everybody,

Before everything else I wanted to say thanks to Jens and everybody who has put their time into this great project :)

I recently started using fio to benchmark hard disks and possible storage setups, and also to learn more about the workings of the layers involved, and I have now run into a question. I'm sorry if I'm missing something here - my knowledge of the topic is certainly lacking... However, any help is appreciated.

On an ext4 file system, two files cannot share the same block, so reading or writing a lot of small files introduces considerable overhead with respect to the actual amount of data retrieved from disk. I want to take this into account when modelling workloads with fio, but issuing a command like

fio --name test --rw=randread --blocksize=4096 --size=1m --nrfiles=100 --directory=path/on/ext4-formatted/hard-disk/

will make fio read only 800KiB, or 200 blocks. Each file's data amounts to roughly 2.5 blocks, so the partially filled last block occupied by each file is simply ignored.
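To make the numbers concrete, here is a small Python sketch of the arithmetic (the sizes are taken from the command above; fio itself is not involved):

```python
import math

block_size = 4096           # ext4 block size in bytes
total_size = 1 * 1024**2    # --size=1m
nr_files = 100              # --nrfiles=100
file_size = total_size // nr_files  # ~10 KiB of data per file

# fio only issues full 4 KiB reads, so the partial last block is skipped
blocks_read = nr_files * (file_size // block_size)
# on ext4, each file still occupies a whole number of blocks on disk
blocks_occupied = nr_files * math.ceil(file_size / block_size)

print(blocks_read, blocks_read * block_size // 1024)          # 200 blocks, 800 KiB read
print(blocks_occupied, blocks_occupied * block_size // 1024)  # 300 blocks, 1200 KiB occupied
```

So fio touches noticeably less data than the files actually occupy on disk.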


Looking into a possible solution, I figured I could pass the parameter

bssplit=2048/n, 4096/(1-n)

with

n = 1 / ceil((file size) / (block size))

to simulate the overhead for reads/writes of partially occupied blocks and get a meaningful throughput number for this kind of scenario (assuming that, in reality, (file size) mod (block size) is uniformly distributed, so the partial block averages half a block, i.e. 2048 bytes).
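As a sketch of that weighting (using the ~10 KiB-per-file numbers from the example above; the percentages are my own rounding, since fio's bssplit option takes blocksize/percentage pairs):

```python
import math

block_size = 4096
file_size = 10485  # ~1 MiB / 100 files, as in the example above

blocks_per_file = math.ceil(file_size / block_size)  # 3 blocks per file on disk
n = 1 / blocks_per_file  # fraction of IOs that hit the partial last block

# one partial block per file, modeled as a half-sized (2048 B) IO on average;
# the remaining blocks are full 4096 B IOs
partial_pct = round(100 * n)
full_pct = 100 - partial_pct

print(f"bssplit=2048/{partial_pct}:4096/{full_pct}")
```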


Does this make sense? Are there maybe better solutions?

Thanks
gw


