Hello everyone,

I am trying to emulate a real I/O workload with multiple fio jobs, and I noticed that switching between jobs takes around ~300 ms when using the stonewall parameter and ~200 ms when using the wait_for parameter. This delay seems rather high to me. Is this the expected behavior? Is there any way to reduce the delay between jobs?

You can see these delays by reproducing the fio run with the Python script that I am attaching to this message. The script is intended to measure the delays between I/O operations: it uses blktrace to collect the I/O traces, then parses them and generates a CSV file with the deltas between consecutive I/Os.

Because blktrace is started before fio, it also collects a number of I/O operations that are not related to the fio job file. To separate these operations in the trace output, I added a small separator job at the beginning of the job file, which does a single 128k read and then waits for 1 second; this is easy to spot in the trace output (a small pandas sketch for locating it follows below). blkparse then parses all the binary trace files, and the output is filtered down to D (issued) operations only and to the fio process.

The script should work fine with the OS disk, but it works better if the test is run against a separate device. Depending on your setup, trace operations may not appear at all; in that case, you could try a USB stick to get the script working. The script has to be run with root rights, because blktrace requires root rights to collect all I/O traces.
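For illustration, here is the small pandas sketch for locating the separator in the final CSV (this is my addition, not part of the script below; it assumes the column names the script assigns and the example device nvme1n1). The separator is the single 128k (131072 byte) read whose delta to the next I/O is roughly the 1 s thinktime:

import pandas as pd

data = pd.read_csv("blktrace/nvme1n1_read_delta.csv", sep=";", index_col=0)
# the separator job issues one 131072-byte read followed by a ~1 s gap
separator = data[(data.io_size == 131072) & (data.delta > 0.9)]
print(separator)

Everything after that row belongs to the job file proper.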
Fio version:

/usr/local/bin/fio --version
fio-3.26-39-g6308

Linux version:

Linux dmt-ws 5.8.0-48-generic #54~20.04.1-Ubuntu SMP Sat Mar 20 13:40:25 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Fio job file:

[global]
direct=1
group_reporting
ioengine=libaio
directory=/media/optane/fio_temp
numjobs=1
iodepth=1

[separator job 0]
# used for understanding where in the I/O traces the job file starts
rw=read
size=128k
bssplit=128k,,
thinktime=1s

[reading job 1]
wait_for=separator job 0
rw=read
size=4096b
bssplit=4k,,

[reading job 2]
wait_for=reading job 1
rw=read
size=20480b
bssplit=4k,,

[reading job 3]
wait_for=reading job 2
rw=read
size=20480b
bssplit=4k,,

[reading job 4]
wait_for=reading job 3
rw=read
size=20480b
bssplit=4k,,

[reading job 5]
stonewall
rw=read
size=20480b
bssplit=4k,,

[reading job 6]
stonewall
rw=read
size=16384b
bssplit=4k,,

My output with delays (read_delta.csv)

The columns are: timestamp (seconds, with nanosecond resolution), operation from blktrace, size of the I/O in bytes, CPU (core) number, name of the process, and the computed delta to the next I/O (seconds). The values I am concerned about are highlighted with ** below; these numbers represent the delay fio needs to switch to the next job.

;timestamp;operation;io_size;cpu_id;process;delta
0;0.047131672;D;4096;2;fio;0.021224560
1;0.068356232;D;135168;2;fio;0.004784061
2;0.073140293;D;135168;2;fio;0.269024046
3;0.342164339;D;131072;11;fio;1.205111417 <- job file starts here
4;1.547275756;D;4096;10;fio;**0.201172283**
5;1.748448039;D;4096;11;fio;0.000361312
6;1.748809351;D;4096;11;fio;0.000271772
7;1.749081123;D;4096;11;fio;0.000261770
8;1.749342893;D;4096;11;fio;0.000256501
9;1.749599394;D;4096;11;fio;**0.200772160**
10;1.950371554;D;4096;11;fio;0.000259826
11;1.950631380;D;4096;11;fio;0.000158999
12;1.950790379;D;4096;11;fio;0.000163855
13;1.950954234;D;4096;11;fio;0.000189879
14;1.951144113;D;4096;11;fio;**0.199244653**
15;2.150388766;D;4096;4;fio;0.000203573
16;2.150592339;D;4096;4;fio;0.000150124
17;2.150742463;D;4096;4;fio;0.000146229
18;2.150888692;D;4096;4;fio;0.000184647
19;2.151073339;D;4096;4;fio;**0.300860646**
20;2.451933985;D;4096;10;fio;0.001125194
21;2.453059179;D;4096;10;fio;0.000649955
22;2.453709134;D;4096;10;fio;0.009264417
23;2.462973551;D;4096;10;fio;0.008190598
24;2.471164149;D;4096;10;fio;**0.282772655**
25;2.753936804;D;4096;10;fio;0.000448414
26;2.754385218;D;4096;10;fio;0.000282554
27;2.754667772;D;4096;10;fio;0.000258501
28;2.754926273;D;4096;10;fio;

Python code to generate the result:

'''
Script intended to get delays of IO operations.
Author: Azat Nurgaliev, mailto:azat.nurgaliev@xxxxxx
Created Date: May 4 2021
'''
import pandas as pd
import subprocess, os
from pathlib import Path
import time

# the device from which the I/O traces will be collected
device = "nvme1n1"

# create a folder for the output files
Path("blktrace").mkdir(parents=True, exist_ok=True)

# clear the system caches
subprocess.run(["sh", "-c", 'echo 3 >/proc/sys/vm/drop_caches'])

# start the blktrace process (requires root rights)
blktrace = subprocess.Popen(["blktrace", "-d", "/dev/" + device, "-o", device, "-D", "blktrace"],
                            preexec_fn=os.setpgrp, stdout=subprocess.DEVNULL)
print("blktrace spawned with PID: %s" % blktrace.pid)

# start the fio process
fio = subprocess.Popen(["fio", "delay_job.fio"], stdout=subprocess.PIPE)
print("fio spawned with PID: %s" % fio.pid)
outs, errs = fio.communicate(timeout=100)
print("fio finished")

time.sleep(10)
blktrace.terminate()

# run blkparse, then keep only lines from the fio process and only D (issued) operations
blkparse_read_throughput = "blkparse -i blktrace/" + device + ".blktrace. -f '%5T.%t %a %N %c %C\\n' -a read -o blktrace/" + device + "_read_raw.csv"
blkparse_read_throughput_filtered_operation = "grep ' fio' blktrace/" + device + "_read_raw.csv > blktrace/" + device + "_read_filtered_process.csv"
blkparse_read_throughput_filtered_process = "grep ' D ' blktrace/" + device + "_read_filtered_process.csv > blktrace/" + device + "_read_filtered_operation.csv"
subprocess.call(blkparse_read_throughput, shell=True, stdout=subprocess.DEVNULL)
subprocess.call(blkparse_read_throughput_filtered_operation, shell=True)
subprocess.call(blkparse_read_throughput_filtered_process, shell=True)

data_read = pd.read_csv("blktrace/" + device + "_read_filtered_operation.csv", delim_whitespace=True,
                        names=['timestamp', 'operation', 'io_size', 'cpu_id', 'process'])
# delta[i] is the time from I/O i to the next I/O
data_read['delta'] = data_read.timestamp.diff().shift(-1)
#print(data_read)
data_read.to_csv("blktrace/" + device + "_read_delta.csv", sep=";", encoding="utf-8", float_format="%.9f")
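The highlighted job-switch gaps can also be pulled out of the generated CSV automatically. A minimal sketch (again my addition, not part of the script; the 100 ms threshold is an arbitrary cutoff well above the sub-millisecond intra-job deltas):

import pandas as pd

data = pd.read_csv("blktrace/nvme1n1_read_delta.csv", sep=";", index_col=0)
# intra-job deltas are sub-millisecond, so anything above 100 ms is a job switch
gaps = data[data.delta > 0.1]
print(gaps[["timestamp", "io_size", "delta"]])

Note that this also picks up the separator's 1 s thinktime gap, which marks where the job file starts.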
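Since both the drop_caches write and blktrace need root, it may be worth failing fast when the script is started without it. A possible guard at the top of the script (my suggestion, not in the original):

import os, sys

# blktrace and 'echo 3 > /proc/sys/vm/drop_caches' both require root
if os.geteuid() != 0:
    sys.exit("This script must be run as root.")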
How to run the script:

. Create a folder for the fio temp files, preferably on a separate device.
. Change the directory parameter in the fio job file to point at that temp folder.
. Specify the correct device in the Python script.
. Install the required Python dependencies, if needed.
. Put the fio job file in the same directory as the Python script and name it delay_job.fio.
. Run the Python script with root rights.
. The results will be written to the blktrace folder, which the script creates in the same folder as itself.

Best regards,
Azat Nurgaliev