Hello everyone,

I am trying to emulate a real I/O workload with multiple fio jobs, and I noticed that switching between jobs takes around ~300 ms when using the stonewall parameter and ~200 ms when using the wait_for parameter. This delay seems rather high to me. Is this the expected behavior? Is there any way to reduce the delay between jobs?

You can see these delays by reproducing the fio run with the Python script that I am attaching to this message. The script is intended to measure the delays between I/O operations: it uses blktrace to collect the I/O traces, then parses them and generates a CSV file with the deltas between consecutive I/Os.

Because blktrace is started before fio, it also collects a number of I/O operations that are not related to the fio job file. To separate these operations in the trace output, I added a small separator job at the beginning of the job file, which does a single 128k read and then waits for 1 second; this is easy to spot in the trace output (a small pandas sketch for locating it follows below). blkparse then parses all the binary trace files, and the output is filtered down to D (issued) operations only and to the fio process.

The script should work fine with the OS disk, but it works better if the test is run against a separate device. Depending on your setup, trace operations may not appear at all; in that case, you could try a USB stick to get the script working. The script has to be run with root rights, because blktrace requires root rights to collect all I/O traces.
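For illustration, here is the small pandas sketch for locating the separator in the final CSV (this is my addition, not part of the script below; it assumes the column names the script assigns and the example device nvme1n1). The separator is the single 128k (131072 byte) read whose delta to the next I/O is roughly the 1 s thinktime:

import pandas as pd

data = pd.read_csv("blktrace/nvme1n1_read_delta.csv", sep=";", index_col=0)
# the separator job issues one 131072-byte read followed by a ~1 s gap
separator = data[(data.io_size == 131072) & (data.delta > 0.9)]
print(separator)

Everything after that row belongs to the job file proper.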
Fio version:

/usr/local/bin/fio --version
fio-3.26-39-g6308

Linux version:

Linux dmt-ws 5.8.0-48-generic #54~20.04.1-Ubuntu SMP Sat Mar 20 13:40:25 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Fio job file:

[global]
direct=1
group_reporting
ioengine=libaio
directory=/media/optane/fio_temp
numjobs=1
iodepth=1

[separator job 0]
# used for understanding where in the I/O traces the job file starts
rw=read
size=128k
bssplit=128k,,
thinktime=1s

[reading job 1]
wait_for=separator job 0
rw=read
size=4096b
bssplit=4k,,

[reading job 2]
wait_for=reading job 1
rw=read
size=20480b
bssplit=4k,,

[reading job 3]
wait_for=reading job 2
rw=read
size=20480b
bssplit=4k,,

[reading job 4]
wait_for=reading job 3
rw=read
size=20480b
bssplit=4k,,

[reading job 5]
stonewall
rw=read
size=20480b
bssplit=4k,,

[reading job 6]
stonewall
rw=read
size=16384b
bssplit=4k,,

My output with delays (read_delta.csv)

The columns are: timestamp (seconds, with nanosecond resolution), operation from blktrace, size of the I/O in bytes, CPU (core) number, name of the process, and the computed delta to the next I/O (seconds). The values I am concerned about are highlighted with ** below; these numbers represent the delay fio needs to switch to the next job.

;timestamp;operation;io_size;cpu_id;process;delta
0;0.047131672;D;4096;2;fio;0.021224560
1;0.068356232;D;135168;2;fio;0.004784061
2;0.073140293;D;135168;2;fio;0.269024046
3;0.342164339;D;131072;11;fio;1.205111417 <- job file starts here
4;1.547275756;D;4096;10;fio;**0.201172283**
5;1.748448039;D;4096;11;fio;0.000361312
6;1.748809351;D;4096;11;fio;0.000271772
7;1.749081123;D;4096;11;fio;0.000261770
8;1.749342893;D;4096;11;fio;0.000256501
9;1.749599394;D;4096;11;fio;**0.200772160**
10;1.950371554;D;4096;11;fio;0.000259826
11;1.950631380;D;4096;11;fio;0.000158999
12;1.950790379;D;4096;11;fio;0.000163855
13;1.950954234;D;4096;11;fio;0.000189879
14;1.951144113;D;4096;11;fio;**0.199244653**
15;2.150388766;D;4096;4;fio;0.000203573
16;2.150592339;D;4096;4;fio;0.000150124
17;2.150742463;D;4096;4;fio;0.000146229
18;2.150888692;D;4096;4;fio;0.000184647
19;2.151073339;D;4096;4;fio;**0.300860646**
20;2.451933985;D;4096;10;fio;0.001125194
21;2.453059179;D;4096;10;fio;0.000649955
22;2.453709134;D;4096;10;fio;0.009264417
23;2.462973551;D;4096;10;fio;0.008190598
24;2.471164149;D;4096;10;fio;**0.282772655**
25;2.753936804;D;4096;10;fio;0.000448414
26;2.754385218;D;4096;10;fio;0.000282554
27;2.754667772;D;4096;10;fio;0.000258501
28;2.754926273;D;4096;10;fio;

Python code to generate the result:

'''
Script intended to get delays of IO operations.
Author: Azat Nurgaliev, mailto:azat.nurgaliev@xxxxxx
Created Date: May 4 2021
'''
import pandas as pd
import subprocess, os
from pathlib import Path
import time

# the device from which the I/O traces will be collected
device = "nvme1n1"

# create a folder for the output files
Path("blktrace").mkdir(parents=True, exist_ok=True)

# clear the system caches
subprocess.run(["sh", "-c", 'echo 3 >/proc/sys/vm/drop_caches'])

# start the blktrace process (requires root rights)
blktrace = subprocess.Popen(["blktrace", "-d", "/dev/" + device, "-o", device, "-D", "blktrace"],
                            preexec_fn=os.setpgrp, stdout=subprocess.DEVNULL)
print("blktrace spawned with PID: %s" % blktrace.pid)

# start the fio process
fio = subprocess.Popen(["fio", "delay_job.fio"], stdout=subprocess.PIPE)
print("fio spawned with PID: %s" % fio.pid)
outs, errs = fio.communicate(timeout=100)
print("fio finished")

time.sleep(10)
blktrace.terminate()

# run blkparse, then keep only lines from the fio process and only D (issued) operations
blkparse_read_throughput = "blkparse -i blktrace/" + device + ".blktrace. -f '%5T.%t %a %N %c %C\\n' -a read -o blktrace/" + device + "_read_raw.csv"
blkparse_read_throughput_filtered_operation = "grep ' fio' blktrace/" + device + "_read_raw.csv > blktrace/" + device + "_read_filtered_process.csv"
blkparse_read_throughput_filtered_process = "grep ' D ' blktrace/" + device + "_read_filtered_process.csv > blktrace/" + device + "_read_filtered_operation.csv"
subprocess.call(blkparse_read_throughput, shell=True, stdout=subprocess.DEVNULL)
subprocess.call(blkparse_read_throughput_filtered_operation, shell=True)
subprocess.call(blkparse_read_throughput_filtered_process, shell=True)

data_read = pd.read_csv("blktrace/" + device + "_read_filtered_operation.csv", delim_whitespace=True,
                        names=['timestamp', 'operation', 'io_size', 'cpu_id', 'process'])
# delta[i] is the time from I/O i to the next I/O
data_read['delta'] = data_read.timestamp.diff().shift(-1)
#print(data_read)
data_read.to_csv("blktrace/" + device + "_read_delta.csv", sep=";", encoding="utf-8", float_format="%.9f")
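The highlighted job-switch gaps can also be pulled out of the generated CSV automatically. A minimal sketch (again my addition, not part of the script; the 100 ms threshold is an arbitrary cutoff well above the sub-millisecond intra-job deltas):

import pandas as pd

data = pd.read_csv("blktrace/nvme1n1_read_delta.csv", sep=";", index_col=0)
# intra-job deltas are sub-millisecond, so anything above 100 ms is a job switch
gaps = data[data.delta > 0.1]
print(gaps[["timestamp", "io_size", "delta"]])

Note that this also picks up the separator's 1 s thinktime gap, which marks where the job file starts.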
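Since both the drop_caches write and blktrace need root, it may be worth failing fast when the script is started without it. A possible guard at the top of the script (my suggestion, not in the original):

import os, sys

# blktrace and 'echo 3 > /proc/sys/vm/drop_caches' both require root
if os.geteuid() != 0:
    sys.exit("This script must be run as root.")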
How to run the script:

. Create a folder for the fio temp files, preferably on a separate device.
. Change the directory parameter in the fio job file to point at that temp folder.
. Specify the correct device in the Python script.
. Install the required Python dependencies, if needed.
. Put the fio job file in the same directory as the Python script and name it delay_job.fio.
. Run the Python script with root rights.
. The results will be written to the blktrace folder, which the script creates in the same folder as itself.

Best regards,
Azat Nurgaliev