RE: Inconsistent results while running jobs with other app on background


 



Hi Pedro, you don't mention the kernel or how you test, but I would start by simplifying things, making sure you stay on a single NUMA node, and using the most current fio, compiled on your own machine.

Below are some methods for understanding a simple single-thread QD1 test. You might want to apply some of these practices.

This is going public pretty soon.

Steps to improving performance of Intel SSDs on Linux OS
Step 1: Put your CPUs in performance mode
#   echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
Confirm that the CPU scaling governor is in performance mode with the following command, which prints the setting for each processor (vCPU).
# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
Every line of output should read performance.
Don't forget to make this setting persistent across reboots by changing your Linux startup configuration (for example, rc).
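The check above can be scripted so that it fails loudly when any CPU is left in another governor. This is only a sketch: the function name check_governors and its optional root-directory argument are made up for illustration (the argument exists so the function can be tried against a fake directory tree), and it assumes the usual cpufreq sysfs layout.

```shell
#!/bin/bash
# check_governors [ROOT]: succeed only if every scaling_governor file under
# ROOT/cpu*/cpufreq/ reads "performance". ROOT defaults to the real sysfs path.
# (Hypothetical helper, not part of any tool.)
check_governors() {
    local root="${1:-/sys/devices/system/cpu}"
    local file governor found=0 bad=0
    for file in "$root"/cpu*/cpufreq/scaling_governor; do
        # Skip unmatched globs and directories without a governor file
        [ -r "$file" ] || continue
        found=1
        governor=$(cat "$file")
        if [ "$governor" != "performance" ]; then
            echo "$file: $governor"
            bad=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no scaling_governor files found under $root"
        return 2
    fi
    return "$bad"
}
```

Calling check_governors with no argument inspects the running system. It prints one line per CPU that is not in performance mode and returns non-zero in that case, so it can gate a benchmark run.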
Step 2: Disable IRQ balance (only on kernels older than Linux 4.8)
In kernels before version 4.8, IRQ balancing was not managed as efficiently as it is now by the in-box Linux nvme driver. So if you are on a kernel older than 4.8, turn off the irqbalance service and run the short script below to balance your IRQs for the best possible I/O processing. Here is how to do this on Ubuntu and CentOS.
On Ubuntu, set ‘Enabled’ to “0” in /etc/default/irqbalance. On CentOS, you can disable the service with the following command:
# systemctl disable --now irqbalance
On CentOS, you can also stop the service first and then make the change permanent across reboots, as follows.
# systemctl stop irqbalance
# systemctl status irqbalance
It should show Active: inactive (dead) on the third line.
Now to make this permanent.
# chkconfig irqbalance off
# chkconfig irqbalance
It will show “disabled”.
Here is a bash script to set SMP affinity if you wish to do that; it is only needed on kernels older than 4.8.
#!/bin/bash
# Copy each NVMe interrupt's affinity_hint into its smp_affinity (run as root).
for folder in /proc/irq/*; do
    for file in "$folder"/*; do
        # NVMe queue interrupts appear as entries named nvme* under the IRQ directory
        if [[ $file == *"nvme"* ]]; then
            echo "$file"
            contents=$(cat "$folder/affinity_hint")
            echo "$contents" > "$folder/smp_affinity"
            cat "$folder/smp_affinity"
        fi
    done
done
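After running the script, it can be useful to confirm that each NVMe interrupt's smp_affinity really matches its affinity_hint. The following is a read-only sketch of such a check. The function name check_nvme_affinity and its optional root argument are hypothetical, and it assumes the same /proc/irq layout the script above relies on (one directory per IRQ, holding affinity_hint, smp_affinity and an entry named after the nvme queue).

```shell
#!/bin/bash
# check_nvme_affinity [ROOT]: for every IRQ directory under ROOT that has an
# entry named nvme*, report whether smp_affinity equals affinity_hint.
# ROOT defaults to /proc/irq. (Hypothetical helper for illustration.)
check_nvme_affinity() {
    local root="${1:-/proc/irq}"
    local dir entry has_nvme hint mask status=0
    for dir in "$root"/*/; do
        dir=${dir%/}
        has_nvme=0
        for entry in "$dir"/nvme*; do
            if [ -e "$entry" ]; then has_nvme=1; fi
        done
        [ "$has_nvme" -eq 1 ] || continue
        hint=$(cat "$dir/affinity_hint")
        mask=$(cat "$dir/smp_affinity")
        if [ "$hint" = "$mask" ]; then
            echo "ok ${dir##*/} $mask"
        else
            echo "mismatch ${dir##*/} hint=$hint mask=$mask"
            status=1
        fi
    done
    return "$status"
}
```

Run with no argument, it prints one line per NVMe IRQ and returns non-zero if any mask differs from its hint. It does not need root because it only reads.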
Step 3: Enabling polling or poll queues in your Linux in-box nvme driver
Linux 4.20 brought optimizations to the nvme driver, including a new parameter that governs polling. Polling should not involve interrupts of any kind, and the nvme driver developers had to make changes to allow for this. The result is poll queues, which are available in 4.20 and later.
To make nvme run with poll queues, load the driver with the poll_queues parameter. Set the number of poll queues equal to the number of virtual cores in your system.
My example is:
To enable polling mode by device (before kernel 4.20) -

# echo 1 > /sys/block/nvme0n1/queue/io_poll
To enable poll_queues in the nvme driver system wide (since kernel 4.20)- 
# modprobe -r nvme && modprobe nvme poll_queues=4
The above is for a quick system test. If you want to enable it on boot, or if you are booting from an NVMe drive, do not use the above; use this method instead.
# more /etc/modprobe.d/nvme.conf
options nvme poll_queues=4
You now need to rebuild the initramfs so that the module parameter is picked up. Here are the commands: first back up the image, then rebuild it.
# cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.$(date +%m-%d-%H%M%S).bak
# dracut --force
Now reboot your system and check your work… 
# systool -vm nvme
You should see that poll_queues is set to your desired value, and no longer 0.
poll_queues         = "4"
You can also set up Grub2 to add this kernel option by following the instructions in the CentOS wiki for CentOS 7: https://wiki.centos.org/HowTos/Grub2
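If systool is not installed, the same value can usually be read straight from sysfs. The sketch below assumes the loaded nvme module exposes the parameter at /sys/module/nvme/parameters/poll_queues. The helper names read_poll_queues and poll_queues_ok, and their optional file argument, are made up for illustration.

```shell
#!/bin/bash
# read_poll_queues [FILE]: print the poll_queues value reported by the loaded
# nvme module. FILE defaults to the module-parameter file in sysfs
# (assumed location; hypothetical helper).
read_poll_queues() {
    local file="${1:-/sys/module/nvme/parameters/poll_queues}"
    if [ ! -r "$file" ]; then
        echo "cannot read $file (nvme not loaded, or kernel older than 4.20?)" >&2
        return 2
    fi
    cat "$file"
}

# poll_queues_ok WANTED [FILE]: succeed if the reported value equals WANTED.
poll_queues_ok() {
    local wanted="$1" actual
    actual=$(read_poll_queues "$2") || return 2
    [ "$actual" -eq "$wanted" ]
}
```

With the example setting above, poll_queues_ok 4 should succeed after the reboot. To follow the advice of one poll queue per virtual core, compare against the output of nproc instead, as in poll_queues_ok "$(nproc)".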
Step 4: Choose appropriate fio ioengine, and I/O polling mode
Now to check the configuration of the system you built. The most critical performance will show itself at QD1 (queue depth 1) with just 1 worker thread. You can run this with any number of ioengines, but pvsync2 or io_uring in hipri mode is recommended. Here are the requirements:
•	Polling mode per device requires Linux kernel 4.8 or newer. 
•	Poll queues require Linux kernel 4.20 or newer.
•	io_uring requires Linux kernel 5.0 or newer.  

If you are just getting started with developing with io_uring, you should move to the latest stable Linux 5.x kernel available when you read this.
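The kernel requirements listed above can be turned into a quick check against uname -r. This is a sketch only: version_ge and nvme_poll_features are hypothetical helper names, the feature labels are just strings chosen here, and the comparison looks only at the major and minor numbers of the release string.

```shell
#!/bin/bash
# version_ge A B: succeed if version A is >= B, comparing only the first two
# numeric fields (major.minor). Suffixes such as ".1-1" are ignored.
version_ge() {
    local a_major a_minor b_major b_minor
    a_major=${1%%.*}; a_minor=${1#*.}; a_minor=${a_minor%%[!0-9]*}
    b_major=${2%%.*}; b_minor=${2#*.}; b_minor=${b_minor%%[!0-9]*}
    [ "$a_major" -gt "$b_major" ] ||
        { [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -ge "$b_minor" ]; }
}

# nvme_poll_features [KERNEL]: list which of the features named in the
# requirements the given kernel release (default: running kernel) can use.
nvme_poll_features() {
    local kernel="${1:-$(uname -r)}"
    if version_ge "$kernel" 4.8;  then echo "io_poll"; fi
    if version_ge "$kernel" 4.20; then echo "poll_queues"; fi
    if version_ge "$kernel" 5.0;  then echo "io_uring"; fi
    return 0
}
```

For example, nvme_poll_features 4.15.0-74-generic prints only io_poll, while nvme_poll_features 5.4.1-1 prints all three names.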
Below is a recommended fio script:
[global]
name= OptaneInitialPerfTest
ioengine=pvsync2
hipri
direct=1
buffered=0
size=100%
randrepeat=0
time_based
ramp_time=0
norandommap
refill_buffers
log_avg_msec=1000
log_max_value=1
group_reporting
percentile_list=1.0:25.0:50.0:75.0:90.0:99.0:99.9:99.99:99.999:99.9999:99.99999:99.999999:100.0
filename=/dev/nvme0n1
[rd_rnd_qd_1_4k_1w]
bs=4k
iodepth=1
numjobs=1
rw=randread
cpus_allowed=0-17
runtime=300
write_bw_log=bw_rd_rnd_qd_1_4k_1w
write_iops_log=iops_rd_rnd_qd_1_4k_1w
write_lat_log=lat_rd_rnd_qd_1_4k_1w
We use cpus_allowed here for NUMA locality and for no other reason. On the server I ran this on, this job burns just one core; if everything is working right, you should be using pretty much one entire core. You may also need to compile fio a specific way to get the particular ioengines you want.
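To pick a cpus_allowed range for your own server, you can look up which NUMA node the drive hangs off and which CPUs belong to that node. The sketch below assumes the controller's PCI device is reachable at /sys/class/nvme/<controller>/device and that node CPU lists live under /sys/devices/system/node. The function name nvme_local_cpus and its optional sysfs-root argument are hypothetical.

```shell
#!/bin/bash
# nvme_local_cpus CTRL [SYSFS]: print the CPU list of the NUMA node that NVMe
# controller CTRL (for example nvme0) is attached to. SYSFS defaults to /sys.
# (Hypothetical helper; sysfs paths are assumptions about the usual layout.)
nvme_local_cpus() {
    local ctrl="$1" sysfs="${2:-/sys}" node
    node=$(cat "$sysfs/class/nvme/$ctrl/device/numa_node") || return 2
    if [ "$node" -lt 0 ]; then
        # -1 means the kernel has no NUMA information for this device
        echo "no NUMA node reported for $ctrl" >&2
        return 1
    fi
    cat "$sysfs/devices/system/node/node$node/cpulist"
}
```

The output of nvme_local_cpus nvme0 is a list in the same range syntax that cpus_allowed accepts, so it can be pasted into the job file. If the lookup fails or reports no node, fall back to checking the topology by hand.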
Results from a build with Intel Xeon Gold 6254 CPUs and Linux 5.4.1-1 kernel
Summary output from fio on my system:
rd_rnd_qd_1_4k_1w: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=pvsync2, iodepth=1
fio-3.16-64-gfd988
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=487MiB/s][r=125k IOPS][eta 00m:00s]
rd_rnd_qd_1_4k_1w: (groupid=0, jobs=1): err= 0: pid=3036: Wed Jan 15 14:00:45 2020
  read: IOPS=125k, BW=487MiB/s (511MB/s)(143GiB/300001msec)
    clat (usec): min=7, max=202, avg= 7.76, stdev= 1.29
     lat (usec): min=7, max=202, avg= 7.78, stdev= 1.29
    clat percentiles (usec):
     |  1.000000th=[    8], 25.000000th=[    8], 50.000000th=[    8],
     | 75.000000th=[    8], 90.000000th=[    8], 99.000000th=[   10],
     | 99.900000th=[   33], 99.990000th=[   38], 99.999000th=[  106],
     | 99.999900th=[  159], 99.999990th=[  167], 99.999999th=[  204],
     | 100.000000th=[  204]
   bw (  KiB/s): min=498144, max=500708, per=100.00%, avg=499144.54, stdev=489.10, samples=299
   iops        : min=124536, max=125177, avg=124786.12, stdev=122.32, samples=299
  lat (usec)   : 10=99.27%, 20=0.55%, 50=0.17%, 100=0.01%, 250=0.01%
  cpu          : usr=5.80%, sys=94.07%, ctx=1018, majf=0, minf=23
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=37437069,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=487MiB/s (511MB/s), 487MiB/s-487MiB/s (511MB/s-511MB/s), io=143GiB (153GB), run=300001-300001msec

Disk stats (read/write):
  nvme0n1: ios=37424564/0, merge=0/0, ticks=261212/0, in_queue=0, util=99.99%
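A quick sanity check on output like this is to confirm that the IOPS, bandwidth and latency figures agree with each other. The sketch below uses two hypothetical helpers, bw_mib_per_s and qd1_latency_ns, and plain integer arithmetic, so results are rounded down. The second helper only makes sense for a QD1, single-job run where one I/O is in flight at a time.

```shell
#!/bin/bash
# bw_mib_per_s IOPS BLOCK_BYTES: bandwidth in whole MiB/s implied by an IOPS
# figure at a fixed block size (integer division, rounds down).
bw_mib_per_s() {
    echo $(( $1 * $2 / 1048576 ))
}

# qd1_latency_ns IOPS: mean time per I/O in nanoseconds when only one I/O is
# ever in flight (queue depth 1, one job): one second divided by IOPS.
qd1_latency_ns() {
    echo $(( 1000000000 / $1 ))
}
```

With the average of 124786 IOPS at 4096-byte blocks from the run above, bw_mib_per_s 124786 4096 prints 487, matching the reported 487MiB/s. qd1_latency_ns 124786 prints 8013, that is about 8.0 usec per I/O, a little above the 7.78 usec average lat, presumably because it also counts the time fio spends between I/Os.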


F

-----Original Message-----
From: fio-owner@xxxxxxxxxxxxxxx <fio-owner@xxxxxxxxxxxxxxx> On Behalf Of Pedro Leão da Cruz
Sent: Thursday, January 16, 2020 7:20 AM
To: fio <fio@xxxxxxxxxxxxxxx>
Subject: Inconsistent results while running jobs with other app on background

Hi everyone, I am running FIO 3.1 on Ubuntu 18.04 testing an Intel Optane drive. I am getting quite inconsistent results for clat percentiles depending on whether I run a Java application on the background or not. With the Java app, which I use to collect some other data, I get 2-3% worst results. Is there any obvious ways to mitigate the interference from other applications while running a FIO job?

Thanks :)



