RE: CPUs, threads, and speed

"Elliott, Robert (Servers)" <elliott@xxxxxxx> · Wed, 15 Jan 2020 21:33:33 +0000

> -----Original Message-----
> From: fio-owner@xxxxxxxxxxxxxxx <fio-owner@xxxxxxxxxxxxxxx> On Behalf Of
> Mauricio Tavares
> Sent: Wednesday, January 15, 2020 9:51 AM
> Subject: CPUs, threads, and speed
> 
...
> [global]
> name=4k random write 4 ios in the queue in 32 queues
> filename=/dev/nvme0n1
> ioengine=libaio
> direct=1
> bs=4k
> rw=randwrite
> iodepth=4
> numjobs=32
> buffered=0
> size=100%
> loops=2
> randrepeat=0
> norandommap
> refill_buffers
> 
> [job1]
> 
> That is taking a ton of time, like days to go. Is there anything I can
> do to speed it up? For instance, what is the default value for
> cpus_allowed (or cpumask)[2]? Is it all CPUs? If not what would I gain
> by throwing more cpus at the problem?
> 
> I also read[2] by default fio uses fork. What would I get by going to
> threads?

> Jobs: 32 (f=32): [w(32)][10.8%][w=301MiB/s][w=77.0k IOPS][eta 06d:13h:56m:51s]

77k IOPS for random writes isn't bad; check your drive's data sheet for its rated random-write IOPS.
If the drive is 1 TB, writing the whole drive once should take about
    1 TB / (77k IO/s * 4 KiB/IO) = 3170 s = 52.8 minutes
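A minimal Python sketch of that arithmetic, assuming a decimal 1 TB drive (10^12 bytes), fio's bs=4k meaning 4096 bytes, and that the observed 77k IOPS holds for the whole pass. The function name is illustrative, not part of fio:

```python
def full_write_seconds(capacity_bytes: float, iops: float, block_bytes: int) -> float:
    """Time to write capacity_bytes once at a steady iops rate, block_bytes per IO."""
    bytes_per_second = iops * block_bytes
    return capacity_bytes / bytes_per_second


TB = 10**12   # drive capacities are quoted in decimal terabytes
KiB = 1024    # fio's bs=4k is 4 KiB

# Assumed figures: 1 TB drive, 77k IOPS observed, 4 KiB per random write
seconds = full_write_seconds(1 * TB, 77_000, 4 * KiB)
print(f"{seconds:.0f} s = {seconds / 60:.1f} minutes")
```

Plugging in your drive's real capacity and its data-sheet IOPS gives a quick sanity check on fio's ETA.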

Best practice is to use all the CPU cores on the device's NUMA node, pin
each job to its own core, and be NUMA aware. If the device is attached to
physical CPU 0 (socket 0) and that CPU has 12 cores that Linux numbers 0-11
(per "lscpu" or "numactl --hardware"), try:
  iodepth=16
  numjobs=12
  cpus_allowed=0-11
  cpus_allowed_policy=split
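A rough helper for deriving those values, as a Python sketch. It assumes a Linux sysfs layout where /sys/class/nvme/nvme0/device/numa_node gives the controller's NUMA node and /sys/devices/system/node/nodeN/cpulist gives that node's CPUs; verify both paths on your system. The helper names are hypothetical. A node's cpulist also includes hyperthread siblings, so on an SMT system you may want to trim it to one logical CPU per physical core first:

```python
"""Sketch: derive fio CPU-pinning options for the NUMA node an NVMe device sits on.

The sysfs paths used here are assumptions about a typical Linux system.
"""
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a kernel cpulist such as '0-11,24-35' into a list of CPU numbers."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def fio_cpu_options(cpulist: str, iodepth: int = 16) -> str:
    """One job per listed CPU; cpus_allowed_policy=split gives each job its own CPU."""
    njobs = len(parse_cpulist(cpulist))
    return "\n".join([
        f"iodepth={iodepth}",
        f"numjobs={njobs}",
        f"cpus_allowed={cpulist.strip()}",
        "cpus_allowed_policy=split",
    ])


def device_cpulist(ctrl: str = "nvme0") -> str:
    """Read the CPU list local to an NVMe controller (assumed sysfs paths)."""
    node = int(Path(f"/sys/class/nvme/{ctrl}/device/numa_node").read_text())
    if node < 0:  # -1: no NUMA affinity reported, so fall back to all online CPUs
        return Path("/sys/devices/system/cpu/online").read_text().strip()
    return Path(f"/sys/devices/system/node/node{node}/cpulist").read_text().strip()


if __name__ == "__main__":
    try:
        print(fio_cpu_options(device_cpulist("nvme0")))
    except (OSError, ValueError) as err:  # not Linux, no nvme0, or unexpected contents
        print(f"could not read sysfs ({err}); example for CPUs 0-11:")
        print(fio_cpu_options("0-11"))
```

fio's cpus_allowed accepts the same comma/range syntax as the kernel's cpulist, which is why the list is passed through unchanged.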

With these settings:
  numjobs=32, size=100%, loops=2
each of the 32 jobs writes the full size of the device on each of the
2 loops, so a 1 TB drive receives 32 * 2 * 1 TB = 64 TB of writes rather
than 1 TB. That alone could easily explain the multi-day estimate.
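The same total as a Python sketch, assuming a 1 TB drive and that the observed 77k IOPS of 4 KiB writes holds for the entire run (the function name is illustrative):

```python
def total_bytes_written(device_bytes: int, numjobs: int, loops: int) -> int:
    """With size=100%, every job covers the whole device on every loop."""
    return device_bytes * numjobs * loops


TB = 10**12
KiB = 1024

total = total_bytes_written(1 * TB, numjobs=32, loops=2)
rate = 77_000 * 4 * KiB  # bytes/s, assuming the observed 77k IOPS of 4 KiB writes
print(f"{total / TB:.0f} TB to write, about {total / rate / 86400:.1f} days")
```

The total scales linearly with numjobs and loops, so dropping to loops=1 halves it.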

Other nits:
* thread - running jobs as threads might be slightly more efficient than
  forking a full process per job (fio's default)
* gtod_reduce=1 - precise latency measurements don't matter for this test,
  and this option cuts down fio's gettimeofday() calls
* refill_buffers - presuming you don't care about the data contents,
  don't include this; refilling the buffers on every submit costs CPU time.
  zero_buffers is the simplest and fastest choice, unless you're concerned
  that the device might do compression or zero detection
* norandommap - if you want each LBA written a precise number of times,
  you can't include this, because fio then won't remember which blocks it
  has already written. Keeping track does carry a lot of overhead, though.
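To illustrate what the random map buys you, here is a simplified Python model, not fio's actual implementation. Without a map, drawing one random block per IO for a full pass leaves only about 1 - 1/e (roughly 63%) of the blocks written and the rest untouched. A map of already-written blocks makes each pass hit every block exactly once, at the cost of memory and bookkeeping on every IO. The function names are hypothetical:

```python
import random


def random_offsets_without_map(nblocks: int, rng: random.Random) -> list[int]:
    """norandommap-style: draw block numbers independently, so repeats are possible."""
    return [rng.randrange(nblocks) for _ in range(nblocks)]


def random_offsets_with_map(nblocks: int, rng: random.Random) -> list[int]:
    """Random-map-style: remember written blocks; on a collision, move on to the
    next free block (wrapping around), so each block is chosen exactly once per pass."""
    written = bytearray(nblocks)  # one byte per block for simplicity; a real map would use bits
    order = []
    for _ in range(nblocks):
        block = rng.randrange(nblocks)
        while written[block]:  # terminates: at least one block is still free
            block = (block + 1) % nblocks
        written[block] = 1
        order.append(block)
    return order


def coverage(offsets: list[int], nblocks: int) -> float:
    """Fraction of blocks written at least once."""
    return len(set(offsets)) / nblocks


if __name__ == "__main__":
    n = 10_000
    rng = random.Random(1)
    print(f"without map: {coverage(random_offsets_without_map(n, rng), n):.1%} of blocks written")
    print(f"with map:    {coverage(random_offsets_with_map(n, rng), n):.1%} of blocks written")
```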