I can clarify that, as I posted the original script on GitHub. Sequential preconditioning is mandatory for bandwidth tests; random 4K preconditioning is for everything else. For all mixed scenarios the data has to be randomized, which puts the highest pressure on the drive (and, internally, on write amplification in the case of a NAND SSD). That makes all subsequent benchmarks fair. On norandommap: I agree in general, but fio's overhead of tracking LBAs impacts performance and extends the pre-fill time.

--
Andrey Kudryavtsev,
SSD Solution Architect
Intel Corp.

On 1/15/20, 9:29 AM, "fio-owner@xxxxxxxxxxxxxxx on behalf of Gruher, Joseph R" <fio-owner@xxxxxxxxxxxxxxx on behalf of joseph.r.gruher@xxxxxxxxx> wrote:

> -----Original Message-----
> From: fio-owner@xxxxxxxxxxxxxxx <fio-owner@xxxxxxxxxxxxxxx> On Behalf Of
> Mauricio Tavares
> Sent: Wednesday, January 15, 2020 7:51 AM
> To: fio@xxxxxxxxxxxxxxx
> Subject: CPUs, threads, and speed
>
> Let's say I have a config file to preload a drive that looks like this (stolen from
> https://github.com/intel/fiovisualizer/blob/master/Workloads/Precondition/fill_4KRandom_NVMe.ini):
>
> [global]
> name=4k random write 4 ios in the queue in 32 queues
> filename=/dev/nvme0n1
> ioengine=libaio
> direct=1
> bs=4k
> rw=randwrite
> iodepth=4
> numjobs=32
> buffered=0
> size=100%
> loops=2
> randrepeat=0
> norandommap
> refill_buffers
>
> [job1]
>
> That is taking a ton of time, like days. Is there anything I can do to speed it up?

When you say preload, do you just want to write in the full capacity of the drive? A sequential workload with larger blocks will be faster, like:

[global]
ioengine=libaio
thread=1
direct=1
bs=128k
rw=write
numjobs=1
iodepth=128
size=100%
loops=2

[job00]
filename=/dev/nvme0n1

Or, if you have a use case where you specifically want to write it in with 4K blocks, you could probably increase your queue depth well beyond 4 and see a performance improvement. You also probably don't want to specify norandommap if you're trying to hit every block on the device.

-Joe
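
One likely reason the original job runs for days: each fio job clone writes size=100% independently, so numjobs=32 with loops=2 amounts to roughly 64 full-capacity passes over the device. On, say, a 2 TB drive sustaining 400 MB/s of 4K random writes (both figures are assumptions for illustration only), that works out to about 64 * 2 TB / 400 MB/s ≈ 320,000 s, or roughly 3.7 days.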
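
A minimal sketch of that 4K alternative, keeping the same libaio setup; the iodepth value is an illustrative guess to be tuned per drive, not a measured recommendation:

[global]
name=4k random fill, deeper queue
filename=/dev/nvme0n1
ioengine=libaio
direct=1
bs=4k
rw=randwrite
# assumption: queue depth raised well beyond 4, per the suggestion above
iodepth=32
# a single job keeps total data written at loops * capacity
numjobs=1
buffered=0
size=100%
loops=2
randrepeat=0
refill_buffers
# norandommap omitted so fio tracks LBAs and each loop covers every block

[job1]

Saved as, for example, fill_4k.ini (the filename is hypothetical), it runs with: fio fill_4k.ini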