Re: CPUs, threads, and speed

Correct, I pre-condition for IOPS testing by using the last block,
only with randwrite, which runs random writes for about 45 minutes
until a steady state is achieved.
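
Roughly, a sketch of that last block switched to randwrite (the job
name, iodepth/numjobs, and steady-state settings are placeholders
carried over from the script quoted below):

# 4k random-write steady-state check
for i in `ls -1 /dev/nvme*n1`
do
    ./fio --name=SteadyStateRandWrite --filename=${i} --iodepth=16 --numjobs=16 \
        --bs=4k --rw=randwrite --ss_dur=1800 --ss=iops_slope:0.3% --runtime=24h \
        $globalFIOParameters &
done
wait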

On Thu, Jan 16, 2020 at 11:40 AM Andrey Kuzmin
<andrey.v.kuzmin@xxxxxxxxx> wrote:
>
> On Thu, Jan 16, 2020 at 9:31 PM Jared Walton <jawalking@xxxxxxxxx> wrote:
> >
> > Not sure if this will help, but I use the following to prep multiple
> > 4TB drives at the same time in a little over an hour.
>
> You seem to be preconditioning with sequential writes only, and
> further doing so with essentially a single write frontier.
>
> That doesn't stress the FTL maps enough and doesn't trigger any
> substantial garbage collection, since the SSD is intelligent enough
> to spot a sequential workload of 128K (re)writes.
>
> So what you're doing is only good for bandwidth measurements, and if
> this steady state is applied to random IOPS profiling, you'd get
> highly inflated results.
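
For reference, a sketch of the kind of random-write pass being
described here, which could follow the three 128k passes in the script
quoted below (the iodepth and loop count are illustrative assumptions):

# 4k random-write preconditioning pass (~2x capacity per drive)
for i in `ls -1 /dev/nvme*n1`
do
    ./fio --name=RandomPrecondition --filename=${i} --iodepth=128 --numjobs=1 \
        --bs=4k --rw=randwrite --size=100% --loops=2 \
        $globalFIOParameters &
done
wait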
>
> Regards,
> Andrey
>
> > Is it inelegant? Yes, but it works for me.
> >
> > globalFIOParameters="--offset=0 --ioengine=libaio --invalidate=1
> > --group_reporting --direct=1 --thread --refill_buffers --norandommap
> > --randrepeat=0 --allow_mounted_write=1 --output-format=json,normal"
> >
> > # Drives should be FOB or LLF'd (if it's good to do that)
> > # LLF logic
> >
> > # 128k Pre-Condition
> > # Write to entire disk
> > # NB: $iodepth is assumed to be set in the environment before running
> > for i in `ls -1 /dev/nvme*n1`
> > do
> >     size=$(fdisk -l | grep ${i} | awk -F "," '{ print $2 }' | awk '{ print $1 }')
> >     ./fio --name=PreconditionPass1of3 --filename=${i} --iodepth=$iodepth \
> >         --bs=128k --rw=write --size=${size} --fill_device=1 \
> >         $globalFIOParameters &
> > done
> > wait
> >
> > # Read entire disk
> > for i in `ls -1 /dev/nvme*n1`
> > do
> >     size=$(fdisk -l | grep ${i} | awk -F "," '{ print $2 }' | awk '{ print $1 }')
> >     ./fio --name=PreconditionPass2of3 --filename=${i} --iodepth=$iodepth \
> >         --bs=128k --rw=read --size=${size} --fill_device=1 \
> >         $globalFIOParameters &
> > done
> > wait
> >
> > # Write to entire disk one last time
> > for i in `ls -1 /dev/nvme*n1`
> > do
> >     size=$(fdisk -l | grep ${i} | awk -F "," '{ print $2 }' | awk '{ print $1 }')
> >     ./fio --name=PreconditionPass3of3 --filename=${i} --iodepth=$iodepth \
> >         --bs=128k --rw=write --size=${size} --fill_device=1 \
> >         $globalFIOParameters &
> > done
> > wait
> >
> >
> > # Check 128k steady-state
> > for i in `ls -1 /dev/nvme*n1`
> > do
> >     ./fio --name=SteadyState --filename=${i} --iodepth=16 --numjobs=16 \
> >         --bs=4k --rw=read --ss_dur=1800 --ss=iops_slope:0.3% --runtime=24h \
> >         $globalFIOParameters &
> > done
> > wait
> >
> > On Thu, Jan 16, 2020 at 9:13 AM Mauricio Tavares <raubvogel@xxxxxxxxx> wrote:
> > >
> > > On Thu, Jan 16, 2020 at 2:00 AM Andrey Kuzmin <andrey.v.kuzmin@xxxxxxxxx> wrote:
> > > >
> > > > On Wed, Jan 15, 2020 at 11:36 PM Mauricio Tavares <raubvogel@xxxxxxxxx> wrote:
> > > > >
> > > > > On Wed, Jan 15, 2020 at 2:00 PM Andrey Kuzmin <andrey.v.kuzmin@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, Jan 15, 2020 at 9:29 PM Mauricio Tavares <raubvogel@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Wed, Jan 15, 2020 at 1:04 PM Andrey Kuzmin <andrey.v.kuzmin@xxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Wed, Jan 15, 2020 at 8:29 PM Gruher, Joseph R
> > > > > > > > <joseph.r.gruher@xxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: fio-owner@xxxxxxxxxxxxxxx <fio-owner@xxxxxxxxxxxxxxx> On Behalf Of
> > > > > > > > > > Mauricio Tavares
> > > > > > > > > > Sent: Wednesday, January 15, 2020 7:51 AM
> > > > > > > > > > To: fio@xxxxxxxxxxxxxxx
> > > > > > > > > > Subject: CPUs, threads, and speed
> > > > > > > > > >
> > > > > > > > > > Let's say I have a config file to preload a drive that looks like this (stolen from
> > > > > > > > > > https://github.com/intel/fiovisualizer/blob/master/Workloads/Precondition/fill_4KRandom_NVMe.ini)
> > > > > > > > > >
> > > > > > > > > > [global]
> > > > > > > > > > name=4k random write 4 ios in the queue in 32 queues
> > > > > > > > > > filename=/dev/nvme0n1
> > > > > > > > > > ioengine=libaio
> > > > > > > > > > direct=1
> > > > > > > > > > bs=4k
> > > > > > > > > > rw=randwrite
> > > > > > > > > > iodepth=4
> > > > > > > > > > numjobs=32
> > > > > > > > > > buffered=0
> > > > > > > > > > size=100%
> > > > > > > > > > loops=2
> > > > > > > > > > randrepeat=0
> > > > > > > > > > norandommap
> > > > > > > > > > refill_buffers
> > > > > > > > > >
> > > > > > > > > > [job1]
> > > > > > > > > >
> > > > > > > > > > That is taking a ton of time, like days to go. Is there anything I can do to speed it
> > > > > > > > > > up?
> > > > > > > > >
> > > > > > > > > When you say preload, do you just want to write in the full capacity of the drive?
> > > > > > > >
> > > > > > > > I believe that preload here means what in the SSD world is called
> > > > > > > > drive preconditioning: bringing a fresh drive into steady state,
> > > > > > > > where it gives you the true performance you'd see in production
> > > > > > > > over months of use, rather than the unrealistic fresh-drive random
> > > > > > > > write IOPS.
> > > > > > > >
> > > > > > > > > A sequential workload with larger blocks will be faster,
> > > > > > > >
> > > > > > > > No, you cannot get the job done with sequential writes, since they
> > > > > > > > don't populate the FTL translation tables the way random writes do.
> > > > > > > >
> > > > > > > > As for taking a ton of time, the rule of thumb is to give the SSD
> > > > > > > > 2x capacity worth of random writes. At today's speeds, that should
> > > > > > > > take just a couple of hours.
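
As a rough back-of-the-envelope check of "a couple of hours", taking
the 4 TB drives mentioned earlier in the thread and assuming roughly
1 GB/s of sustained random-write throughput (that throughput figure is
an assumption, not a number from this thread):

    2 x 4 TB capacity = 8 TB of random writes
    8 TB / 1 GB/s     = ~8,000 s, i.e. a bit over two hours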
> > > > > > > >
> > > > > > >       When you say 2xcapacity worth of random writes, do you mean just
> > > > > > > setting size=200%?
> > > > > >
> > > > > > Right.
> > > > > >
> > > > >       Then I wonder what I am doing wrong now. I changed the config file to
> > > > >
> > > > > [root@testbox tests]# cat preload.conf
> > > > > [global]
> > > > > name=4k random write 4 ios in the queue in 32 queues
> > > > > ioengine=libaio
> > > > > direct=1
> > > > > bs=4k
> > > > > rw=randwrite
> > > > > iodepth=4
> > > > > numjobs=32
> > > > > buffered=0
> > > > > size=200%
> > > > > loops=2
> > > > > random_generator=tausworthe64
> > > > > thread=1
> > > > >
> > > > > [job1]
> > > > > filename=/dev/nvme0n1
> > > > > [root@testbox tests]#
> > > > >
> > > > > but when I run it, now it spits out much larger eta times:
> > > > >
> > > > > Jobs: 32 (f=32): [w(32)][0.0%][w=382MiB/s][w=97.7k IOPS][eta
> > > > > 16580099d:14h:55m:27s]]
> > > >
> > > >  Size is set on a per-thread basis, so you're doing
> > > > 32 x 200% x 2 loops = 128 drive capacities here.
> > > >
> > > > Also, using 32 threads doesn't improve anything. Two threads (or
> > > > even one) with qd=128 will push the drive to its limits.
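
Putting those two corrections together, a sketch of the preload job
(a minimal adaptation of the config quoted above; the exact values are
assumptions rather than anything prescribed in the thread):

[global]
name=4k random write preload
ioengine=libaio
direct=1
bs=4k
rw=randwrite
iodepth=128
numjobs=1
buffered=0
size=200%
loops=1
random_generator=tausworthe64
thread=1

[job1]
filename=/dev/nvme0n1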
> > > >
> > >      Update: I redid the config file a bit to pass some of the
> > > arguments from the command line, and cut down the number of jobs and
> > > loops. Then I ran it again, this time doing a sequential write to a
> > > drive I have not touched, to see how fast it would go. My ETA is
> > > still astronomical:
> > >
> > > [root@testbox tests]# cat preload_fio.conf
> > > [global]
> > > name=4k random
> > > ioengine=${ioengine}
> > > direct=1
> > > bs=${bs_size}
> > > rw=${iotype}
> > > iodepth=4
> > > numjobs=1
> > > buffered=0
> > > size=200%
> > > loops=1
> > >
> > > [job1]
> > > filename=${devicename}
> > > [root@testbox tests]# devicename=/dev/nvme1n1 ioengine=libaio
> > > iotype=write bs_size=128k ~/dev/fio/fio ./preload_fio.conf
> > > job1: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T)
> > > 128KiB-128KiB, ioengine=libaio, iodepth=4
> > > fio-3.17-68-g3f1e
> > > Starting 1 process
> > > Jobs: 1 (f=1): [W(1)][0.0%][w=1906MiB/s][w=15.2k IOPS][eta 108616d:00h:00m:24s]
> > >
> > > > Regards,
> > > > Andrey
> > > > >
> > > > > Compare with what I was getting with size=100%
> > > > >
> > > > >  Jobs: 32 (f=32): [w(32)][10.8%][w=301MiB/s][w=77.0k IOPS][eta 06d:13h:56m:51s]]
> > > > >
> > > > > > Regards,
> > > > > > Andrey
> > > > > > >
> > > > > > > > Regards,
> > > > > > > > Andrey
> > > > > > > >
> > > > > > > > > like:
> > > > > > > > >
> > > > > > > > > [global]
> > > > > > > > > ioengine=libaio
> > > > > > > > > thread=1
> > > > > > > > > direct=1
> > > > > > > > > bs=128k
> > > > > > > > > rw=write
> > > > > > > > > numjobs=1
> > > > > > > > > iodepth=128
> > > > > > > > > size=100%
> > > > > > > > > loops=2
> > > > > > > > > [job00]
> > > > > > > > > filename=/dev/nvme0n1
> > > > > > > > >
> > > > > > > > > Or if you have a use case where you specifically want to write it in with 4K blocks, you could probably increase your queue depth way beyond 4 and see an improvement in performance. Also, you probably don't want to specify norandommap if you're trying to hit every block on the device.
> > > > > > > > >
> > > > > > > > > -Joe
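
A sketch of the 4K variant described just above (Joe's sequential
example with the block size and workload changed, the queue depth
raised, and norandommap left out; the exact iodepth is an assumption):

[global]
ioengine=libaio
thread=1
direct=1
bs=4k
rw=randwrite
numjobs=1
iodepth=128
size=100%
loops=2
[job00]
filename=/dev/nvme0n1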



