Hi,

On Mon, 14 Feb 2022 at 18:44, Durval Menezes <jmmml@xxxxxxxxxx> wrote:
>
> Hello everyone,
>
> I've arrived at a very surprising number measuring IOPS write performance
> on my SSDs' "bare metal" (ie, straight on the /dev/$DISK, no filesystem
> involved):
>
> export COMMON_OPTIONS='--ioengine=libaio --direct=1 --runtime=120 --time_based --group_reporting'
>
> ls -l /dev/disk/by-id | grep 'ata-.*sda'
> lrwxrwxrwx 1 root root 9 Feb 13 17:19 ata-SAMSUNG_MZ7LM1T9HCJM-00003_XXXXXXXXXXXXXX -> ../../sda
>
> TANGO=/dev/disk/by-id/ata-SAMSUNG_MZ7LM1T9HCJM-00003_XXXXXXXXXXXXXX
> sudo fio --filename=${TANGO} --name=device_iops_write --rw=randwrite --bs=4k --iodepth=256 --numjobs=4 ${COMMON_OPTIONS}
> [...]
> write: *IOPS=83.1k*, BW=325MiB/s (341MB/s)(38.1GiB/120007msec)
> [...]
>
> (please find the complete output at the end of this message, in case I should
> have looked at some other lines and/or you are curious)
>
> As per the official manufacturer specs (both in this whitepaper at their
> website[1]), and also in this datasheet I found somewhere else[2]), it's
> supposed to be only *18K IOPS*.
>
> All the other base performance numbers I've measured (read IOPS, read and
> write MB/s, read and write latencies) are at or very near the manufacturer
> specs.
>
> What's going on?
>
> At first I thought that, despite `--direct=1` being explicitly indicated,
> my machine's 64GB RAM (via the Linux buffer cache) could be caching the
> writes (even if the number, in that case, should have been much higher)...
> so, I tested it again with `--runtime=600` to saturate the buffer cache in
> case it was really the 'culprit'... lo and behold, the result was:
>
> [...]
> write: IOPS=83.1k, BW=325MiB/s (341MB/s)(190GiB/600019msec)
> [...]
>
>
> So, the surprising over-4.6x-times-the-spec Write IOPS is maintained, even
> for 190GiB total data.
>
> And with 190GiB data written (about 10% the total device capacity), I do
> not believe it's any kind of cache (RAM, MLC or whatever) inside the SSD
> either.

You're running your workload for a comparatively short time, and we also
don't know how "fresh" your SSD is. The 18K IOPS figure may well describe a
drive that has been completely written and has no pre-erased blocks left
(the state reached via so-called preconditioning)... I'll also note that the
whitepaper [1] says this:

> SSD Precondition: Sustained state (or steady state)
> [...]
> It's important to note that all performance items mentioned in this white
> paper have been measured at the sustained state, except the sequential
> read/write performance

I notice that your SSD appears to be SATA (sda), and SATA NCQ tops out at 32
outstanding commands, so I'd be surprised if a total queue depth greater than
32 made a difference (yours is 1024, i.e. 4 jobs x iodepth=256). Do you get a
similar result with just one job and iodepth=32?

It's unlikely, but if the jobs were submitting I/O to the same areas as other
jobs at the same time, some of that I/O could be elided. Given what you've
posted, though, this should not be the case.

-- 
Sitsofe