On Wed, 1 Jun 2022, Adriano Silva wrote:

> I don't know if my NVMe devices are 4K LBA. I do not think so. They are
> all the same model and manufacturer. I know that they work with 512-byte
> blocks, but their latency is very high when processing blocks of this
> size.

Ok, it should be safe in terms of the possible bcache bug I was referring
to if it supports 512b IOs.

> However, in all the tests I do with them with 4K blocks, the result is
> much better. So I always use 4K blocks, because in real life I don't
> think I'll use blocks smaller than 4K.

Makes sense, format with -w 4k. There is probably some CPU benefit to
having page-aligned IOs, too.

> > You can remove the kernel interpretation using passthrough commands.
> > Here's an example comparing with and without FUA assuming a 512b
> > logical block format:
> >
> > # echo "" | nvme write /dev/nvme0n1 --block-count=7 --data-size=4k --force-unit-access --latency
> > # echo "" | nvme write /dev/nvme0n1 --block-count=7 --data-size=4k --latency
> >
> > If you have a 4k LBA format, use "--block-count=0".
> >
> > And you may want to run each of the above several times to get an
> > average since other factors can affect the reported latency.
>
> I created a bash script that executes the two commands you suggested
> repeatedly over a period of 10 seconds, to get a more acceptable
> average. The result is the following:
>
> root@pve-21:~# for i in /sys/block/*/queue/write_cache; do echo 'write back' > $i; done
> root@pve-21:~# cat /sys/block/nvme0n1/queue/write_cache
> write back
> root@pve-21:~# ./nvme_write.sh
> Total: 10 seconds, 3027 tests. Latency (us) : min: 29 / avr: 37 / max: 98
> root@pve-21:~# ./nvme_write.sh --force-unit-access
> Total: 10 seconds, 2985 tests. Latency (us) : min: 29 / avr: 37 / max: 111
> root@pve-21:~#
> root@pve-21:~# ./nvme_write.sh --force-unit-access --block-count=0
> Total: 10 seconds, 2556 tests. Latency (us) : min: 404 / avr: 428 / max: 492
> root@pve-21:~# ./nvme_write.sh --block-count=0
> Total: 10 seconds, 2521 tests. Latency (us) : min: 403 / avr: 428 / max: 496
> root@pve-21:~#
> root@pve-21:~#
> root@pve-21:~# for i in /sys/block/*/queue/write_cache; do echo 'write through' > $i; done
> root@pve-21:~# cat /sys/block/nvme0n1/queue/write_cache
> write through
> root@pve-21:~# ./nvme_write.sh
> Total: 10 seconds, 2988 tests. Latency (us) : min: 29 / avr: 37 / max: 114
> root@pve-21:~# ./nvme_write.sh --force-unit-access
> Total: 10 seconds, 2926 tests. Latency (us) : min: 29 / avr: 36 / max: 71
> root@pve-21:~#
> root@pve-21:~# ./nvme_write.sh --force-unit-access --block-count=0
> Total: 10 seconds, 2456 tests. Latency (us) : min: 31 / avr: 428 / max: 496
> root@pve-21:~# ./nvme_write.sh --block-count=0
> Total: 10 seconds, 2627 tests. Latency (us) : min: 402 / avr: 428 / max: 509
>
> Well, as we can see above, in almost 3k tests run over a period of ten
> seconds with each of the commands, I got even better results than I had
> already gotten with ioping. I did tests with isolated commands as well,
> but I decided to write a bash script so I could execute many commands in
> a short period of time and take an average. And we can see an average of
> about 37us in any situation. Very low!
>
> However, when using the suggested --block-count=0, the latency is very
> high in any situation, around 428us.
>
> But as we see, using the nvme command the latency is always the same in
> any scenario, with or without --force-unit-access; the only difference
> is between the form of the command aimed at devices with a 4K LBA format
> (--block-count=0) and the form for those without it.
>
> What do you think?
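The nvme_write.sh wrapper itself was not posted in the thread. For anyone
trying to reproduce the numbers above, a minimal sketch of such a loop might
look like the following; it assumes /dev/nvme0n1, a 10-second window, that
extra flags are simply appended to the nvme write command (with a later flag
overriding an earlier one), and that the microsecond value can be pulled out
of nvme-cli's --latency output line, all of which may need adjusting for your
setup and nvme-cli version.

#!/bin/bash
# nvme_write.sh (reconstruction, not the original script): issue "nvme write"
# passthrough commands in a loop for 10 seconds and report min/avg/max of the
# latency that nvme-cli prints when --latency is given.

dev=/dev/nvme0n1
duration=10
count=0 total=0 min= max=

end=$((SECONDS + duration))
while [ "$SECONDS" -lt "$end" ]; do
    # Defaults mirror the thread (8 x 512b = 4KiB); caller flags go last so
    # e.g. "--block-count=0" can override, assuming nvme-cli lets the later
    # occurrence win -- otherwise drop the defaults and pass all flags.
    out=$(echo "" | nvme write "$dev" --block-count=7 --data-size=4k --latency "$@" 2>&1)
    # Grab the microsecond figure from the latency line; the field position
    # is an assumption about the output format of your nvme-cli version.
    us=$(printf '%s\n' "$out" | awk '/latency/ {print $(NF-1); exit}')
    [ -z "$us" ] && continue
    count=$((count + 1))
    total=$((total + us))
    [ -z "$min" ] || [ "$us" -lt "$min" ] && min=$us
    [ -z "$max" ] || [ "$us" -gt "$max" ] && max=$us
done

[ "$count" -gt 0 ] || { echo "no successful writes"; exit 1; }
echo "Total: $duration seconds, $count tests. Latency (us) : min: $min / avr: $((total / count)) / max: $max"

Invoked as in the transcript above, e.g. ./nvme_write.sh --force-unit-access
--block-count=0.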
It looks like the NVMe works well except in 512b situations. It's
interesting that --force-unit-access doesn't increase the latency; perhaps
the NVMe ignores sync flags since it knows it has a non-volatile cache.

-Eric

> Thanks,
>
>
> On Monday, 30 May 2022 10:45:37 BRT, Keith Busch <kbusch@xxxxxxxxxx> wrote:
>
> On Sun, May 29, 2022 at 11:50:57AM +0000, Adriano Silva wrote:
> > So why the slowness? Is it just the time spent in kernel code to set
> > FUA and Flush Cache bits on writes that would cause all this latency
> > increment (84us to 1.89ms)?
>
> I don't think the kernel's handling accounts for that great of a
> difference. I think the difference is probably on the controller side.
>
> The NVMe spec says that a Write command with FUA set:
>
> "the controller shall write that data and metadata, if any, to
> non-volatile media before indicating command completion."
>
> So if the memory is non-volatile, it can complete the command without
> writing to the backing media. It can also commit the data to the backing
> media if it wants to before completing the command, but those are
> implementation-specific details.
>
> You can remove the kernel interpretation using passthrough commands.
> Here's an example comparing with and without FUA assuming a 512b logical
> block format:
>
> # echo "" | nvme write /dev/nvme0n1 --block-count=7 --data-size=4k --force-unit-access --latency
> # echo "" | nvme write /dev/nvme0n1 --block-count=7 --data-size=4k --latency
>
> If you have a 4k LBA format, use "--block-count=0".
>
> And you may want to run each of the above several times to get an
> average since other factors can affect the reported latency.
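As a footnote for anyone unsure which case applies to their drive: nvme-cli
can report a namespace's available LBA formats and which one is in use, which
determines whether the --block-count=7 (512b) or --block-count=0 (4k) form
above is the right one. The device path below is only an example, and the
format command is shown commented out because it erases the namespace.

# List the LBA formats the namespace supports; the entry marked "(in use)"
# gives the current logical block size.
nvme id-ns -H /dev/nvme0n1 | grep "LBA Format"

# If the drive offers a 4K LBA format and you want to switch to it, nvme
# format can do so -- but it destroys all data on the namespace, and the
# --lbaf index (1 here is only a guess) must match a 4K entry from the
# listing above.
#nvme format /dev/nvme0n1 --lbaf=1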