On Thu, 26 Dec 2019 at 16:37 +0800, Ming Lei wrote:
> On Wed, Dec 25, 2019 at 10:30:57PM -0500, Theodore Y. Ts'o wrote:
> > On Thu, Dec 26, 2019 at 10:27:02AM +0800, Ming Lei wrote:
> > > Maybe we need to be careful with HDDs, since the request count in the
> > > scheduler queue is double the in-flight request count, and in theory
> > > NCQ should only cover the 32 in-flight requests. I will find a SATA
> > > HDD and see if a performance drop can be observed in a similar 'cp' test.
> >
> > Please try to measure it, but I'd be really surprised if it's
> > significant with modern HDDs.
>
> I just found one machine with AHCI SATA and ran the following xfs
> overwrite test:
>
> #!/bin/bash
> DIR=$1
> echo 3 > /proc/sys/vm/drop_caches
> fio --readwrite=write --filesize=5g --overwrite=1 --filename=$DIR/fiofile \
>     --runtime=60s --time_based --ioengine=psync --direct=0 --bs=4k \
>     --iodepth=128 --numjobs=2 --group_reporting=1 --name=overwrite
>
> The FS is xfs, and the disk is LVM over AHCI SATA with NCQ (depth 32);
> the machine was picked up from RH beaker, and it is the only disk in the box.
>
> # lsblk
> NAME                            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
> sda                               8:0    0 931.5G  0 disk
> ├─sda1                            8:1    0     1G  0 part /boot
> └─sda2                            8:2    0 930.5G  0 part
>   ├─rhel_hpe--ml10gen9--01-root 253:0    0    50G  0 lvm  /
>   ├─rhel_hpe--ml10gen9--01-swap 253:1    0   3.9G  0 lvm  [SWAP]
>   └─rhel_hpe--ml10gen9--01-home 253:2    0 876.6G  0 lvm  /home
>
> kernel: 3a7ea2c483a53fc ("scsi: provide mq_ops->busy() hook"), which is
> the commit just before f664a3cc17b7 ("scsi: kill off the legacy IO path").
>
>             |scsi_mod.use_blk_mq=N |scsi_mod.use_blk_mq=Y |
> -----------------------------------------------------------
> throughput: |244MB/s               |169MB/s               |
> -----------------------------------------------------------
>
> A similar result (184MB/s) can be observed on the v5.4 kernel with the
> same test steps.
>
> > That's because they typically have
> > a queue depth of 16 and a max_sectors_kb of 32767 (e.g., just under
> > 32 MiB). Short seeks are typically 1-2 ms, with full-stroke seeks
> > 8-10 ms. Typical sequential write speed on a 7200 RPM drive is
> > 125-150 MiB/s. So suppose every other request sent to the HDD is from
> > the other request stream. The disk will choose the 8 requests from its
> > queue that are contiguous, and so it will be writing around 256 MiB,
> > which will take 2-3 seconds. If it then needs to spend between 1 and
> > 10 ms seeking to another location on the disk before it writes the
> > next 256 MiB, the worst-case overhead of that seek is 10 ms / 2 s, or
> > 0.5%. That may very well be within your measurements' error bars.
>
> It looks like you assume that disk seeking happens just once while writing
> around 256 MB. That assumption may not be true, given that all the data can
> be in the page cache before writing. So when two tasks are submitting IOs
> concurrently, the IOs from each single task are sequential, and NCQ may
> order the current batch submitted from the two streams. However, disk
> seeking may still be needed for the next batch handled by NCQ.
>
> > And of course, note that in real life we are very *often* writing to
> > multiple files in parallel, for example during a "make -j16" while
> > building the kernel. Writing a single large file is certainly
> > something people do (but even there, people who are burning a 4G DVD
> > rip are often browsing the web while they are waiting for it to
> > complete, and the browser will be writing cache files, etc.).
> > So whether or not we should be stressing over this specific workload
> > is going to be quite debatable.

Hi, is there any update on this?

Sorry if I am making noise, but I would like to help improve the kernel (or
fix it) if I can. Otherwise, please let me know how this case should be
treated. In case it helps, I have added after my signature the steps I would
use to re-run the comparison on my own SATA disk.

Thanks, and bye,
Andrea
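
P.S. The script below is only a sketch put together from the commands quoted
above; the default mount point (/home/fiotest), the /proc/cmdline check, and
the idea of rebooting twice with scsi_mod.use_blk_mq=N and then =Y are my own
assumptions. As I understand the quoted mail, that parameter only exists on
kernels that still have the legacy IO path (i.e. before f664a3cc17b7), so on
v5.4 only the blk-mq number can be collected.

#!/bin/bash
# Re-run of the overwrite test from Ming Lei's mail, executed once per boot.
# $1 is a directory on the xfs filesystem under test (assumed default below).
DIR=${1:-/home/fiotest}

# Record which IO path this boot was started with (assumed check: the
# parameter is expected on the kernel command line).
grep -o 'scsi_mod.use_blk_mq=[NY]' /proc/cmdline || echo "scsi_mod.use_blk_mq not set"

# Start from a cold page cache, as in the quoted script.
echo 3 > /proc/sys/vm/drop_caches

# Same fio job as in the quoted test: two buffered sequential writers.
fio --readwrite=write --filesize=5g --overwrite=1 --filename=$DIR/fiofile \
    --runtime=60s --time_based --ioengine=psync --direct=0 --bs=4k \
    --iodepth=128 --numjobs=2 --group_reporting=1 --name=overwrite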