On Thu, Dec 26, 2019 at 10:27:02AM +0800, Ming Lei wrote:
> Maybe we need to be careful for HDD., since the request count in scheduler
> queue is double of in-flight request count, and in theory NCQ should only
> cover all in-flight 32 requests. I will find a sata HDD., and see if
> performance drop can be observed in the similar 'cp' test.

Please try to measure it, but I'd be really surprised if it's
significant with modern HDDs. That's because they typically have a
queue depth of 16 and a max_sectors_kb of 32767 (i.e., just under
32 MiB). Short seeks are typically 1-2 ms, with full-stroke seeks
taking 8-10 ms. Typical sequential write speeds on a 7200 RPM drive
are 125-150 MiB/s.

So suppose every other request sent to the HDD is from the other
request stream. The disk will choose the 8 requests from its queue
that are contiguous, and so it will be writing around 256 MiB, which
will take about 2 seconds. If it then needs to spend between 1 and
10 ms seeking to another location on the disk before it writes the
next 256 MiB, the worst-case overhead of that seek is 10 ms / 2 s, or
0.5%. That may very well be within your measurements' error bars.

And of course, note that in real life we are very *often* writing to
multiple files in parallel, for example during a "make -j16" while
building the kernel. Writing a single large file is certainly
something people do, but even there, people who are burning a 4G DVD
rip are often browsing the web while they wait for it to complete, and
the browser will be writing cache files, etc. So whether we should be
stressing over this specific workload is going to be quite debatable.

					- Ted
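
P.S. For anyone who wants to redo the back-of-the-envelope arithmetic
above with their own drive's numbers, here is a minimal sketch. The
function names (burst_mib, seek_overhead) are made up for illustration,
and the constants are just the rough figures quoted in this mail, not
measurements. It assumes the drive writes half of its queue contiguously
and then pays for one seek.

```python
# Rough figures quoted in the mail above; assumptions, not measurements.
QUEUE_DEPTH = 16                 # requests the drive can hold at once
MAX_REQUEST_MIB = 32767 / 1024   # max_sectors_kb = 32767, just under 32 MiB


def burst_mib(queue_depth=QUEUE_DEPTH, streams=2):
    """MiB written contiguously when requests alternate between streams."""
    return (queue_depth // streams) * MAX_REQUEST_MIB


def seek_overhead(seek_ms, write_mib_per_s):
    """Seek time as a fraction of the time spent writing one burst."""
    burst_seconds = burst_mib() / write_mib_per_s
    return (seek_ms / 1000.0) / burst_seconds


if __name__ == "__main__":
    # Sweep the quoted ranges: 125-150 MiB/s writes, 1-10 ms seeks.
    for write_speed in (125, 150):
        for seek_ms in (1, 10):
            pct = 100 * seek_overhead(seek_ms, write_speed)
            print(f"{write_speed} MiB/s, {seek_ms} ms seek: {pct:.2f}% overhead")
```

Even with a full-stroke seek and the faster write speed, the overhead
should come out at roughly half a percent, which is the point of the
estimate: it is small compared to typical benchmark noise.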