Chaitanya Kulkarni <chaitanyak@xxxxxxxxxx> writes:

>> For more interesting cases, where there is queueing, we need to take
>> into account the cross-communication of the atomic operations. I've
>> been benchmarking by running parallel fio jobs against a single hctx
>> nullb in different hardware queue depth scenarios, and verifying both
>> IOPS and queueing.
>>
>> Each experiment was repeated 5 times on a 20-CPU box, with 20 parallel
>> jobs. fio was issuing fixed-size randwrites with qd=64 against nullb,
>> varying only the hardware queue length per test.
>>
>> queue size  2                4                8                16                32                64
>> 6.1-rc2     1681.1K (1.6K)   2633.0K (12.7K)  6940.8K (16.3K)  8172.3K (617.5K)  8391.7K (367.1K)  8606.1K (351.2K)
>> patched     1721.8K (15.1K)  3016.7K (3.8K)   7543.0K (89.4K)  8132.5K (303.4K)  8324.2K (230.6K)  8401.8K (284.7K)

Hi Chaitanya,

Thanks for the feedback.

> So if I understand correctly
> QD 2,4,8 shows clear performance benefit from this patch whereas
> QD 16, 32, 64 shows drop in performance, is that correct?
>
> If my observation is correct then applications with high QD will
> observe drop in the performance?

To be honest, I'm not sure. At high QD, the differences between the
means are smaller than the standard deviations (in parentheses), so I'm
not sure the observed drop is statistically significant. In my earlier
analysis, I concluded it wasn't.

I also don't see where a significant difference would come from: the
higher the QD, the more likely we are to take the uncontended path,
where sbq->ws_active == 0, and that hot path is identical to the
existing implementation.

> Also, please share a table with block size/IOPS/BW/CPU (system/user)
> /LAT/SLAT with % increase/decrease and document the raw numbers at the
> end of the cover-letter for completeness along with fio job so others
> can repeat the experiment...

These runs were issued against nullb with a fixed IO size matching the
device's block size (512b), which is why I am tracking only IOPS and
not BW.
With a fixed 512-byte IO size, BW is simply IOPS times 512 bytes, so
I'm not sure it adds anything in this scenario. I'll definitely follow
up with CPU time and latencies, and share the fio job. I'll also take
another look at the significance of the measured values for high QD.

Thank you,

-- 
Gabriel Krisman Bertazi