it's definitely more expensive to drive two drives vs one for these kinds of tests. [...] it does show extra overhead for polling 2 drives vs just one.
Oh, right. Sorry. Thanks a lot for the interesting measurements. Also, I somehow missed that you are already at 3.5M IOPS. Crazy.
No, you're running something from around that same time, not what I was running.
I upgraded the kernel to 5.10.32 (that's the most recent one provided by Canonical for Ubuntu 20.04) and tried again. I'm still stuck at 1.0M IOPS.
I really did want --no-children, the default is pretty useless imho... But the callgraphs are a must!
I think I have now managed to export what you expected (somehow "-g" was not enough and I needed to use "-g graph", even though that should actually be the default). You can find the perf output attached. It was recorded on kernel 5.10.32.

Hans-Peter
Attachment:
output.gz
Description: application/gzip