On 8/3/23 01:31, Jens Axboe wrote:
> On 8/2/23 11:01 AM, Jens Axboe wrote:
>> On 8/2/23 10:38 AM, Jens Axboe wrote:
>>> On 8/2/23 7:52 AM, kernel test robot wrote:
>>>>
>>>> hi, Jens Axboe,
>>>>
>>>> though all results in below formal report are improvement, Fengwei (CCed)
>>>> checked on another Intel(R) Xeon(R) Gold 6336Y CPU @ 2.40GHz (Ice Lake)
>>>> (sorry, since this machine doesn't belong to our team, we cannot intergrate
>>>> the results in our report, only can heads-up you here), and found ~30%
>>>> stress-ng.msg.ops_per_sec regression.
>>>>
>>>> but by disable the TRACEPOINT, the regression will disappear.
>>>>
>>>> Fengwei also tried to remove following section from the patch:
>>>> @@ -351,7 +361,8 @@ enum rw_hint {
>>>> 	{ IOCB_WRITE,		"WRITE" }, \
>>>> 	{ IOCB_WAITQ,		"WAITQ" }, \
>>>> 	{ IOCB_NOIO,		"NOIO" }, \
>>>> -	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }
>>>> +	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }, \
>>>> +	{ IOCB_DIO_DEFER,	"DIO_DEFER" }
>>>>
>>>> the regression is also gone.
>>>>
>>>> Fengwei also mentioned to us that his understanding is this code update changed
>>>> the data section layout of the kernel. Otherwise, it's hard to explain the
>>>> regression/improvement this commit could bring.
>>>>
>>>> these information and below formal report FYI.
>>>
>>> Very funky. I ran this on my 256 thread box, and removing the
>>> IOCB_DIO_DEFER (which is now IOCB_CALLER_COMP) trace point definition, I
>>> get:
>>>
>>> stress-ng: metrc: [4148] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
>>> stress-ng: metrc: [4148]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
>>> stress-ng: metrc: [4148] msg          1626997107     60.61    171.63   4003.65  26845470.19      389673.05
>>>
>>> and with it being the way it is in the branch:
>>>
>>> stress-ng: metrc: [3678] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
>>> stress-ng: metrc: [3678]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
>>> stress-ng: metrc: [3678] msg          1287795248     61.25    140.26   3755.50  21025449.92      330563.24
>>>
>>> which is about a -21% bogo ops drop. Then I got a bit suspicious since
>>> the previous strings fit in 64 bytes, and now they don't, and I simply
>>> shortened the names so they still fit, as per below patch. With that,
>>> the regression there is reclaimed.
>>>
>>> That's as far as I've gotten yet, but I'm guessing we end up placing it
>>> differently, maybe now overlapping with data that is dirtied? I didn't
>>> profile it very much, just for an overview, and there's really nothing
>>> to observe there. The task and system is clearly more idle when the
>>> regression hits.
>>
>> Better variant here. I did confirm via System.map that layout
>> drastically changes when we use more than 64 bytes of string data. I'm
>> suspecting your test is sensitive to this and it may not mean more than
>> the fact that this test is a bit fragile like that, but let me know how
>> it works for you with the below.
>
> Thinking about this just a bit more - it's clear that the bigger strings
> change your layour as well. For some cases, that ends up being a big
> win, for some it ends up being a loss. This is just the very nature of
> how the kernel is linked, and things like LTO deal with that
> specifically.
>
> I don't think there's anything to do here, your test case is just
> sensitive to the layout changes caused. That doesn't mean they are
> either good or bad, it just means that changes happened and they
> happened to impact your test case in either direction.

Totally agreed. The layout changes can produce different results in
different environments (hardware, toolchain, ...). I got a regression in
my environment, and Oliver got an improvement in the LKP environment.

Regards
Yin, Fengwei