On 9/2/22 1:48 PM, Kent Overstreet wrote:
> On Fri, Sep 02, 2022 at 06:02:12AM -0600, Jens Axboe wrote:
>> On 9/1/22 7:04 PM, Roman Gushchin wrote:
>>> On Thu, Sep 01, 2022 at 08:17:47PM -0400, Kent Overstreet wrote:
>>>> On Thu, Sep 01, 2022 at 03:53:57PM -0700, Roman Gushchin wrote:
>>>>> I'd suggest to run something like iperf on a fast hardware. And maybe some
>>>>> io_uring stuff too. These are two places which were historically most sensitive
>>>>> to the (kernel) memory accounting speed.
>>>>
>>>> I'm getting wildly inconsistent results with iperf.
>>>>
>>>> io_uring-echo-server and rust_echo_bench gets me:
>>>> Benchmarking: 127.0.0.1:12345
>>>> 50 clients, running 512 bytes, 60 sec.
>>>>
>>>> Without alloc tagging: 120547 request/sec
>>>> With:                  116748 request/sec
>>>>
>>>> https://github.com/frevib/io_uring-echo-server
>>>> https://github.com/haraldh/rust_echo_bench
>>>>
>>>> How's that look to you? Close enough? :)
>>>
>>> Yes, this looks good (a bit too good).
>>>
>>> I'm not that familiar with io_uring, Jens and Pavel should have a better idea
>>> what and how to run (I know they've workarounded the kernel memory accounting
>>> because of the performance in the past, this is why I suspect it might be an
>>> issue here as well).
>>
>> io_uring isn't alloc+free intensive on a per request basis anymore, it
>> would not be a good benchmark if the goal is to check for regressions in
>> that area.
>
> Good to know. The benchmark is still a TCP benchmark though, so still useful.
>
> Matthew suggested
>   while true; do echo 1 >/tmp/foo; rm /tmp/foo; done
>
> I ran that on tmpfs, and the numbers with and without alloc tagging were
> statistically equal - there was a fair amount of variation, it wasn't a super
> controlled test, anywhere from 38-41 seconds with 100000 iterations (and alloc
> tagging was some of the faster runs).
>
> But with memcg off, it ran in 32-33 seconds. We're piggybacking on the same
> mechanism memcg uses for stashing per-object pointers, so it looks like that's
> the bigger cost.

I've complained about memcg accounting before, the slowness of it is why
io_uring works around it by caching. Anything we account we try NOT do
in the fast path because of it, the slowdown is considerable.

You care about efficiency now? I thought that was relegated to
irrelevant 10M IOPS cases.

-- 
Jens Axboe