On 5/6/22 4:23 PM, Jens Axboe wrote:
> On 5/6/22 1:00 AM, Hao Xu wrote:
>> Let fast poll support multishot mode; currently only accept is added as
>> its first consumer.
>>
>> Theoretical analysis:
>>
>>   1) when connections come in fast
>>     - singleshot:
>>               add accept sqe(userspace) --> accept inline
>>                               ^                 |
>>                               |-----------------|
>>     - multishot:
>>               add accept sqe(userspace) --> accept inline
>>                                                 ^     |
>>                                                 |--*--|
>>
>>       we accept repeatedly at the * place until we get EAGAIN
>>
>>   2) when connections come in at low pressure
>>       similar to 1), we save a lot of userspace-kernel context switches
>>       and useless vfs_poll() calls
>>
>> Tests:
>> Did some tests, which go this way:
>>
>>   server                  client(multiple)
>>   accept                  connect
>>   read                    write
>>   write                   read
>>   close                   close
>>
>> Basically, spin up a number of clients (on the same machine as the
>> server) to connect to the server and write some data to it; the server
>> writes the data back to the client after receiving it, then closes the
>> connection after the write returns. The client then reads the data and
>> closes the connection. Here I test 10000 clients connecting to one
>> server, data size 128 bytes. Each client has its own goroutine, so they
>> all hit the server within a short time.
>>
>> Tested 20 times before/after this patchset, time spent (unit: clock
>> ticks, the return value of clock()):
>>
>> before:
>> (1930136+1940725+1907981+1947601+1923812+1928226+1911087+1905897+1941075
>> +1934374+1906614+1912504+1949110+1908790+1909951+1941672+1969525+1934984
>> +1934226+1914385)/20.0 = 1927633.75
>> after:
>> (1858905+1917104+1895455+1963963+1892706+1889208+1874175+1904753+1874112
>> +1874985+1882706+1884642+1864694+1906508+1916150+1924250+1869060+1889506
>> +1871324+1940803)/20.0 = 1894750.45
>>
>> (1927633.75 - 1894750.45) / 1927633.75 ≈ 1.7%
>>
>> A liburing test is here:
>> https://github.com/HowHsu/liburing/blob/multishot_accept/test/accept.c
>
> Wish I had seen that, I wrote my own! But maybe that's good, you tend to
> find other issues through that.
>
> Anyway, works for me in testing, and I can see this being a nice win for
> accept-intensive workloads. I pushed a bunch of cleanup patches that
> should just get folded in. Can you fold them into your patches, address
> the other feedback, and post a v3? I pushed the test branch here:
>
> https://git.kernel.dk/cgit/linux-block/log/?h=fastpoll-mshot

Quick benchmark here, accepting 10k connections:

Stock kernel:
real    0m0.728s
user    0m0.009s
sys     0m0.192s

Patched:
real    0m0.684s
user    0m0.018s
sys     0m0.102s

Looks like a nice win for a highly synthetic benchmark. Nothing
scientific, was just curious.

-- 
Jens Axboe
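As a rough illustration (not code from the patchset itself), a userspace
consumer of multishot accept might look like the minimal sketch below. It
assumes the io_uring_prep_multishot_accept() helper and the IORING_CQE_F_MORE
completion flag as they exist in current liburing; error handling and the
per-connection work are elided.

#include <liburing.h>
#include <unistd.h>

static void serve(int listen_fd)
{
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;

        io_uring_queue_init(8, &ring, 0);

        /* one armed SQE keeps posting a CQE per accepted connection */
        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_multishot_accept(sqe, listen_fd, NULL, NULL, 0);
        io_uring_submit(&ring);

        for (;;) {
                if (io_uring_wait_cqe(&ring, &cqe))
                        break;
                if (cqe->res >= 0) {
                        int conn_fd = cqe->res;
                        /* handle the connection here */
                        close(conn_fd);
                }
                /* no IORING_CQE_F_MORE: the multishot request has
                 * terminated (e.g. on error) and must be re-armed */
                if (!(cqe->flags & IORING_CQE_F_MORE)) {
                        sqe = io_uring_get_sqe(&ring);
                        io_uring_prep_multishot_accept(sqe, listen_fd,
                                                       NULL, NULL, 0);
                        io_uring_submit(&ring);
                }
                io_uring_cqe_seen(&ring, cqe);
        }
        io_uring_queue_exit(&ring);
}

With singleshot accept, the re-arm branch at the bottom would have to run
after every connection rather than only when the multishot request
terminates, which is the extra round trip the diagram above contrasts.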