On 8/25/21 4:19 AM, Hao Xu wrote:
> On 2021/8/24 8:57 PM, Pavel Begunkov wrote:
>> On 8/23/21 7:36 PM, Hao Xu wrote:
>>> Now we have a lot of task_work users, and some of them just
>>> complete a req and generate a cqe. Let's put such work at the
>>> head of the task_list so that it can be handled quickly, and
>>> thus reduce the average req latency. An explanatory case:
>>>
>>> original timeline:
>>>     submit_sqe-->irq-->add completion task_work
>>>     -->run heavy work0~n-->run completion task_work
>>> new timeline:
>>>     submit_sqe-->irq-->add completion task_work
>>>     -->run completion task_work-->run heavy work0~n
>>
>> Might be good. There are not so many hot tw users:
>> poll, queuing linked requests, and the new IRQ. Could be
>> BPF in the future.
>
> async buffered reads as well, since buffered reads are a
> hot operation.

A good case as well, forgot about it. It shouldn't be too hot,
though, as it only kicks in when reads are served out of the
page cache.

>> So, for the test case I'd think about some heavy-ish
>> submissions linked to your IRQ req. For instance,
>> keeping a large QD of
>>
>> read(IRQ-based) -> linked read_pipe(PAGE_SIZE);
>>
>> and running it for a while, so they get completely
>> out of sync and the tw items really mix up. Reads from
>> pipes of size <= PAGE_SIZE complete inline, but the
>> copy takes enough time.
>
> Thanks Pavel. Previously I tried
>     direct read --> buffered read (async buffered read)
> and didn't see much difference. I'll try the case you
> suggested above.

Hmm, considering that pipes have to be refilled, buffered
reads may be a better option. I'd make them all read the same
page, plus a registered buffer and a registered file. Then it
will probably depend on how fast your main SSD is.

mem = malloc_align(4096);
io_uring_register_buffer(mem, 4096);
// preferably another disk/SSD from the fast one
fd2 = open("./file");

// loop
read(fast_ssd, DIRECT, 512) -> read(fd2, fixed_buf, 4096);

Interesting what it'll yield. With buffered reads it's
probably worth experimenting with 2 * PAGE_SIZE, or even
slightly more, to increase the heavy part.

Btw, I'd look at the latency distribution (90%, 99%) as well;
it may take the worst hit.

>> One thing is that Jens specifically wanted tw's to
>> be in FIFO order, where IRQ-based ones will be in LIFO.
>> I don't think it's a real problem though, the
>> completion handler should be brief enough.
>
> In my latest code, the IRQ-based tw are also FIFO among
> themselves, only LIFO between IRQ-based tw and other tw:
>
> timeline:  tw1 tw2 irq1 irq2
> task_list: irq1 irq2 tw1 tw2

--
Pavel Begunkov
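
To make the suggested test concrete, below is a minimal sketch,
assuming liburing, of what the pseudo-code above could look like.
It is only an illustration, not code from this thread: the file
paths are hypothetical, the registered-file step is omitted, and
a real test would keep a large QD of such linked pairs in flight
rather than submitting one pair at a time.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <liburing.h>

int main(void)
{
        struct io_uring ring;
        struct io_uring_cqe *cqe;
        void *mem;
        int fd1, fd2, i, n;

        if (io_uring_queue_init(64, &ring, 0))
                return 1;

        /* 4k-aligned registered buffer, as suggested above */
        if (posix_memalign(&mem, 4096, 4096))
                return 1;
        struct iovec iov = { .iov_base = mem, .iov_len = 4096 };
        if (io_uring_register_buffers(&ring, &iov, 1))
                return 1;

        /* hypothetical paths: fd1 on the fast SSD (IRQ-driven,
         * O_DIRECT), fd2 on another disk/SSD */
        fd1 = open("./fast_ssd_file", O_RDONLY | O_DIRECT);
        fd2 = open("./file", O_RDONLY);
        if (fd1 < 0 || fd2 < 0)
                return 1;

        for (n = 0; n < 1000000; n++) {
                struct io_uring_sqe *sqe;

                /* IRQ-based O_DIRECT read, linked to ... */
                sqe = io_uring_get_sqe(&ring);
                io_uring_prep_read(sqe, fd1, mem, 512, 0);
                sqe->flags |= IOSQE_IO_LINK;

                /* ... a buffered read of the same page into the
                 * fixed buffer (buf_index 0) */
                sqe = io_uring_get_sqe(&ring);
                io_uring_prep_read_fixed(sqe, fd2, mem, 4096, 0, 0);

                io_uring_submit(&ring);

                /* reap both completions of the linked pair */
                for (i = 0; i < 2; i++) {
                        if (io_uring_wait_cqe(&ring, &cqe))
                                return 1;
                        io_uring_cqe_seen(&ring, cqe);
                }
        }
        return 0;
}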