On 8/18/21 1:22 PM, Hao Xu wrote: > 在 2021/8/18 下午6:20, Pavel Begunkov 写道: >> On 8/18/21 8:43 AM, Hao Xu wrote: >>> Given a linkchain like this: >>> req0(link_flag)-->req1(link_flag)-->...-->reqn(no link_flag) >>> [...] >>> struct io_submit_link *link = &ctx->submit_state.link; >>> + bool is_link = sqe->flags & (IOSQE_IO_LINK | IOSQE_IO_HARDLINK); >>> + struct io_kiocb *head; >>> int ret; >>> + /* >>> + * we don't update link->last until we've done io_req_prep() >>> + * since linked timeout uses old link->last >>> + */ >>> + if (link->head) >>> + link->last->link = req; >>> + else if (is_link) >>> + link->head = req; >>> + head = link->head; >> >> It's a horrorsome amount of overhead. How about to set the fail flag >> if failed early and actually fail on io_queue_sqe(), as below. It's >> not tested and a couple more bits added, but hopefully gives the idea. > I get the idea, it's truely with less change. But why do you think the > above code bring in more overhead, since anyway we have to link the req > to the linkchain. I tested it with fio-direct-4k-read-with/without-sqpoll, didn't see performance regression. Well, it's an exaggeration :) However, we were cutting the overhead, and there is no atomics or other heavy operations left in the hot path, just pure number of instructions a request should go through. That's just to clear the reason why I don't want extras on the path. For the non-linked path, first it adds 2 ifs in front and removes one at the end. Then there is is_link, which is most likely to be saved on stack. And same with @head which I'd expect to be saved on stack. If we have a way to avoid it, that's great, and it looks we can. [...] >> - ret = io_req_prep(req, sqe); >> - if (unlikely(ret)) >> - goto fail_req; >> /* don't need @sqe from now on */ >> trace_io_uring_submit_sqe(ctx, req, req->opcode, req->user_data, >> @@ -6670,8 +6670,10 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req, >> struct io_kiocb *head = link->head; >> > maybe better to add an if(head & FAIL) here, since we don't need to > prep_async if we know it will be cancelled. Such an early fail is marginal enough to not care about performance, but I agree that the check is needed as io_req_prep_async() won't be able to handle it. E.g. if it failed to grab a file. -- Pavel Begunkov