Re: [PATCH RESEND] io_uring: a small optimization for REQ_F_DRAIN_LINK

On 11/22/2019 1:26 PM, JackieLiu wrote:
>> Not sure about that. It's 1 CMP + 1 SETcc/STORE, which works pretty fast
>> as @drain_next is hot (especially after a read) and there is no write-read
>> dependency nearby. For yours, there are likely always 3 CMPs in the way.
>>
>> Did you benchmark it somehow or compare the assembly?
> 
> It is only theoretical so far. In most cases, our drain_link
> and drain_next are both false, so only two CMPs are needed, and modern CPUs
> have branch prediction. Perhaps these checks can be optimized.
> 
My bad, you're right: 2 CMPs in the common case.

> Your code is very nice. While reading and understanding it,
> I wanted to see whether there is any other way to optimize it.
> 
> Sometimes you don't need to reset drain_next. For example, when
> drain_link == true && drain_next == true, there is no need to set it
> again below.

We could change it as below, but I'd rather rely on the compiler to
optimise it for us (i.e. it knows the target architecture). Everything
else is a really rare/slow path in my opinion, so it shouldn't be a concern.

-	req->ctx->drain_next = (req->flags & REQ_F_DRAIN_LINK);
+	if (req->flags & REQ_F_DRAIN_LINK)
+		req->ctx->drain_next = true;
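
To make the difference between the two lines of the diff concrete, here
is a minimal userspace sketch. The struct layouts, the bit value of
REQ_F_DRAIN_LINK and all helper names are assumptions for illustration
only; just REQ_F_DRAIN_LINK and drain_next come from the thread. It
suggests that the two forms agree except when drain_next is already true
and the flag is clear. Whether that state can be reached at this point
depends on the surrounding kernel code, which is not shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel types (not the real layout). */
#define REQ_F_DRAIN_LINK (1U << 3)	/* bit position is an assumption */

struct ctx { bool drain_next; };
struct req { unsigned int flags; struct ctx *ctx; };

/* The line removed by the diff: unconditional store of the flag state. */
static void set_drain_next_always(struct req *req)
{
	req->ctx->drain_next = (req->flags & REQ_F_DRAIN_LINK);
}

/* The lines added by the diff: store only when the flag is set. */
static void set_drain_next_if_flagged(struct req *req)
{
	if (req->flags & REQ_F_DRAIN_LINK)
		req->ctx->drain_next = true;
}

/* Run both variants from the same starting state and compare results. */
static bool variants_agree(unsigned int flags, bool initial)
{
	struct ctx a = { .drain_next = initial };
	struct ctx b = { .drain_next = initial };
	struct req ra = { .flags = flags, .ctx = &a };
	struct req rb = { .flags = flags, .ctx = &b };

	set_drain_next_always(&ra);
	set_drain_next_if_flagged(&rb);
	return a.drain_next == b.drain_next;
}

/* Enumerate all four input combinations. */
static void self_check(void)
{
	assert(variants_agree(0, false));
	assert(variants_agree(REQ_F_DRAIN_LINK, false));
	assert(variants_agree(REQ_F_DRAIN_LINK, true));
	/* Only here do they differ: the conditional form keeps the old true. */
	assert(!variants_agree(0, true));
}
```

Calling self_check() from a small main() exercises all four cases. The
conditional form skips the store in the common case at the cost of a
branch, which is the kind of trade-off a compiler can usually judge
better for a given target.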

If the goal is to micro-optimise things, it's better to think about how to
rework the whole scheme to reduce the number of CMPs and memory reads/writes
in the hot path, including setting REQ_F_DRAIN_LINK in submit_sqe().

Though there are still heavier things happening around it.

-- 
Pavel Begunkov


