On 01/08/2022 at 10:18, David Laight wrote:
From: Christophe JAILLET
Sent: 29 July 2022 21:29
Most of the time the 'min' and 'max' parameters of usleep_range() are
constant. We can take advantage of it to pre-compute at compile time
some values that are otherwise computed at run time in usleep_range_state().
Replace usleep_range_state() with a new __nsleep_range_delta_state() function
that takes as parameters the pre-computed values.
The main benefit is to save a few instructions, especially 2
multiplications (x1000 when converting us to ns).
...
53 push %rbx
48 89 fb mov %rdi,%rbx
81 e5 cc 00 00 00 and $0xcc,%ebp
- 49 29 dc sub %rbx,%r12 ; (max - min)
- 4d 69 e4 e8 03 00 00 imul $0x3e8,%r12,%r12 ; us --> ns (x 1000)
48 83 ec 68 sub $0x68,%rsp
48 c7 44 24 08 b3 8a movq $0x41b58ab3,0x8(%rsp)
b5 41
@@ -10721,18 +10719,16 @@
31 c0 xor %eax,%eax
e8 00 00 00 00 call ...
e8 00 00 00 00 call ...
- 49 89 c0 mov %rax,%r8
- 48 69 c3 e8 03 00 00 imul $0x3e8,%rbx,%rax ; us --> ns (x 1000)
+ 48 01 d8 add %rbx,%rax
+ 48 89 44 24 28 mov %rax,0x28(%rsp)
65 48 8b 1c 25 00 00 mov %gs:0x0,%rbx
00 00
- 4c 01 c0 add %r8,%rax
- 48 89 44 24 28 mov %rax,0x28(%rsp)
e8 00 00 00 00 call ...
...
Is that really measurable in any test?
I don't think so, even on 32-bit architectures.
Integer multiply is one clock on almost every modern CPU.
Once you allow for superscalar CPUs there is
probably no difference at all on anything except the simplest
CPUs.
My point is that it is low-hanging fruit:
just moving some simple computations from one function to another so
that the compiler does the work at build time instead of at run time.
I won't argue about the value of the patch itself.
I spotted a potential opportunity and proposed a patch for it.
If someone finds it valuable enough, just take it.
If no one cares, just forget about it.
Either alternative is fine with me.
Best regards,
CJ
David