From: Waiman Long
> Sent: 03 May 2024 17:00
> To: David Laight <David.Laight@xxxxxxxxxx>; 'linux-kernel@xxxxxxxxxxxxxxx' <linux-kernel@xxxxxxxxxxxxxxx>;
>     'peterz@xxxxxxxxxxxxx' <peterz@xxxxxxxxxxxxx>
> Cc: 'mingo@xxxxxxxxxx' <mingo@xxxxxxxxxx>; 'will@xxxxxxxxxx' <will@xxxxxxxxxx>;
>     'boqun.feng@xxxxxxxxx' <boqun.feng@xxxxxxxxx>; 'Linus Torvalds' <torvalds@xxxxxxxxxxxxxxxxxxxx>;
>     'virtualization@lists.linux-foundation.org' <virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx>;
>     'Zeng Heng' <zengheng4@xxxxxxxxxx>
> Subject: Re: [PATCH next v2 5/5] locking/osq_lock: Optimise decode_cpu() and per_cpu_ptr().
>
>
> On 12/31/23 23:14, Waiman Long wrote:
> >
> > On 12/31/23 16:55, David Laight wrote:
> >> per_cpu_ptr() indexes __per_cpu_offset[] with the cpu number.
> >> This requires the cpu number be 64bit.
> >> However the value in osq_lock() comes from a 32bit xchg() and there
> >> isn't a way of telling gcc the high bits are zero (they are) so
> >> there will always be an instruction to clear the high bits.
> >>
> >> The cpu number is also offset by one (to make the initialiser 0).
> >> It seems to be impossible to get gcc to convert
> >> __per_cpu_offset[cpu_p1 - 1]
> >> into (__per_cpu_offset - 1)[cpu_p1] (transferring the offset to the
> >> address).
> >>
> >> Converting the cpu number to 32bit unsigned prior to the decrement means
> >> that gcc knows the decrement has set the high bits to zero and doesn't
> >> add a register-register move (or cltq) to zero/sign extend the value.
> >>
> >> Not massive but saves two instructions.
> >>
> >> Signed-off-by: David Laight <david.laight@xxxxxxxxxx>
> >> ---
> >>  kernel/locking/osq_lock.c | 6 ++----
> >>  1 file changed, 2 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> >> index 35bb99e96697..37a4fa872989 100644
> >> --- a/kernel/locking/osq_lock.c
> >> +++ b/kernel/locking/osq_lock.c
> >> @@ -29,11 +29,9 @@ static inline int encode_cpu(int cpu_nr)
> >>  	return cpu_nr + 1;
> >>  }
> >>
> >> -static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
> >> +static inline struct optimistic_spin_node *decode_cpu(unsigned int encoded_cpu_val)
> >>  {
> >> -	int cpu_nr = encoded_cpu_val - 1;
> >> -
> >> -	return per_cpu_ptr(&osq_node, cpu_nr);
> >> +	return per_cpu_ptr(&osq_node, encoded_cpu_val - 1);
> >>  }
> >>
> >>  /*
> >
> > You really like micro-optimization.
> >
> > Anyway,
> >
> > Reviewed-by: Waiman Long <longman@xxxxxxxxxx>
> >
>
> David,
>
> Could you respin the series based on the latest upstream code?

Looks like a wet bank holiday weekend.....

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
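
For anyone wanting to look at the codegen point from the commit message outside the kernel, here is a minimal user-space sketch. Everything named demo_* or lookup_* is hypothetical: demo_per_cpu_offset[] merely stands in for __per_cpu_offset[], and the two lookup functions mirror the old and new shape of decode_cpu(). It only shows that both shapes return the same entry for valid encoded values. Whether the instruction count actually differs depends on compiler and target, so compare the output of "gcc -O2 -S" rather than taking it on trust.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's __per_cpu_offset[] table. */
#define DEMO_NR_CPUS 8
static const unsigned long demo_per_cpu_offset[DEMO_NR_CPUS] = {
	0x1000, 0x2000, 0x3000, 0x4000, 0x5000, 0x6000, 0x7000, 0x8000,
};

/*
 * Old shape: signed encoded value, decrement into an int, then index.
 * The int index has to be widened to 64 bits before it is used in the
 * address calculation.
 */
static unsigned long lookup_signed(int encoded_cpu_val)
{
	int cpu_nr = encoded_cpu_val - 1;

	assert(cpu_nr >= 0 && cpu_nr < DEMO_NR_CPUS);
	return demo_per_cpu_offset[cpu_nr];
}

/*
 * New shape: unsigned encoded value, decrement done in 32-bit unsigned
 * arithmetic. On x86-64 a 32-bit operation already clears the upper
 * half of the destination register, so no separate extension is needed.
 */
static unsigned long lookup_unsigned(unsigned int encoded_cpu_val)
{
	assert(encoded_cpu_val >= 1 && encoded_cpu_val <= DEMO_NR_CPUS);
	return demo_per_cpu_offset[encoded_cpu_val - 1];
}

/* Count the encoded values for which the two shapes disagree. */
static int demo_count_mismatches(void)
{
	int mismatches = 0;
	unsigned int enc;

	for (enc = 1; enc <= DEMO_NR_CPUS; enc++)
		if (lookup_signed((int)enc) != lookup_unsigned(enc))
			mismatches++;
	return mismatches;
}
```

Calling demo_count_mismatches() from a small main() should return 0. The encoded value 0 (meaning "no cpu") is deliberately rejected by the asserts, since neither shape may be used to index the table with it.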