Re: [PATCH net-next 09/16] net: mdio: Add Synopsys DW XPCS management interface support

"Russell King (Oracle)" <linux@xxxxxxxxxxxxxxx> · Thu, 7 Dec 2023 14:02:27 +0000

On Thu, Dec 07, 2023 at 04:35:47PM +0300, Serge Semin wrote:
> Hi Andrew
> 
> On Wed, Dec 06, 2023 at 06:01:30PM +0100, Andrew Lunn wrote:
> > The compiler does a better job at deciding what to inline than we
> > humans do. If you can show the compiler is doing it wrong, then we
> > might accept them.
> 
> In general I would have agreed with you especially if the methods were
> heavier than what they are:
> static inline ptrdiff_t dw_xpcs_mmio_addr_format(int dev, int reg)
> {               
>         return FIELD_PREP(0x1f0000, dev) | FIELD_PREP(0xffff, reg);
> }               
>         
> static inline u16 dw_xpcs_mmio_addr_page(ptrdiff_t csr)
> {       
>         return FIELD_GET(0x1fff00, csr);
> }       
> 
> static inline ptrdiff_t dw_xpcs_mmio_addr_offset(ptrdiff_t csr)
> {
>         return FIELD_GET(0xff, csr);
> }
> 
> > But in general, netdev does not like inline in .C
> > file.
> 
> I see. I'll do as you say if you don't change your mind after my
> reasoning below.

This isn't Andrew saying it - you seem to have missed the detail that
"netdev". If Andrew doesn't say it, then DaveM, Jakub or Paolo will.

Have you read the "Inline functions" section in
Documentation/process/4.Coding.rst ?

> > Also, nothing in MDIO is hot path, it spends a lot of time
> > waiting for a slow bus. So inline is likely to just bloat the code for
> > no gain.
> 
> I would have been absolutely with you in this matter, if we were talking
> about a normal MDIO bus. In this case the devices are accessed over
> the system IO-memory. So the bus isn't that slow.
> 
> Regarding the compiler knowing better when to inline the code. Here is
> what it does with the methods above. If the inline keyword is
> specified the compiler will inline all three methods. If the keyword isn't
> specified then dw_xpcs_mmio_addr_format() won't be inlined while the rest
> two functions will be. So the only part at consideration is the
> dw_xpcs_mmio_addr_format() method since the rest of the functions are
> inlined anyway.
> 
> The dw_xpcs_mmio_addr_format() function body is of the 5 asm
> instructions length (on MIPS). Since the function call in this case
> requires two jump instructions (to function and back), one instruction
> to save the previous return address on stack and two instructions for
> the function arguments, the trade-off of having non-inlined function
> are those five additional instructions on each call. There are four
> dw_xpcs_mmio_addr_format() calls. So here is what we get in both
> cases:
> inlined:     5 func ins * 4 calls = 20 ins
> non-inlined: (5 func + 1 jump) ins + (1 jump + 1 ra + 2 arg) ins * 4 calls  = 22 ins
> but seeing the return address needs to be saved anyway in the callers
> here is what we finally get:
> non-inlined: (5 func + 1 jump) ins + (1 jump + 2 arg) ins * 4 calls  = 18 ins
> 
> So unless I am mistaken in some of the aspects if we have the function
> non-inlined then we'll save 2 instructions in the object file, but
> each call would require additional _4_ instructions to execute (2
> jumps and 2 arg creations), which makes the function execution almost
> two times longer than it would have been should it was inlined.

Rather than just focusing on instruction count, you also need to
consider things like branch prediction, prefetching and I-cache
usage. Modern CPUs don't execute instruction-by-instruction anymore.

It is entirely possible that the compiler is making better choices
even if it results in more jumps in the code.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!