On Thu, Dec 07, 2023 at 04:35:47PM +0300, Serge Semin wrote: > Hi Andrew > > On Wed, Dec 06, 2023 at 06:01:30PM +0100, Andrew Lunn wrote: > > The compiler does a better job at deciding what to inline than we > > humans do. If you can show the compiler is doing it wrong, then we > > might accept them. > > In general I would have agreed with you especially if the methods were > heavier than what they are: > static inline ptrdiff_t dw_xpcs_mmio_addr_format(int dev, int reg) > { > return FIELD_PREP(0x1f0000, dev) | FIELD_PREP(0xffff, reg); > } > > static inline u16 dw_xpcs_mmio_addr_page(ptrdiff_t csr) > { > return FIELD_GET(0x1fff00, csr); > } > > static inline ptrdiff_t dw_xpcs_mmio_addr_offset(ptrdiff_t csr) > { > return FIELD_GET(0xff, csr); > } > > > But in general, netdev does not like inline in .C > > file. > > I see. I'll do as you say if you don't change your mind after my > reasoning below. This isn't Andrew saying it - you seem to have missed the detail that "netdev". If Andrew doesn't say it, then DaveM, Jakub or Paolo will. Have you read the "Inline functions" section in Documentation/process/4.Coding.rst ? > > Also, nothing in MDIO is hot path, it spends a lot of time > > waiting for a slow bus. So inline is likely to just bloat the code for > > no gain. > > I would have been absolutely with you in this matter, if we were talking > about a normal MDIO bus. In this case the devices are accessed over > the system IO-memory. So the bus isn't that slow. > > Regarding the compiler knowing better when to inline the code. Here is > what it does with the methods above. If the inline keyword is > specified the compiler will inline all three methods. If the keyword isn't > specified then dw_xpcs_mmio_addr_format() won't be inlined while the rest > two functions will be. So the only part at consideration is the > dw_xpcs_mmio_addr_format() method since the rest of the functions are > inlined anyway. > > The dw_xpcs_mmio_addr_format() function body is of the 5 asm > instructions length (on MIPS). Since the function call in this case > requires two jump instructions (to function and back), one instruction > to save the previous return address on stack and two instructions for > the function arguments, the trade-off of having non-inlined function > are those five additional instructions on each call. There are four > dw_xpcs_mmio_addr_format() calls. So here is what we get in both > cases: > inlined: 5 func ins * 4 calls = 20 ins > non-inlined: (5 func + 1 jump) ins + (1 jump + 1 ra + 2 arg) ins * 4 calls = 22 ins > but seeing the return address needs to be saved anyway in the callers > here is what we finally get: > non-inlined: (5 func + 1 jump) ins + (1 jump + 2 arg) ins * 4 calls = 18 ins > > So unless I am mistaken in some of the aspects if we have the function > non-inlined then we'll save 2 instructions in the object file, but > each call would require additional _4_ instructions to execute (2 > jumps and 2 arg creations), which makes the function execution almost > two times longer than it would have been should it was inlined. Rather than just focusing on instruction count, you also need to consider things like branch prediction, prefetching and I-cache usage. Modern CPUs don't execute instruction-by-instruction anymore. It is entirely possible that the compiler is making better choices even if it results in more jumps in the code. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!