On Tue, Mar 01, 2022 at 05:30:41AM +0530, Anshuman Khandual wrote: > On 2/28/22 4:27 PM, Russell King (Oracle) wrote: > > On Mon, Feb 28, 2022 at 04:17:32PM +0530, Anshuman Khandual wrote: > >> This defines and exports a platform specific custom vm_get_page_prot() via > >> subscribing ARCH_HAS_VM_GET_PAGE_PROT. Subsequently all __SXXX and __PXXX > >> macros can be dropped which are no longer needed. > > > > What I would really like to know is why having to run _code_ to work out > > what the page protections need to be is better than looking it up in a > > table. > > > > Not only is this more expensive in terms of CPU cycles, it also brings > > additional code size with it. > > > > I'm struggling to see what the benefit is. > > Currently vm_get_page_prot() is also being _run_ to fetch required page > protection values. Although that is being run in the core MM and from a > platform perspective __SXXX, __PXXX are just being exported for a table. > Looking it up in a table (and applying more constructs there after) is > not much different than a clean switch case statement in terms of CPU > usage. So this is not more expensive in terms of CPU cycles. I disagree. However, let's base this disagreement on some evidence. Here is the present 32-bit ARM implementation: 00000048 <vm_get_page_prot>: 48: e200000f and r0, r0, #15 4c: e3003000 movw r3, #0 4c: R_ARM_MOVW_ABS_NC .LANCHOR1 50: e3403000 movt r3, #0 50: R_ARM_MOVT_ABS .LANCHOR1 54: e7930100 ldr r0, [r3, r0, lsl #2] 58: e12fff1e bx lr That is five instructions long. Please show that your new implementation is not more expensive on 32-bit ARM. Please do so by building a 32-bit kernel, and providing the disassembly. I think you will find way more than five instructions in your version - the compiler will have to issue code to decode the protection bits, probably using a table of branches or absolute PC values, or possibly the worst case of using multiple comparisons and branches. It then has to load constants that may be moved using movw on ARMv7, but on older architectures would have to be created from multiple instructions or loaded from the literal pool. Then there'll be instructions to load the address of "user_pgprot", retrieve its value, and bitwise or that. Therefore, I fail to see how your approach of getting rid of the table is somehow "better" than what we currently have in terms of the effect on the resulting code. If you don't like the __P and __S stuff and two arch_* hooks, you could move the table into arch code along with vm_get_page_prot() without the additional unnecessary hooks, while keeping all the benefits of the table lookup. Thanks. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!