RE: [PATCH v6] mm: Uninline copy_overflow()

From: Christophe Leroy
> Sent: 14 February 2022 13:21
> 
> Le 14/02/2022 à 12:31, David Laight a écrit :
> > From: Anshuman Khandual
> >> Sent: 14 February 2022 09:54
> > ...
> >>> With -Winline, GCC tells:
> >>>
> >>> 	/include/linux/thread_info.h:212:20: warning: inlining failed in call to 'copy_overflow': call is unlikely and code size would grow [-Winline]
> >>>
> >>> copy_overflow() is an unconditional warning called by
> >>> check_copy_size() on an error path.
> >>>
> >>> check_copy_size() has to remain inlined in order to benefit
> >>> from constant folding, but copy_overflow() is not worth inlining.
> >>>
> >>> Uninline the warning when CONFIG_BUG is selected.
> >>>
> >>> When CONFIG_BUG is not selected, WARN() does nothing so skip it.
> >>>
> >>> This reduces the size of vmlinux by almost 4kbytes.
> >>
> >
> >>> +void __copy_overflow(int size, unsigned long count);
> >>> +
> >>>   static inline void copy_overflow(int size, unsigned long count)
> >>>   {
> >>> -	WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
> >>> +	if (IS_ENABLED(CONFIG_BUG))
> >>> +		__copy_overflow(size, count);
> >>>   }
> >
> >> Just wondering, is this the only such scenario which results in
> >> an avoidable bloated vmlinux image ?
> >
> > The more interesting question is whether the call to __copy_overflow()
> > is actually significantly smaller than the one to WARN()?
> > And if so why.
> >
> unsigned long tst_copy_to_user(void __user *to, unsigned long n)
> {
> 	return copy_to_user(to, &jiffies_64, n);
> }
> 
> With the patch:
> 
> 00003c78 <tst_copy_to_user>:
>      3c78:	28 04 00 08 	cmplwi  r4,8
>      3c7c:	7c 85 23 78 	mr      r5,r4
>      3c80:	41 81 00 10 	bgt     3c90 <tst_copy_to_user+0x18>
>      3c84:	3c 80 00 00 	lis     r4,0
> 			3c86: R_PPC_ADDR16_HA	jiffies_64
>      3c88:	38 84 00 00 	addi    r4,r4,0
> 			3c8a: R_PPC_ADDR16_LO	jiffies_64
>      3c8c:	48 00 00 00 	b       3c8c <tst_copy_to_user+0x14>
> 			3c8c: R_PPC_REL24	_copy_to_user
> 
>      3c90:	94 21 ff f0 	stwu    r1,-16(r1)
>      3c94:	7c 08 02 a6 	mflr    r0
>      3c98:	38 60 00 08 	li      r3,8
>      3c9c:	90 01 00 14 	stw     r0,20(r1)
>      3ca0:	90 81 00 08 	stw     r4,8(r1)
>      3ca4:	48 00 00 01 	bl      3ca4 <tst_copy_to_user+0x2c>
> 			3ca4: R_PPC_REL24	__copy_overflow
>      3ca8:	80 a1 00 08 	lwz     r5,8(r1)
>      3cac:	80 01 00 14 	lwz     r0,20(r1)
>      3cb0:	7c a3 2b 78 	mr      r3,r5
>      3cb4:	7c 08 03 a6 	mtlr    r0
>      3cb8:	38 21 00 10 	addi    r1,r1,16
>      3cbc:	4e 80 00 20 	blr
> 
> 
> Without the patch:
> 
> 00003c88 <tst_copy_to_user>:
>      3c88:	28 04 00 08 	cmplwi  r4,8
>      3c8c:	7c 85 23 78 	mr      r5,r4
>      3c90:	41 81 00 10 	bgt     3ca0 <tst_copy_to_user+0x18>
>      3c94:	3c 80 00 00 	lis     r4,0
> 			3c96: R_PPC_ADDR16_HA	jiffies_64
>      3c98:	38 84 00 00 	addi    r4,r4,0
> 			3c9a: R_PPC_ADDR16_LO	jiffies_64
>      3c9c:	48 00 00 00 	b       3c9c <tst_copy_to_user+0x14>
> 			3c9c: R_PPC_REL24	_copy_to_user
> 
>      3ca0:	94 21 ff f0 	stwu    r1,-16(r1)
>      3ca4:	3c 60 00 00 	lis     r3,0
> 			3ca6: R_PPC_ADDR16_HA	.rodata.str1.4+0x30
>      3ca8:	90 81 00 08 	stw     r4,8(r1)
>      3cac:	7c 08 02 a6 	mflr    r0
>      3cb0:	38 63 00 00 	addi    r3,r3,0
> 			3cb2: R_PPC_ADDR16_LO	.rodata.str1.4+0x30
>      3cb4:	38 80 00 08 	li      r4,8
>      3cb8:	90 01 00 14 	stw     r0,20(r1)
>      3cbc:	48 00 00 01 	bl      3cbc <tst_copy_to_user+0x34>
> 			3cbc: R_PPC_REL24	__warn_printk
>      3cc0:	80 a1 00 08 	lwz     r5,8(r1)
>      3cc4:	0f e0 00 00 	twui    r0,0
>      3cc8:	80 01 00 14 	lwz     r0,20(r1)
>      3ccc:	7c a3 2b 78 	mr      r3,r5
>      3cd0:	7c 08 03 a6 	mtlr    r0
>      3cd4:	38 21 00 10 	addi    r1,r1,16
>      3cd8:	4e 80 00 20 	blr

I make that 3 extra instructions.
Two are needed to load the format string.
Not sure why the third gets added.

Not really significant against the 12-15 instructions the error path actually takes.
Although a lot of those just generate the stack frame
needed to call the error function, and wouldn't be there in
a less trivial example.

More interesting would be changing copy_overflow() to return the size.
So copy_to_user() becomes:

	if (size_valid())
		return _copy_to_user();
	return copy_overflow();

In your example that would generate a tail call in the error path.
It also avoids having to save the transfer length.
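
A minimal userspace sketch of that shape, for illustration only. The
names demo_copy_to_user(), demo_copy_overflow() and demo_checked_copy()
are hypothetical stand-ins, not the kernel helpers, and the return
convention (number of bytes not copied) is an assumption borrowed from
copy_to_user(). Because both branches end in a call whose result is
returned directly, a compiler is free to emit a tail call on each path
and has no need to keep the length live across the error call.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the out-of-line copy routine.
 * Returns the number of bytes NOT copied (0 on success). */
static unsigned long demo_copy_to_user(void *to, const void *from,
				       unsigned long n)
{
	memcpy(to, from, n);
	return 0;
}

/* Hypothetical out-of-line error path: warn, then return the count so
 * that the caller can simply return this function's result. */
static unsigned long demo_copy_overflow(int size, unsigned long count)
{
	fprintf(stderr, "Buffer overflow detected (%d < %lu)!\n", size, count);
	return count;
}

/* Inlined wrapper: the size check stays visible to the optimiser, and
 * each branch ends in a call in tail position. */
static inline unsigned long demo_checked_copy(void *to, const void *from,
					      unsigned long from_size,
					      unsigned long n)
{
	if (n <= from_size)
		return demo_copy_to_user(to, from, n);
	return demo_copy_overflow((int)from_size, n);
}
```

Whether the tail call is actually emitted depends on the compiler,
the optimisation level and the ABI, so the generated code would need
checking with objdump.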

Plausibly you'll get smaller code by making the prototypes
of _copy_to_user() and copy_overflow() match.
But compilers don't like generating the:
	(cond ? a : b)(args)
assembler that would really be needed.
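
For reference, the matching-prototype form can be written directly in
C, as in the sketch below. The demo_* names are hypothetical and this is
not proposed kernel code. Note that once the prototypes match, the
error function no longer receives the object size, only the arguments
the copy routine takes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical copy routine: returns the number of bytes not copied. */
static unsigned long demo_do_copy(void *to, const void *from,
				  unsigned long n)
{
	memcpy(to, from, n);
	return 0;
}

/* Hypothetical error routine with the SAME prototype as the copy
 * routine; it ignores the pointers and reports the requested length. */
static unsigned long demo_report_overflow(void *to, const void *from,
					  unsigned long n)
{
	(void)to;
	(void)from;
	fprintf(stderr, "Buffer overflow detected (count %lu)!\n", n);
	return n;
}

/* Select the callee with a conditional expression, then make a single
 * call with one shared argument setup. */
static inline unsigned long demo_select_copy(void *to, const void *from,
					     unsigned long from_size,
					     unsigned long n)
{
	return (n <= from_size ? demo_do_copy : demo_report_overflow)(to, from, n);
}
```

A compiler may well turn this back into two separate direct calls, or
into an indirect call, so it is not a given that the output is smaller.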

	David




