On Mon, 25 Nov 2019 at 19:39, Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> On Mon, Nov 25, 2019 at 07:22:33PM +0100, Marco Elver wrote:
> > On Mon, 25 Nov 2019 at 18:38, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > >
> > > On Fri, Nov 22, 2019 at 04:42:20PM +0100, Marco Elver wrote:
> > > > Prefer __always_inline for atomic wrappers. When building for size
> > > > (CC_OPTIMIZE_FOR_SIZE), some compilers appear to be less inclined to
> > > > inline even relatively small static inline functions that are assumed to
> > > > be inlinable such as atomic ops. This can cause problems, for example in
> > > > UACCESS regions.
> > >
> > > From looking at the link below, the problem is that objtool isn't happy
> > > about non-whitelisted calls within UACCESS regions.
> > >
> > > Is that a problem here? Are the kasan/kcsan calls whitelisted?
> >
> > We whitelisted all the relevant functions.
> >
> > The problem is that small static inline functions private to the
> > compilation unit do not get inlined when CC_OPTIMIZE_FOR_SIZE=y (they
> > do get inlined when CC_OPTIMIZE_FOR_PERFORMANCE=y).
> >
> > For the runtime this is easy to fix, by just making these small
> > functions __always_inline (also avoiding these function call overheads
> > in the runtime when CC_OPTIMIZE_FOR_SIZE).
> >
> > I stumbled upon the issue for the atomic ops, because the runtime uses
> > atomic_long_try_cmpxchg outside a user_access_save() region (and it
> > should not be moved inside). Essentially I fixed up the runtime, but
> > then objtool still complained about the access to
> > atomic64_try_cmpxchg. Hence this patch.
> >
> > I believe it is the right thing to do, because the final inlining
> > decision should *not* be made by wrappers. I would think this patch is
> > the right thing to do irrespective of KCSAN or not.
>
> Given the wrappers are trivial, and for !KASAN && !KCSAN, this would
> make them equivalent to the things they wrap, that sounds fine to me.
>
> > > > By using __always_inline, we let the real implementation and not the
> > > > wrapper determine the final inlining preference.
> > >
> > > That sounds reasonable to me, assuming that doesn't end up significantly
> > > bloating the kernel text. What impact does this have on code size?
> >
> > It actually seems to make it smaller.
> >
> > x86 tinyconfig:
> > - vmlinux baseline:      1316204
> > - vmlinux with patches:  1315988 (-216 bytes)
>
> Great! Fancy putting that in the commit message?

Done.

> > > > This came up when addressing UACCESS warnings with CC_OPTIMIZE_FOR_SIZE
> > > > in the KCSAN runtime:
> > > > http://lkml.kernel.org/r/58708908-84a0-0a81-a836-ad97e33dbb62@xxxxxxxxxxxxx
> > > >
> > > > Reported-by: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
> > > > Signed-off-by: Marco Elver <elver@xxxxxxxxxx>
> > > > ---
> > > >  include/asm-generic/atomic-instrumented.h | 334 +++++++++++-----------
> > > >  include/asm-generic/atomic-long.h          | 330 ++++++++++-----------
> > > >  scripts/atomic/gen-atomic-instrumented.sh  |   6 +-
> > > >  scripts/atomic/gen-atomic-long.sh          |   2 +-
> > > >  4 files changed, 336 insertions(+), 336 deletions(-)
> > >
> > > Do we need to do similar for gen-atomic-fallback.sh and the fallbacks
> > > defined in scripts/atomic/fallbacks/ ?
> >
> > I think they should be, but I think that's debatable. Some of them do
> > a little more than just wrap things. If we want to make this
> > __always_inline, I would do it in a separate patch independent from
> > this series to not stall the fixes here.
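[ For context, the wrappers generated by gen-atomic-instrumented.sh are
  roughly of the following shape -- an illustrative sketch, not the exact
  generated code; the operation and instrumentation hook names are only
  representative:

static __always_inline int
atomic_add_return(int i, atomic_t *v)
{
	/* Report the access to the sanitizer runtime. */
	kasan_check_write(v, sizeof(*v));
	/* Defer the real work (and the inlining decision) to the arch op. */
	return arch_atomic_add_return(i, v);
}

  With plain "static inline", the wrapper itself can end up out of line
  under CC_OPTIMIZE_FOR_SIZE=y, which is what objtool then flags as a
  non-whitelisted call inside a UACCESS region. ]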
>
> I would expect that they would suffer the same problem if used in a
> UACCESS region, so if that's what we're trying to fix here, I think that
> we need to do likewise there.
>
> The majority are trivial wrappers (shuffling arguments or adding trivial
> barriers), so those seem fine. The rest call things that we're inlining
> here.
>
> Would you be able to give that a go?

Done in v2.

> > > > diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
> > > > index 8b8b2a6f8d68..68532d4f36ca 100755
> > > > --- a/scripts/atomic/gen-atomic-instrumented.sh
> > > > +++ b/scripts/atomic/gen-atomic-instrumented.sh
> > > > @@ -84,7 +84,7 @@ gen_proto_order_variant()
> > > >  	[ ! -z "${guard}" ] && printf "#if ${guard}\n"
> > > >
> > > >  	cat <<EOF
> > > > -static inline ${ret}
> > > > +static __always_inline ${ret}
> > >
> > > We should add an include of <linux/compiler.h> to the preamble if we're
> > > explicitly using __always_inline.
> >
> > Will add in v2.
> >
> > > > diff --git a/scripts/atomic/gen-atomic-long.sh b/scripts/atomic/gen-atomic-long.sh
> > > > index c240a7231b2e..4036d2dd22e9 100755
> > > > --- a/scripts/atomic/gen-atomic-long.sh
> > > > +++ b/scripts/atomic/gen-atomic-long.sh
> > > > @@ -46,7 +46,7 @@ gen_proto_order_variant()
> > > >  	local retstmt="$(gen_ret_stmt "${meta}")"
> > > >
> > > >  	cat <<EOF
> > > > -static inline ${ret}
> > > > +static __always_inline ${ret}
> > >
> > > Likewise here
> >
> > Will add in v2.
>
> Great; thanks!

Sent v2: http://lkml.kernel.org/r/20191126114121.85552-1-elver@xxxxxxxxxx

Thanks,
-- Marco
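[ For reference, the "trivial wrappers" discussed above -- the fallbacks
  generated from scripts/atomic/fallbacks/ -- expand to code roughly like
  the sketch below (illustrative only; the operation and fence names are
  approximate), so giving them the same __always_inline treatment likewise
  leaves the inlining decision with the underlying op:

static __always_inline int
atomic_fetch_add_acquire(int i, atomic_t *v)
{
	/* Delegate to the relaxed variant of the operation... */
	int ret = atomic_fetch_add_relaxed(i, v);
	/* ...and add the trivial acquire barrier on top. */
	__atomic_acquire_fence();
	return ret;
}
]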