Re: [PATCHSET v6b 0/11] Turn single segment imports into ITER_UBUF

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Apr 2, 2023 at 3:22 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> Linus, are you going to turn this into a proper patch? This is too
> good to not pursue.

So I initially was planning on doing just that for 6.4.

However, I looked more at it, and I'm now fairly convinced that the
biggest problem is that we have basically screwed up our simple
"copy_to/from_user()" with all the indirection.

So yes, that patch may end up being a 6% improvement on the made-up
load, but a *large* reason for that is that we just do horribly badly
on a plain "copy_from_user()", and I think I could fix that.

The problems with "copy_from_user()" are:

 - it's *two* function calls ("_copy_to_user()" and then "raw_copy_to_user()")

 - we have entirely lost all "this is how big the copy is" at all levels

and my stupid patch basically improves on copy_iovec_from_user() by
fixing those two issues:

 - it inlines it all by using the "unsafe_get_user()" interfaces

 - it recovers the access sizes by just accessing the fields directly

and in the process it gets rid of us being really really piggy in
"copy_from_user()".

Now, there are a few reason *why* we are so piggy in copy_from_user(), mainly

 (a) CLAC/STAC is just slow and 'access_ok()' is big

 (b) we have lots of debug boiler plate crap. Things like
      - might_fault() (PROVE_LOCKING || DEBUG_ATOMIC_SLEEP)
      - check_copy_size() (HARDENED_USERCOPY)
      - test should_fail_usercopy() (FAULT_INJECTION_USERCOPY)
      - do instrument_copy_from_user_before/after (KASAN)
      - clear the end of the buffer on failures (legacy behavior)

 (c) you probably don't have a CPU with X86_FEATURE_FSRM

Now, what (a) and (b) does is to make it unreasonable to inline
copy_from/to_user(). Particularly when those debug options are set,
but even without them, it's just not reasonable to inline.

Long long ago, we used to special-case small constant-sized user
copies, but all of that has gone away. That's lovely (it used to be
horrid in the header files, and caused problems)

And (c) means that the small cases tend to suck, although with all the
other overhead (two function calls, possibly with return stack
counting, CLAC/STAC, 'lfence' for speculation control etc etc), that's
almost not an issue. It's just extra cycles.

Again, that hack to copy_iovec_from_user() ends up working so well
simply because it avoids all the *stupid* stuff. Yes, it still
obviously does the CLAC/STAC and the access_ok(), but it's ok to
inline when it's just that special code, not some random
'copy_from_user()' that doesn't matter.

Anyway, what this long rant is about is that I'm looking at what I can
do to improve "copy_from_user()" instead. It's a pain, because of all
those debug options, but I actually have a disgusting patch that would
make it possible to have a much better copy_from_user.

How disgusting, you say? Let me quote part of it:

  +#define __a(n,a) ((unsigned long)(n)&((a)-1))
  +#define statically_aligned(n,a) \
  +       (__builtin_constant_p(__a(n,a)) && !__a(n,a))
  +
  +extern unsigned long copy_from_user_a16(void *to, const void __user
*from, unsigned long n);
  +extern unsigned long copy_from_user_a8(void *to, const void __user
*from, unsigned long n);
  +extern unsigned long copy_from_user_a4(void *to, const void __user
*from, unsigned long n);
  ...
  +               if (statically_aligned(n, 16))
  +                       n = copy_from_user_a16(to, from, n);
  +               else if (statically_aligned(n, 8))
  +                       n = copy_from_user_a8(to, from, n);
  +               else if (statically_aligned(n, 4))
  +                       n = copy_from_user_a4(to, from, n);
  +               else
  +                       n = _copy_from_user(to, from, n);

and it turns out that both gcc and clang are smart enough that even
when you don't have a *constant* size, when you have code like

        if (copy_from_user(iov, uvec, nr_segs * sizeof(*uvec)))
                return -EFAULT;

that "statically_aligned()" thing will notice that "look, that size is
a multiple of 16", and my disgusting patch replaces the "I have lost
all sign of the size of the copy" user copy with a call to
copy_from_user_a16().

And interestingly, that "size is 16-byte aligned" happens a *lot*. It
happens for that uvec case, but it also happens for things like
"cp_new_stat()" that copies a "struct stat" to user space, because
'struct stat' is something like 144 bytes (9*16) on x86-64.

So yes, I can improve the iovec copy. Easy peasy. Get rid of the crazy
overhead from a generic interface and from our disgusting debugging
code.

Or I could try to improve copy_to/from_user() in general.

My current patch is too ugly to actually live, and making it even
better (encode the size range when it is statically known, not just a
few "size is X-byte aligned") would make it uglier still.

I'm idly thinking about trying to teach 'objtool' to pick the right
function target by hiding the size information in a separate section.

I may decide that "just doing iovecs" is just so much simpler, and
that trying to improve the generic case is too painful. But that's the
issue right now - I know I *could* just improve copy_from_user()
enough that at least with sane configurations, that iovec improvement
would basically not be worth it.

The problem here is at least partly "sane configurations". At least
Fedora enables HARDENED_USERCOPY. And if you have that enabled, then
"copy_to/from_user()" is _always_ going to suck, and doing a special
"copy iovecs by avoiding it" is always going to be better, if only
because you avoid the debug overhead.

Argh.

             Linus




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux