Re: [PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor

"Madhavan T. Venkataraman" <madvenka@xxxxxxxxxxxxxxxxxxx> · Tue, 28 Jul 2020 12:39:59 -0500

    On 7/28/20 12:16 PM, Andy Lutomirski wrote:

      On Tue, Jul 28, 2020 at 9:32 AM Madhavan T. Venkataraman
<madvenka@xxxxxxxxxxxxxxxxxxx> wrote:

        Thanks. See inline..

On 7/28/20 10:13 AM, David Laight wrote:

          From:  madvenka@xxxxxxxxxxxxxxxxxxx

            Sent: 28 July 2020 14:11

          ...

            The kernel creates the trampoline mapping without any permissions. When
            the trampoline is executed by user code, a page fault happens and the
            kernel gets control. The kernel recognizes that this is a trampoline
            invocation. It sets up the user registers based on the specified
            register context, and/or pushes values on the user stack based on the
            specified stack context, and sets the user PC to the requested target
            PC. When the kernel returns, execution continues at the target PC.
            So, the kernel does the work of the trampoline on behalf of the
            application.

          Isn't the performance of this going to be horrid?

        It takes about the same amount of time as getpid(). So, it is
        one quick trip into the kernel. I expect that applications will
        typically not care about this extra overhead as long as
        they are able to run.

      What did you test this on?  A page fault on any modern x86_64 system
      is much, much, much, much slower than a syscall.

    I tested it on a KVM guest running Ubuntu. So, when you say
    that a page fault is much slower, do you mean a regular page
    fault that is handled through the VM layer? Here is the relevant
    code in do_user_addr_fault():

        if (unlikely(access_error(hw_error_code, vma))) {
                /*
                 * If it is a user execute fault, it could be a trampoline
                 * invocation.
                 */
                if ((hw_error_code & tflags) == tflags &&
                    trampfd_fault(vma, regs)) {
                        up_read(&mm->mmap_sem);
                        return;
                }
                bad_area_access_error(regs, hw_error_code, address, vma);
                return;
        }

        /*
         * If for any reason at all we couldn't handle the fault,
         * make sure we exit gracefully rather than endlessly redo
         * the fault.  Since we never set FAULT_FLAG_RETRY_NOWAIT, if
         * we get VM_FAULT_RETRY back, the mmap_sem has been unlocked.
         *
         * Note that handle_userfault() may also release and reacquire mmap_sem
         * (and not return with VM_FAULT_RETRY), when returning to userland to
         * repeat the page fault later with a VM_FAULT_NOPAGE retval
         * (potentially after handling any pending signal during the return to
         * userland). The return to userland is identified whenever
         * FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags.
         */
        fault = handle_mm_fault(vma, address, flags);

    trampfd faults are instruction faults that go through a different
    code path than the one that calls handle_mm_fault().

    Could you clarify?

    Thanks.

    Madhavan
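
    Anyone following the thread who wants rough numbers for the
    comparison above could start from the user-space sketch below.
    It is not part of the patch series. The function names
    ns_per_getpid() and ns_per_first_touch() are made up for
    illustration. It times a raw getpid() system call against the
    first touch of a fresh anonymous page, which is a regular minor
    fault that goes through handle_mm_fault(). It cannot exercise
    the trampfd path itself, since that needs a kernel with the
    patches applied, so at best it gives a baseline for the
    "regular page fault" side of the question.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in nanoseconds. */
static double now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1e9 + ts.tv_nsec;
}

/* Average cost, in ns, of a raw getpid() system call (bypasses libc). */
double ns_per_getpid(long iters)
{
	double start = now_ns();

	for (long i = 0; i < iters; i++)
		syscall(SYS_getpid);
	return (now_ns() - start) / iters;
}

/*
 * Average cost, in ns, of the first write to each page of a fresh
 * anonymous mapping. Each first touch normally takes a minor fault
 * handled via handle_mm_fault(). Returns -1.0 if mmap() fails.
 */
double ns_per_first_touch(long pages)
{
	long psz = sysconf(_SC_PAGESIZE);
	size_t len = (size_t)pages * (size_t)psz;
	volatile char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	double start, elapsed;

	if (p == MAP_FAILED)
		return -1.0;

	/* Best effort: ask for small pages so each touch faults once. */
	madvise((void *)p, len, MADV_NOHUGEPAGE);

	start = now_ns();
	for (long i = 0; i < pages; i++)
		p[(size_t)i * (size_t)psz] = 1;
	elapsed = now_ns() - start;

	munmap((void *)p, len);
	return elapsed / pages;
}
```

    Calling ns_per_getpid(1000000) and ns_per_first_touch(4096) from
    a small main() and printing both values gives per-operation
    averages. The first-touch figure includes page zeroing and
    page-table setup, which a fault that is intercepted before
    handle_mm_fault() would not pay, so the two paths are not
    directly comparable.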