Re: [PATCH v3] vdso(7): new man page

Hi Mike,

Thanks for the updated patch.

I've applied your patches for the next man-pages release, but would be happy if
you could answer the questions below.

On 12/31/13 20:41, Mike Frysinger wrote:
> ---
>  man2/syscall.2   |   6 +-
>  man2/syscalls.2  |   3 +-
>  man3/getauxval.3 |   4 +-
>  man7/libc.7      |   5 +-
>  man7/vdso.7      | 457 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 468 insertions(+), 7 deletions(-)
>  create mode 100644 man7/vdso.7
> 
> diff --git a/man2/syscall.2 b/man2/syscall.2
> index e712b41..fe5f86d 100644
> --- a/man2/syscall.2
> +++ b/man2/syscall.2
> @@ -145,7 +145,8 @@ The details for various architectures are listed in the two tables below.
>  
>  The first table lists the instruction used to transition to kernel mode,
>  (which might not be the fastest or best way to transition to the kernel,
> -so you might have to refer to the VDSO),
> +so you might have to refer to
> +.BR vdso (7)),
>  the register used to indicate the system call number,
>  and the register used to return the system call result.
>  .if t \{\
> @@ -219,4 +220,5 @@ main(int argc, char *argv[])
>  .SH SEE ALSO
>  .BR _syscall (2),
>  .BR intro (2),
> -.BR syscalls (2)
> +.BR syscalls (2),
> +.BR vdso (7)
> diff --git a/man2/syscalls.2 b/man2/syscalls.2
> index 265c654..0d085e1 100644
> --- a/man2/syscalls.2
> +++ b/man2/syscalls.2
> @@ -833,4 +833,5 @@ and similarly
>  .SH SEE ALSO
>  .BR syscall (2),
>  .BR unimplemented (2),
> -.BR libc (7)
> +.BR libc (7),
> +.BR vdso (7)
> diff --git a/man3/getauxval.3 b/man3/getauxval.3
> index 8f27932..09d5bdc 100755
> --- a/man3/getauxval.3
> +++ b/man3/getauxval.3
> @@ -210,7 +210,5 @@ see
>  for more information.
>  .SH SEE ALSO
>  .BR secure_getenv (3),
> +.BR vdso (7),
>  .BR ld-linux.so (8)
> -
> -The kernel source file
> -.IR Documentation/ABI/stable/vdso
> diff --git a/man7/libc.7 b/man7/libc.7
> index a9aeba2..f687ced 100644
> --- a/man7/libc.7
> +++ b/man7/libc.7
> @@ -98,6 +98,9 @@ Details of these libraries are generally not covered by the
>  project.
>  .SH SEE ALSO
>  .BR syscalls (2),
> +.BR getauxval (3),
> +.BR proc (5),
>  .BR feature_test_macros (7),
>  .BR man-pages (7),
> -.BR standards (7)
> +.BR standards (7),
> +.BR vdso (7)
> diff --git a/man7/vdso.7 b/man7/vdso.7
> new file mode 100644
> index 0000000..3c4b7fb
> --- /dev/null
> +++ b/man7/vdso.7
> @@ -0,0 +1,457 @@
> +.\" Written by Mike Frysinger <vapier@xxxxxxxxxx>
> +.\"
> +.\" %%%LICENSE_START(PUBLIC_DOMAIN)
> +.\" This page is in the public domain.
> +.\" %%%LICENSE_END
> +.\"
> +.TH VDSO 7 2013-04-09 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +vDSO \- overview of the virtual ELF dynamic shared object
> +.SH SYNOPSIS
> +.B #include <sys/auxv.h>
> +
> +.B void *vdso = (void *) getauxval(AT_SYSINFO_EHDR);
> +.SH DESCRIPTION
> +The "vDSO" is a small shared library that the kernel automatically maps into the
> +address space of all user-space applications.
> +Applications themselves usually need not concern themselves with these details
> +as the vDSO is most commonly called by the C library.
> +This way you can write using standard functions and the C library will take care

After "write" I added "programs". Okay?

> +of using any available functionality.

I changed this piece to read:

    of using any functionality that is available via the vDSO.

Okay?

> +
> +Why does the vDSO exist at all?
> +There are some facilities the kernel provides that user space ends up using

I changed "facilities" to "system calls". Okay?

> +frequently to the point that such calls can dominate overall performance.
> +This is due both to the frequency of the call as well as the context overhead
> +from exiting user space and entering the kernel.
> +
> +The rest of this documentation is geared towards the curious and/or C library
> +writers rather than general developers.
> +If you're trying to call the vDSO in your own application rather than using
> +the C library, you're most likely doing it wrong.
> +.SS Example background
> +Making system calls can be slow.
> +In x86 32-bit systems, you can trigger a software interrupt (int $0x80) to tell
> +the kernel you wish to make a system call.
> +However, this instruction is expensive: it goes through the full interrupt
> +handling paths in the processor's microcode as well as in the kernel.
> +Newer processors have faster (but backwards incompatible) instructions to
> +initiate system calls.
> +Rather than require the C library to figure out if this functionality is
> +available at runtime itself, it can use functions provided by the kernel in
> +the vDSO.
> +
> +Note that the terminology can be confusing.
> +On x86 systems, the vDSO function is named "__kernel_vsyscall", but on x86_64,

After "function" I added

    used to determine the preferred method of making a system call is

Okay?

> +the term "vsyscall" also refers to an obsolete way to ask the kernel what time
> +it is or what CPU the caller is on.
> +
> +One system call frequently called is gettimeofday().
> +This is called both directly by user-space applications as well as indirectly by
> +the C library.
> +Think timestamps or timing loops or polling -- all of these frequently need to
> +know what time it is right now.
> +This information is also not secret -- any application in any privilege mode
> +(root or any user) will get the same answer.
> +Thus the kernel arranges for the information required to answer this question
> +to be placed in memory the process can access.
> +Now a call to gettimeofday() changes from a system call to a normal function
> +call and a few memory accesses.
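
To make the paragraph above concrete, here is a minimal sketch (not part of
the patch) that contrasts the two paths. The function names are hypothetical,
and it assumes glibc on an architecture that defines SYS_gettimeofday. The
first function goes through the C library, which will typically use the vDSO
when one is available; the second forces a real system call via syscall(2).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

/* Ask the C library for the time.  On most systems the library can
   satisfy this from the vDSO without entering the kernel. */
int time_via_libc(struct timeval *tv)
{
    return gettimeofday(tv, NULL);
}

/* Force a real system call, bypassing whatever the vDSO offers.
   Assumption: the architecture defines SYS_gettimeofday. */
int time_via_syscall(struct timeval *tv)
{
    return (int) syscall(SYS_gettimeofday, tv, NULL);
}
```

Calling each function in a tight loop and timing it should show the
difference on most systems, though the actual numbers depend on the kernel,
the clock source, and the architecture.
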
> +.SS Finding the vDSO
> +The base address of the vDSO (if one exists) is passed by the kernel to each
> +program in the initial auxiliary vector.
> +Specifically, via the
> +.B AT_SYSINFO_EHDR
> +tag.
> +
> +You must not assume the vDSO is mapped at any particular location in the
> +user's memory map.
> +The base address will usually be randomized at runtime every time a new
> +process image is created (at
> +.BR execve (2)
> +time).
> +This is done for security reasons to prevent standard "return-to-libc" attacks.
> +
> +For some architectures, there is also a
> +.B AT_SYSINFO
> +tag.
> +This is used only for locating the vsyscall entry point and is frequently
> +omitted or set to 0 (meaning it's not available).
> +It is a throwback to the initial vDSO work (see
> +.IR HISTORY
> +below) and should be avoided.
> +
> +Refer to
> +.BR getauxval (3)
> +for more details on accessing these fields.
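
A small sketch of the lookup described in this section might help readers.
It is not part of the patch, and the helper names are hypothetical. It fetches
AT_SYSINFO_EHDR with getauxval(3) and checks that the mapping begins with the
ELF magic bytes, treating a return of 0 as "no vDSO".

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>

/* Return the vDSO base address, or NULL if the kernel did not supply one.
   getauxval() returns 0 when the requested tag is absent. */
const void *vdso_base(void)
{
    return (const void *) getauxval(AT_SYSINFO_EHDR);
}

/* Return 1 if the memory at base starts with the ELF magic, else 0. */
int starts_with_elf_magic(const void *base)
{
    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

A caller would print or inspect the pointer from vdso_base() only after
starts_with_elf_magic() succeeds; the address itself will normally differ
from one execve(2) to the next.
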
> +.SS File format
> +Since the vDSO is a fully formed ELF image, you can do symbol lookups on it.
> +This allows new symbols to be added with newer kernel releases, and for the
> +C library to detect available functionality at runtime when running under
> +different kernel versions.
> +Oftentimes the C library will do detection with the first call and then
> +cache the result for subsequent calls.
> +
> +All symbols are also versioned (using the GNU version format).
> +This allows the kernel to update the function signature without breaking
> +backwards compatibility.
> +This means changing the arguments that the function accepts as well as the
> +return value.
> +Thus, when looking up a symbol in the vDSO, you must always include the version
> +to match the ABI you expect.
> +
> +Typically the vDSO follows the naming convention of prefixing all symbols with
> +"__vdso_" or "__kernel_" so as to distinguish them from other standard symbols.
> +For example, the "gettimeofday" function is named "__vdso_gettimeofday".
> +
> +You use the standard C calling conventions when calling any of these functions.
> +No need to worry about weird register or stack behavior.
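
For what it is worth, a versioned lookup can be sketched as follows. This is
only an illustration, not part of the patch, and it rests on several
assumptions: a dynamically linked glibc program, the glibc-specific dlvsym(3),
and the two object names listed in the code (which real code should not
depend on). The function name vdso_lookup is hypothetical.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Look up a versioned vDSO symbol; return NULL if it cannot be found. */
void *vdso_lookup(const char *symbol, const char *version)
{
    /* Candidate object names: an assumption made for illustration only. */
    static const char *const names[] = {
        "linux-vdso.so.1", "linux-gate.so.1"
    };
    size_t i;

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        /* RTLD_NOLOAD: succeed only if the object is already mapped. */
        void *handle = dlopen(names[i],
                              RTLD_LAZY | RTLD_LOCAL | RTLD_NOLOAD);
        if (handle != NULL) {
            /* The handle is deliberately not closed: the vDSO stays
               mapped for the life of the process anyway. */
            return dlvsym(handle, symbol, version);
        }
    }
    return NULL;
}
```

On x86_64, for instance, one would expect
vdso_lookup("__vdso_gettimeofday", "LINUX_2.6") to return a non-NULL address,
going by the table later in the page. On older glibc versions the program
may need to be linked with -ldl.
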
> +.SH NOTES
> +.SS Source
> +When you compile the kernel, it will automatically compile and link the vDSO
> +code for you.
> +You will frequently find it under the architecture-specific directory:
> +
> +    find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'
> +
> +Note that the vDSO that is used is based on the ABI of your user-space code
> +and not the ABI of the kernel.
> +That is, if you run an i386 32-bit ELF under an i386 32-bit kernel or under an
> +x86_64 64-bit kernel, you'll get the same vDSO.
> +So when referring to sections below, use the user-space ABI.

I still can't make any sense of that last sentence. What are "sections"
in this context? What does it mean to "*use* the user-space ABI"?

> +.SS vDSO names
> +The name of this shared object varies across architectures.
> +It will often show up in things like glibc's `ldd` output.
> +The exact name should not matter to any code, so do not hardcode it.
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +user ABI	vDSO name
> +_
> +aarch64	linux-vdso.so.1
> +ia64	linux-gate.so.1
> +ppc/32	linux-vdso32.so.1
> +ppc/64	linux-vdso64.so.1
> +s390	linux-vdso32.so.1
> +s390x	linux-vdso64.so.1
> +sh	linux-gate.so.1
> +i386	linux-gate.so.1
> +x86_64	linux-vdso.so.1
> +x86/x32	linux-vdso.so.1
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
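
As an illustration of discovering the name rather than hardcoding it, here is
a rough sketch (not part of the patch; the struct and function names are
hypothetical). It assumes a dynamically linked program and a loader that
reports the vDSO through dl_iterate_phdr(3), which I believe glibc does but
which other C libraries may not. It matches the object whose PT_LOAD segment
covers the AT_SYSINFO_EHDR address.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

struct vdso_query {
    uintptr_t base;     /* address taken from AT_SYSINFO_EHDR */
    char name[256];     /* filled in when a match is found */
    int found;
};

/* Callback: record the object whose PT_LOAD segment covers the base. */
static int match_vdso(struct dl_phdr_info *info, size_t size, void *data)
{
    struct vdso_query *q = data;
    int i;

    (void) size;
    for (i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        uintptr_t start = info->dlpi_addr + ph->p_vaddr;

        if (ph->p_type == PT_LOAD && q->base >= start &&
            q->base < start + ph->p_memsz) {
            strncpy(q->name, info->dlpi_name ? info->dlpi_name : "",
                    sizeof(q->name) - 1);
            q->found = 1;
            return 1;   /* a non-zero return stops the iteration */
        }
    }
    return 0;
}

/* Copy the loader's name for the vDSO into buf.
   Return 1 if the vDSO was found, 0 otherwise. */
int vdso_name(char *buf, size_t len)
{
    struct vdso_query q;

    memset(&q, 0, sizeof(q));
    q.base = (uintptr_t) getauxval(AT_SYSINFO_EHDR);
    if (q.base == 0 || len == 0)
        return 0;
    dl_iterate_phdr(match_vdso, &q);
    if (!q.found)
        return 0;
    strncpy(buf, q.name, len - 1);
    buf[len - 1] = '\0';
    return 1;
}
```

The name copied out is whatever the loader chose to report, and it may be an
empty string, so it is best treated as informational only.
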
> +.SS arm functions
> +.\" See linux/arch/arm/kernel/entry-armv.S
> +.\" See linux/Documentation/arm/kernel_user_helpers.txt
> +The arm port has a code page full of utility functions.
> +Since it's just a raw page of code, there is no ELF information for doing
> +symbol lookups or versioning.
> +It does provide support for different versions though.
> +
> +For documentation on this code page, it's better you refer to the kernel doc
> +as it's extremely detailed and covers everything you need to know:
> +.br
> +Documentation/arm/kernel_user_helpers.txt
> +.SS aarch64 functions
> +.\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version

You don't explicitly say what tables such as the one below are about.
Could you provide me with a sentence to describe them?

Cheers,

Michael

> +_
> +__kernel_rt_sigreturn	LINUX_2.6.39
> +__kernel_gettimeofday	LINUX_2.6.39
> +__kernel_clock_gettime	LINUX_2.6.39
> +__kernel_clock_getres	LINUX_2.6.39
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS bfin (Blackfin) functions
> +.\" See linux/arch/blackfin/kernel/fixed_code.S
> +.\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
> +As this CPU lacks a memory management unit (MMU), it doesn't set up a vDSO in
> +the normal sense.
> +Instead, it maps at boot time a few raw functions into a fixed location in
> +memory.
> +User-space applications then call directly into that region.
> +There is no provision for backwards compatibility beyond sniffing raw opcodes,
> +but as this is an embedded CPU, it can get away with things -- some of the
> +object formats it runs aren't even ELF based (they're bFLT/FLAT).
> +
> +For documentation on this code page, it's better you refer to the public docs:
> +.br
> +http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
> +.SS ia64 (Itanium) functions
> +.\" See linux/arch/ia64/kernel/gate.lds.S
> +.\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.txt
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__kernel_sigtramp	LINUX_2.5
> +__kernel_syscall_via_break	LINUX_2.5
> +__kernel_syscall_via_epc	LINUX_2.5
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +
> +The Itanium port actually likes to get tricky.
> +In addition to the vDSO above, it also has "light-weight system calls" (also
> +known as "fast syscalls" or "fsys").
> +You can invoke these via the __kernel_syscall_via_epc vDSO helper.
> +The system calls listed here have the same semantics as if you called them
> +directly via
> +.BR syscall (2),
> +so refer to the relevant
> +documentation for each.
> +The table below lists the functions available via this mechanism.
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l.
> +function
> +_
> +clock_gettime
> +getcpu
> +getpid
> +getppid
> +gettimeofday
> +set_tid_address
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS parisc (hppa) functions
> +.\" See linux/arch/parisc/kernel/syscall.S
> +.\" See linux/Documentation/parisc/registers
> +The parisc port has a code page full of utility functions called a gateway page.
> +Rather than use the normal ELF aux vector approach, it passes the address of
> +the page to the process via the SR2 register.
> +The permissions on the page are such that merely executing those addresses
> +automatically executes with kernel privileges and not in user-space.
> +This is done to match the way HP-UX works.
> +
> +Since it's just a raw page of code, there is no ELF information for doing
> +symbol lookups or versioning.
> +Simply call into the appropriate offset via the branch instruction, e.g.:
> +.br
> +ble <offset>(%sr2, %r0)
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +offset	function
> +_
> +00b0	lws_entry
> +00e0	set_thread_pointer
> +0100	linux_gateway_entry (syscall)
> +0268	syscall_nosys
> +0274	tracesys
> +0324	tracesys_next
> +0368	tracesys_exit
> +03a0	tracesys_sigexit
> +03b8	lws_start
> +03dc	lws_exit_nosys
> +03e0	lws_exit
> +03e4	lws_compare_and_swap64
> +03e8	lws_compare_and_swap
> +0404	cas_wouldblock
> +0410	cas_action
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS ppc/32 functions
> +.\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S
> +The functions marked with a
> +.I *
> +below are only available when the kernel is
> +a powerpc64 (64-bit) kernel.
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__kernel_clock_getres	LINUX_2.6.15
> +__kernel_clock_gettime	LINUX_2.6.15
> +__kernel_datapage_offset	LINUX_2.6.15
> +__kernel_get_syscall_map	LINUX_2.6.15
> +__kernel_get_tbfreq	LINUX_2.6.15
> +__kernel_getcpu \fI*\fR	LINUX_2.6.15
> +__kernel_gettimeofday	LINUX_2.6.15
> +__kernel_sigtramp_rt32	LINUX_2.6.15
> +__kernel_sigtramp32	LINUX_2.6.15
> +__kernel_sync_dicache	LINUX_2.6.15
> +__kernel_sync_dicache_p5	LINUX_2.6.15
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS ppc/64 functions
> +.\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__kernel_clock_getres	LINUX_2.6.15
> +__kernel_clock_gettime	LINUX_2.6.15
> +__kernel_datapage_offset	LINUX_2.6.15
> +__kernel_get_syscall_map	LINUX_2.6.15
> +__kernel_get_tbfreq	LINUX_2.6.15
> +__kernel_getcpu	LINUX_2.6.15
> +__kernel_gettimeofday	LINUX_2.6.15
> +__kernel_sigtramp_rt64	LINUX_2.6.15
> +__kernel_sync_dicache	LINUX_2.6.15
> +__kernel_sync_dicache_p5	LINUX_2.6.15
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS s390 functions
> +.\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__kernel_clock_getres	LINUX_2.6.29
> +__kernel_clock_gettime	LINUX_2.6.29
> +__kernel_gettimeofday	LINUX_2.6.29
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS s390x functions
> +.\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__kernel_clock_getres	LINUX_2.6.29
> +__kernel_clock_gettime	LINUX_2.6.29
> +__kernel_gettimeofday	LINUX_2.6.29
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS sh (SuperH) functions
> +.\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__kernel_rt_sigreturn	LINUX_2.6
> +__kernel_sigreturn	LINUX_2.6
> +__kernel_vsyscall	LINUX_2.6
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS i386 functions
> +.\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__kernel_sigreturn	LINUX_2.5
> +__kernel_rt_sigreturn	LINUX_2.5
> +__kernel_vsyscall	LINUX_2.5
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS x86_64 functions
> +.\" See linux/arch/x86/vdso/vdso.lds.S
> +All of these symbols are also available without the "__vdso_" prefix, but
> +you should ignore those and stick to the names below.
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__vdso_clock_gettime	LINUX_2.6
> +__vdso_getcpu	LINUX_2.6
> +__vdso_gettimeofday	LINUX_2.6
> +__vdso_time	LINUX_2.6
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS x86/x32 functions
> +.\" See linux/arch/x86/vdso/vdso32.lds.S
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__vdso_clock_gettime	LINUX_2.6
> +__vdso_getcpu	LINUX_2.6
> +__vdso_gettimeofday	LINUX_2.6
> +__vdso_time	LINUX_2.6
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS History
> +The vDSO was originally just a single function -- the vsyscall.
> +In older kernels, you might see that in a process's memory map rather than vdso.
> +Over time, people realized that this was a great way to pass more functionality
> +to user space, so it was reconceived as a vDSO in the current format.
> +.SH SEE ALSO
> +.BR syscalls (2),
> +.BR getauxval (3),
> +.BR proc (5)
> +
> +The docs/examples/sources in the Linux sources:
> +.nf
> +Documentation/ABI/stable/vdso
> +Documentation/ia64/fsys.txt
> +Documentation/vDSO/* (includes examples of using the vDSO)
> +find arch/ -iname '*vdso*' -o -iname '*gate*'
> +.fi
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/