Re: vdso(7): new man page

Hi Mike,

Ping!

Cheers,

Michael



On Wed, May 22, 2013 at 3:22 PM, Michael Kerrisk <mtk.manpages@xxxxxxxxx> wrote:
> Hi Mike,
>
> On 04/12/13 03:28, Mike Frysinger wrote:
>> here's v2 w/Andy's feedback
>
> Thanks for this--it's a nice piece of work. Could you take a
> look at my comments below and send a v3, please?
>
>> .\" Written by Mike Frysinger <vapier@xxxxxxxxxx>
>> .\"
>> .\" %%%LICENSE_START(PUBLIC_DOMAIN)
>> .\" This page is in the public domain.  Suck it.
>
> Okay -- not my first choice for a license, but so be it.
> But, how about we lose the "Suck it."...
>
>> .\" %%%LICENSE_END
>> .\"
>> .TH VDSO 7 2013-04-09 "Linux" "Linux Programmer's Manual"
>> .SH NAME
>> vDSO \- overview of the virtual ELF dynamic shared object
>> .SH SYNOPSIS
>> .B #include <sys/auxv.h>
>>
>> .B void *vdso = (uintptr_t)getauxval(AT_SYSINFO_EHDR);
>
> Add space before "getauxval". (Usual convention for casts in code examples
> in man pages.)
>
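For illustration, a minimal sketch of the lookup the SYNOPSIS describes,
assuming glibc 2.16 or later (where getauxval() is available). The helper
names vdso_base() and looks_like_elf() are hypothetical, and the explicit
pointer cast is an addition that is not in the patch:

```c
#include <elf.h>        /* ELFMAG, SELFMAG, AT_SYSINFO_EHDR */
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>   /* getauxval() */

/* Hypothetical helper: base address of the vDSO, or NULL if the kernel
 * did not pass AT_SYSINFO_EHDR in the auxiliary vector. */
const void *vdso_base(void)
{
    return (const void *) (uintptr_t) getauxval(AT_SYSINFO_EHDR);
}

/* Hypothetical helper: nonzero if p points at the 4-byte ELF magic. */
int looks_like_elf(const void *p)
{
    return p != NULL && memcmp(p, ELFMAG, SELFMAG) == 0;
}
```

A caller would treat a NULL result from vdso_base() as "no vDSO" rather
than as an error, since getauxval() returns 0 when the tag is absent.
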
>> .SH DESCRIPTION
>> The "vDSO" is a small shared library that the kernel automatically maps into the
>> address space of all userspace applications.
>
> 1,$s/userspace applications/user-space applications/
>
>> Applications themselves usually need not concern themselves with this as it is
>> most commonly called by the C library.
>
> This last sentence doesn't quite make sense, since "this" and "it" refer to
> different things (I believe). Do you want something like:
>
>         Applications generally do not need to care about the details since
>         the vDSO is automatically employed by the C library
>
> ?
>
>> This way you can write using standard functions and the C library will take care
>> of using any available functionality.
>>
>> Why does this object exist at all?
>
> s/this object/the vDSO/
>
>> There are some facilities the kernel provides that userspace ends up using
>
> s/userspace/user space/
>
> (When used as a noun, and in other places in the page as well)
>
>
>> frequently to the point that such calls can dominate overall performance.
>> This is due both to the frequency of the call as well as the context overhead
>> from exiting userspace and entering the kernel.
>>
>> The rest of this documentation is geared towards the curious and/or C library
>> writers rather than general developers.
>> If you're trying to call the vDSO in your own application rather than using
>> the C library, you're most likely doing it wrong.
>> .SS Example Background
>
> Convention for SS headings is that only the first word is capitalized (unless
> English usage dictates otherwise--e.g., for a proper noun)
>
>> Making syscalls can be slow.
>
> 1,$s/syscall/system call/
>
> (and other instances in the page)
>
>> In x86 32bit systems, you can trigger a software interrupt (int $0x80) to tell
>
> s/32bit/32-bit/
>
>> the kernel you wish to make a syscall.
>> However, this instruction is expensive: it goes through the full interrupt
>> handling paths in the processor's microcode as well as in the kernel.
>> Newer processors have faster (but backwards incompatible) instructions to
>> initiate system calls.
>> Rather than require the C library to figure out if this functionality is
>> available at runtime itself, it can use functions provided by the kernel in
>> the vDSO.
>
> That last point (after the comma) is the most interesting (IMO) of the use
> cases of the vDSO. If you cared to expand on the details (i.e., what are
> the mechanics of the operation of those functions provided by the kernel),
> I think that would be interesting for the reader.
>
>> Note that the terminology can be confusing.
>> On x86 systems, the vDSO function is named "__kernel_vsyscall", but on x86_64,
>> the term "vsyscall" also refers to an obsolete way to ask the kernel what time
>> it is or what cpu the caller is on.
>
> s/cpu/CPU/
>
>> Another frequent system call is gettimeofday().
>> This is called both directly by userspace applications as well as indirectly by
>> the C library.
>> Think timestamps or timing loops or polling -- all of these frequently need to
>> know what time it is right now.
>> This information is also not secret -- any application in any privilege mode
>> (root or any user) will get the same answer.
>> Thus the kernel arranges for the information required to answer this question
>> to be placed in memory the process can access.
>> Now a call to gettimeofday() changes from a syscall to a normal function call
>> and a few memory accesses.
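
To make the contrast in this paragraph concrete, here is a small sketch
that reads the clock both ways: once through the C library (which may use
the vDSO) and once by forcing a real system call with syscall(2). It
assumes a 64-bit architecture that still defines SYS_gettimeofday; the
function name read_clock_both_ways() is hypothetical:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>  /* SYS_gettimeofday */
#include <sys/time.h>     /* gettimeofday(), struct timeval */
#include <unistd.h>       /* syscall() */

/* Hypothetical helper: fills both results and returns 0 on success,
 * -1 if either path fails. */
int read_clock_both_ways(struct timeval *via_libc,
                         struct timeval *via_syscall)
{
    /* Ordinary call: the C library is free to satisfy this from the
     * vDSO without entering the kernel. */
    if (gettimeofday(via_libc, NULL) != 0)
        return -1;

    /* Explicit system call: always enters the kernel. */
    if (syscall(SYS_gettimeofday, via_syscall, NULL) != 0)
        return -1;

    return 0;
}
```

Both paths should report (nearly) the same time; the difference is only
in how the answer is obtained, which is what a timing loop around each
call would expose.
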
>> .SS Finding The vDSO
>
> s/The/the/
>
>> The base address of the vDSO (if one exists) is passed by the kernel to each
>> program in the initial auxiliary vector.
>> Specifically, via the
>> .B AT_SYSINFO_EHDR
>> tag.
>>
>> You must not assume the vDSO is mapped at any particular location in the
>> user's memory map.
>> The base address will usually be randomized at runtime every time a new is
>
> Missing word after "new".
>
>> processed (at
>> .BR execve (2)
>> time).
>> This is done for security reasons to prevent standard "return-to-libc" attacks.
>>
>> For some architectures, there is also a
>> .B AT_SYSINFO
>> tag.
>> This is used only for locating the vsyscall entry point and is frequently
>> omitted or set to 0 (meaning it's not available).
>> It is a throw back to the initial vDSO work (see
>
> s/throw back/throwback/
>
>> .IR HISTORY
>> below) and should be avoided.
>>
>> Refer to
>> .BR getauxval (3)
>> for more details on accessing these fields.
>> .SS File Format
>
> s/Format/format/
>
>> Since the vDSO is a fully formed ELF, you can do symbol lookups on it.
>
> Missing word after ELF.
>
>> This allows new symbols to be added with newer kernel releases, and for the
>> C library to detect available functionality at runtime when running under
>> different kernel versions.
>> Often times the C library will do detection with the first call and then
>> cache the result for subsequent calls.
>>
>> All symbols are also versioned (using the GNU version format).
>> This allows the kernel (in the very unlikely situation) to update the function
>
> s/situation/case that it is necessary/
>
>> signature without breaking backwards compatibility.
>> This means changing the arguments that it accepts as well as the return value.
>
> What is "it" in the previous line? (Please replace with a suitable noun.)
>
>> When looking up a symbol in the vDSO, you must always include the version you
>> are writing against.
>>
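As a rough sketch of the symbol lookup described here, the following walks
the vDSO's dynamic symbol table by hand. It is simplified on purpose: it
ignores symbol versions (which the text above says a real lookup must
check), and it assumes the vDSO carries a SysV hash table (DT_HASH) with
word-sized entries, which is not true on every architecture. The name
vdso_defines() is hypothetical:

```c
#include <elf.h>
#include <link.h>       /* ElfW() selects the native 32/64-bit ELF types */
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Hypothetical helper: 1 if the vDSO defines `name`, 0 if it does not,
 * -1 if there is no vDSO or a needed table is missing.
 * Symbol versions are deliberately not checked. */
int vdso_defines(const char *name)
{
    uintptr_t base = (uintptr_t) getauxval(AT_SYSINFO_EHDR);
    if (base == 0)
        return -1;

    const ElfW(Ehdr) *ehdr = (const ElfW(Ehdr) *) base;
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *) (base + ehdr->e_phoff);
    const ElfW(Dyn) *dyn = NULL;
    uintptr_t bias = 0;     /* added to link-time addresses */
    int have_load = 0;

    for (int i = 0; i < ehdr->e_phnum; i++) {
        if (phdr[i].p_type == PT_LOAD && !have_load) {
            bias = base + phdr[i].p_offset - phdr[i].p_vaddr;
            have_load = 1;
        } else if (phdr[i].p_type == PT_DYNAMIC) {
            dyn = (const ElfW(Dyn) *) (base + phdr[i].p_offset);
        }
    }
    if (!have_load || dyn == NULL)
        return -1;

    const ElfW(Sym) *symtab = NULL;
    const char *strtab = NULL;
    const ElfW(Word) *hash = NULL;

    /* The vDSO's dynamic entries hold unrelocated addresses. */
    for (; dyn->d_tag != DT_NULL; dyn++) {
        const void *p = (const void *) (bias + dyn->d_un.d_ptr);
        if (dyn->d_tag == DT_SYMTAB)
            symtab = p;
        else if (dyn->d_tag == DT_STRTAB)
            strtab = p;
        else if (dyn->d_tag == DT_HASH)
            hash = p;
    }
    if (symtab == NULL || strtab == NULL || hash == NULL)
        return -1;

    ElfW(Word) nsyms = hash[1];     /* nchain == number of symbols */
    for (ElfW(Word) i = 0; i < nsyms; i++) {
        if (symtab[i].st_shndx != SHN_UNDEF &&
            strcmp(strtab + symtab[i].st_name, name) == 0)
            return 1;
    }
    return 0;
}
```

For example, vdso_defines("__vdso_gettimeofday") would be expected to
find the symbol on x86_64, going by the x86_64 table later in the page. A
C library would do this once and cache the resulting function pointer.
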
>> Typically the vDSO follows the naming convention of prefixing all symbols with
>> "__vdso_" or "__kernel_" so as to distinguish from other standard symbols.
>
> s/distinguish/distinguish them/
>
>> e.g. The "gettimeofday" function is named "__vdso_gettimeofday".
>>
>> You use the standard C calling conventions when calling any of these functions.
>> No need to worry about weird register or stack behavior.
>
> That last sentence is a little incomplete. Could you expand/reword it a
> little, please?
>
>> .SH NOTES
>> .SS Source
>> When you compile the kernel, it will automatically compile and link the vDSO
>> code for you.
>> You will frequently find it under the arch specific dir:
>
> s/arch specific dir/architecture-specific directory/
>
>> .br
>
> Change that last to a blank line, and then indent the next line by 4 spaces.
>
>> find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'
>>
>> Note that the vDSO that is used is based on the ABI of your userspace code
>> and not the ABI of the kernel.
>> i.e. If you run an i386 32bit ELF under an i386 32bit kernel or under an
>
> s/i.e. If/In other words, if/
> s/32bit/32-bit/g
>
>> x86_64 64bit kernel, you'll get the same vDSO.
>
> s/64bit/64-bit/
>
>> So when referring to sections below, use the userspace ABI.
>
> It's not clear what you mean here when you say "use the userspace ABI."
> Could you clarify?
>
>> .SS vDSO Names
>
> s/Names/names/
>
>> The name of this shared object varies across architectures.
>> It will often show up in things like glibc's `ldd` output.
>> The exact name should not matter to any code, so please do not hardcode it.
>
> s/please//
>
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> user ABI      vDSO name
>> _
>> aarch64       linux-vdso.so.1
>> ia64  linux-gate.so.1
>> ppc/32        linux-vdso32.so.1
>> ppc/64        linux-vdso64.so.1
>> s390  linux-vdso32.so.1
>> s390x linux-vdso64.so.1
>> sh    linux-gate.so.1
>> i386  linux-gate.so.1
>> x86_64        linux-vdso.so.1
>> x86/x32       linux-vdso.so.1
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS aarch64 functions
>> .\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __kernel_rt_sigreturn LINUX_2.6.39
>> __kernel_gettimeofday LINUX_2.6.39
>> __kernel_clock_gettime        LINUX_2.6.39
>> __kernel_clock_getres LINUX_2.6.39
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS bfin (Blackfin) functions
>> .\" See linux/arch/blackfin/kernel/fixed_code.S
>> .\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
>
> Thanks -- adding references like the above in the source is helpful
> for future maintenance.
>
>> As this cpu lacks a MMU, it doesn't setup a vDSO in the normal sense.
>
> s/cpu/CPU/
> s/MMU/memory-management unit (MMU)/
> s/setup/set up/
>
>> Instead, it maps at boot time a few raw functions into a fixed location in
>> memory.
>> Userspace apps then call directly into that.
>
> s/apps/applications/
>
>> There is no provision for backwards compatibility beyond sniffing raw opcodes,
>> but as this is an embedded CPU, it can get away with things -- some of the
>> object formats it runs aren't even ELF based (they're bFLT/FLAT).
>>
>> For documentation on this format, it's better you refer to the public docs:
>> .br
>> http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
>> .SS ia64 (Itanium) functions
>> .\" See linux/arch/ia64/kernel/gate.lds.S
>> .\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.txt
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __kernel_sigtramp     LINUX_2.5
>> __kernel_syscall_via_break    LINUX_2.5
>> __kernel_syscall_via_epc      LINUX_2.5
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>>
>> The Itanium port actually likes to get tricky.
>> In addition to the vDSO above, it also has "light-weight system calls" aka
>
> s/aka/also known as/
>
>> "fast syscalls" aka "fsys".
>
> s/aka/or/
>
>> You can invoke these via the __kernel_syscall_via_epc vDSO helper.
>> The system calls listed here have the same semantics as if you called them
>> directly via
>> .BR syscall (3),
>> so refer to the relevant
>> documentation for each.
>> The table below lists the functions available via this mechanism.
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l.
>> function
>> _
>> clock_gettime
>> getcpu
>> getpid
>> getppid
>> gettimeofday
>> set_tid_address
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS ppc/32 functions
>> .\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S
>> The functions marked with a
>> .I *
>> below are only available when the kernel is
>> a powerpc64 (64bit) kernel.
>
> s/64bit/64-bit/
>
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __kernel_clock_getres LINUX_2.6.15
>> __kernel_clock_gettime        LINUX_2.6.15
>> __kernel_datapage_offset      LINUX_2.6.15
>> __kernel_get_syscall_map      LINUX_2.6.15
>> __kernel_get_tbfreq   LINUX_2.6.15
>> __kernel_getcpu \fI*\fR       LINUX_2.6.15
>> __kernel_gettimeofday LINUX_2.6.15
>> __kernel_sigtramp_rt32        LINUX_2.6.15
>> __kernel_sigtramp32   LINUX_2.6.15
>> __kernel_sync_dicache LINUX_2.6.15
>> __kernel_sync_dicache_p5      LINUX_2.6.15
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS ppc/64 functions
>> .\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __kernel_clock_getres LINUX_2.6.15
>> __kernel_clock_gettime        LINUX_2.6.15
>> __kernel_datapage_offset      LINUX_2.6.15
>> __kernel_get_syscall_map      LINUX_2.6.15
>> __kernel_get_tbfreq   LINUX_2.6.15
>> __kernel_getcpu       LINUX_2.6.15
>> __kernel_gettimeofday LINUX_2.6.15
>> __kernel_sigtramp_rt64        LINUX_2.6.15
>> __kernel_sync_dicache LINUX_2.6.15
>> __kernel_sync_dicache_p5      LINUX_2.6.15
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS s390 functions
>> .\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __kernel_clock_getres LINUX_2.6.29
>> __kernel_clock_gettime        LINUX_2.6.29
>> __kernel_gettimeofday LINUX_2.6.29
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS s390x functions
>> .\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __kernel_clock_getres LINUX_2.6.29
>> __kernel_clock_gettime        LINUX_2.6.29
>> __kernel_gettimeofday LINUX_2.6.29
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS sh (SuperH) functions
>> .\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __kernel_rt_sigreturn LINUX_2.6
>> __kernel_sigreturn    LINUX_2.6
>> __kernel_vsyscall     LINUX_2.6
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS i386 functions
>> .\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __kernel_sigreturn    LINUX_2.5
>> __kernel_rt_sigreturn LINUX_2.5
>> __kernel_vsyscall     LINUX_2.5
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS x86_64 functions
>> .\" See linux/arch/x86/vdso/vdso.lds.S
>> Each of these symbols are also available without the "__vdso_" prefix, but
>
> Either:
> s/Each of these symbols are/All of these symbols are/
> or
> s/Each of these symbols are/Each of these symbols is/
>
>> you should ignore those and stick to the names below.
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __vdso_clock_gettime  LINUX_2.6
>> __vdso_getcpu LINUX_2.6
>> __vdso_gettimeofday   LINUX_2.6
>> __vdso_time   LINUX_2.6
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS x86/x32 functions
>> .\" See linux/arch/x86/vdso/vdso32.lds.S
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __vdso_clock_gettime  LINUX_2.6
>> __vdso_getcpu LINUX_2.6
>> __vdso_gettimeofday   LINUX_2.6
>> __vdso_time   LINUX_2.6
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SH HISTORY
>
> Better to have this as
>
> .SS History
>
>> The vDSO was originally just a single function -- the vsyscall.
>> In older kernels, you might see that in a process's memory map rather than vdso.
>> Overtime, people realized that this was a great way to pass more functionality
>
> s/Overtime/Over time/
>
>> to userspace, so it was reconceived as a vDSO in the current format.
>> .SH SEE ALSO
>> .BR syscalls (2),
>> .BR getauxval (3),
>> .BR proc (5)
>>
>> The docs/examples/sources in the Linux sources:
>> .nf
>> Documentation/ABI/stable/vdso
>> linux/Documentation/ia64/fsys.txt
>> Documentation/vDSO/* (includes examples of using the vDSO)
>> find arch/ -iname '*vdso*' -o -iname '*gate*'
>> .fi
>>
>
> In the next iteration, could you include a second (separate) patch to
> syscalls.2 and getauxval.3 that adds
> .BR vdso (7)
> under SEE ALSO?
>
> Thanks,
>
> Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/



