Hi Mike,

This page seems to have fallen on the floor. Would you have time to
look at my comments below and submit a new version of this page?

Cheers,

Michael

On 06/27/13 12:00, Michael Kerrisk (man-pages) wrote:
> Hi Mike,
>
> Ping!
>
> Cheers,
>
> Michael
>
>
>
> On Wed, May 22, 2013 at 3:22 PM, Michael Kerrisk <mtk.manpages@xxxxxxxxx> wrote:
>> Hi Mike,
>>
>> On 04/12/13 03:28, Mike Frysinger wrote:
>>> here's v2 w/Andy's feedback
>>
>> Thanks for this--it's a nice piece of work. Could you take a
>> look at my comments below and send a v3, please.
>>
>>> .\" Written by Mike Frysinger <vapier@xxxxxxxxxx>
>>> .\"
>>> .\" %%%LICENSE_START(PUBLIC_DOMAIN)
>>> .\" This page is in the public domain. Suck it.
>>
>> Okay -- not my first choice for a license, but so be it.
>> But, how about we lose the "Suck it."...
>>
>>> .\" %%%LICENSE_END
>>> .\"
>>> .TH VDSO 7 2013-04-09 "Linux" "Linux Programmer's Manual"
>>> .SH NAME
>>> vDSO \- overview of the virtual ELF dynamic shared object
>>> .SH SYNOPSIS
>>> .B #include <sys/auxv.h>
>>>
>>> .B void *vdso = (uintptr_t)getauxval(AT_SYSINFO_EHDR);
>>
>> Add space before "getauxval". (Usual convention for casts in code examples
>> in man pages.)
>>
>>> .SH DESCRIPTION
>>> The "vDSO" is a small shared library that the kernel automatically maps into the
>>> address space of all userspace applications.
>>
>> 1,$s/userspace applications/user-space applications/
>>
>>> Applications themselves usually need not concern themselves with this as it is
>>> most commonly called by the C library.
>>
>> This last sentence doesn't quite make sense, since "this" and "it" refer to
>> different things (I believe). Do you want something like:
>>
>>     Applications generally do not need to care about the details since
>>     the vDSO is automatically employed by the C library
>>
>> ?
>>
>>> This way you can write using standard functions and the C library will take care
>>> of using any available functionality.
>>>
>>> Why does this object exist at all?
>>
>> s/this object/the vDSO/
>>
>>> There are some facilities the kernel provides that userspace ends up using
>>
>> s/userspace/user space/
>>
>> (When used as a noun, and in other places in the page as well)
>>
>>
>>> frequently to the point that such calls can dominate overall performance.
>>> This is due both to the frequency of the call as well as the context overhead
>>> from exiting userspace and entering the kernel.
>>>
>>> The rest of this documentation is geared towards the curious and/or C library
>>> writers rather than general developers.
>>> If you're trying to call the vDSO in your own application rather than using
>>> the C library, you're most likely doing it wrong.
>>> .SS Example Background
>>
>> Convention for SS headings is that only the first word is capitalized (unless
>> English usage dictates otherwise--e.g., for a proper noun)
>>
>>> Making syscalls can be slow.
>>
>> 1,$s/syscall/system call/
>>
>> (and other instances in the page)
>>
>>> In x86 32bit systems, you can trigger a software interrupt (int $0x80) to tell
>>
>> s/32bit/32-bit/
>>
>>> the kernel you wish to make a syscall.
>>> However, this instruction is expensive: it goes through the full interrupt
>>> handling paths in the processor's microcode as well as in the kernel.
>>> Newer processors have faster (but backwards incompatible) instructions to
>>> initiate system calls.
>>> Rather than require the C library to figure out if this functionality is
>>> available at runtime itself, it can use functions provided by the kernel in
>>> the vDSO.
>>
>> That last point (after the comma) is the most interesting (IMO) of the use
>> cases of the vDSO. If you cared to expand on the details (i.e., what are
>> the mechanics of the operation of those functions provided by the kernel),
>> I think that would be interesting for the reader.
>>
>>> Note that the terminology can be confusing.
>>> On x86 systems, the vDSO function is named "__kernel_vsyscall", but on x86_64,
>>> the term "vsyscall" also refers to an obsolete way to ask the kernel what time
>>> it is or what cpu the caller is on.
>>
>> s/cpu/CPU/
>>
>>> Another frequent system call is gettimeofday().
>>> This is called both directly by userspace applications as well as indirectly by
>>> the C library.
>>> Think timestamps or timing loops or polling -- all of these frequently need to
>>> know what time it is right now.
>>> This information is also not secret -- any application in any privilege mode
>>> (root or any user) will get the same answer.
>>> Thus the kernel arranges for the information required to answer this question
>>> to be placed in memory the process can access.
>>> Now a call to gettimeofday() changes from a syscall to a normal function call
>>> and a few memory accesses.
>>> .SS Finding The vDSO
>>
>> s/The/the/
>>
>>> The base address of the vDSO (if one exists) is passed by the kernel to each
>>> program in the initial auxiliary vector.
>>> Specifically, via the
>>> .B AT_SYSINFO_EHDR
>>> tag.
>>>
>>> You must not assume the vDSO is mapped at any particular location in the
>>> user's memory map.
>>> The base address will usually be randomized at runtime every time a new is
>>
>> Missing word after "new".
>>
>>> processed (at
>>> .BR execve (2)
>>> time).
>>> This is done for security reasons to prevent standard "return-to-libc" attacks.
>>>
>>> For some architectures, there is also a
>>> .B AT_SYSINFO
>>> tag.
>>> This is used only for locating the vsyscall entry point and is frequently
>>> omitted or set to 0 (meaning it's not available).
>>> It is a throw back to the initial vDSO work (see
>>
>> s/throw back/throwback/
>>
>>> .IR HISTORY
>>> below) and should be avoided.
>>>
>>> Refer to
>>> .BR getauxval (3)
>>> for more details on accessing these fields.
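>>
>> A short example might help readers of this section. The following is only
>> a minimal sketch, assuming glibc 2.16 or later (for getauxval()); the helper
>> names vdso_base() and vdso_has_elf_magic() are made up for illustration and
>> are not part of any library:
>>

```c
#include <elf.h>        /* AT_SYSINFO_EHDR, ELFMAG, SELFMAG */
#include <stddef.h>     /* NULL */
#include <stdint.h>     /* uintptr_t */
#include <string.h>     /* memcmp() */
#include <sys/auxv.h>   /* getauxval() */

/* Hypothetical helper: return the base address of the vDSO, or NULL if
   the kernel supplied no AT_SYSINFO_EHDR entry (getauxval() returns 0
   when the requested tag is absent). */
static const void *
vdso_base(void)
{
    return (const void *) (uintptr_t) getauxval(AT_SYSINFO_EHDR);
}

/* Hypothetical helper: sanity-check that the mapping at 'base' starts
   with the four-byte ELF magic number. Returns 1 if so, 0 otherwise. */
static int
vdso_has_elf_magic(const void *base)
{
    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

>> A caller would treat a NULL return from vdso_base() as "no vDSO available"
>> and fall back to ordinary system calls; the address itself should never be
>> hardcoded, since it is usually randomized.
>>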
>>> .SS File Format
>>
>> s/Format/format/
>>
>>> Since the vDSO is a fully formed ELF, you can do symbol lookups on it.
>>
>> Missing word after ELF.
>>
>>> This allows new symbols to be added with newer kernel releases, and for the
>>> C library to detect available functionality at runtime when running under
>>> different kernel versions.
>>> Often times the C library will do detection with the first call and then
>>> cache the result for subsequent calls.
>>>
>>> All symbols are also versioned (using the GNU version format).
>>> This allows the kernel (in the very unlikely situation) to update the function
>>
>> s/situation/case that it is necessary/
>>
>>> signature without breaking backwards compatibility.
>>> This means changing the arguments that it accepts as well as the return value.
>>
>> What is "it" in the previous line? (Please replace with a suitable noun.)
>>
>>> When looking up a symbol in the vDSO, you must always include the version you
>>> are writing against.
>>>
>>> Typically the vDSO follows the naming convention of prefixing all symbols with
>>> "__vdso_" or "__kernel_" so as to distinguish from other standard symbols.
>>
>> s/distinguish/distinguish them/
>>
>>> e.g. The "gettimeofday" function is named "__vdso_gettimeofday".
>>>
>>> You use the standard C calling conventions when calling any of these functions.
>>> No need to worry about weird register or stack behavior.
>>
>> That last sentence is a little incomplete. Could you expand/reword a little
>> please.
>>
>>> .SH NOTES
>>> .SS Source
>>> When you compile the kernel, it will automatically compile and link the vDSO
>>> code for you.
>>> You will frequently find it under the arch specific dir:
>>
>> s/arch specific dir/architecture-specific directory/
>>
>>> .br
>>
>> Change that last to a blank line, and then indent the next line by 4 spaces.
>>
>>> find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'
>>>
>>> Note that the vDSO that is used is based on the ABI of your userspace code
>>> and not the ABI of the kernel.
>>> i.e. If you run an i386 32bit ELF under an i386 32bit kernel or under an
>>
>> s/i.e. If/In other words, if/
>> s/32bit/32-bit/g
>>
>>> x86_64 64bit kernel, you'll get the same vDSO.
>>
>> s/64bit/64-bit/
>>
>>> So when referring to sections below, use the userspace ABI.
>>
>> It's not clear what you mean here when you say "use the userspace ABI."
>> Could you clarify?
>>
>>> .SS vDSO Names
>>
>> s/Names/names/
>>
>>> The name of this shared object varies across architectures.
>>> It will often show up in things like glibc's `ldd` output.
>>> The exact name should not matter to any code, so please do not hardcode it.
>>
>> s/please//
>>
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> user ABI	vDSO name
>>> _
>>> aarch64	linux-vdso.so.1
>>> ia64	linux-gate.so.1
>>> ppc/32	linux-vdso32.so.1
>>> ppc/64	linux-vdso64.so.1
>>> s390	linux-vdso32.so.1
>>> s390x	linux-vdso64.so.1
>>> sh	linux-gate.so.1
>>> i386	linux-gate.so.1
>>> x86_64	linux-vdso.so.1
>>> x86/x32	linux-vdso.so.1
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS aarch64 functions
>>> .\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol	version
>>> _
>>> __kernel_rt_sigreturn	LINUX_2.6.39
>>> __kernel_gettimeofday	LINUX_2.6.39
>>> __kernel_clock_gettime	LINUX_2.6.39
>>> __kernel_clock_getres	LINUX_2.6.39
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS bfin (Blackfin) functions
>>> .\" See linux/arch/blackfin/kernel/fixed_code.S
>>> .\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
>>
>> Thanks -- adding references like the above in the source is helpful
>> for future maintenance.
>>
>>> As this cpu lacks a MMU, it doesn't setup a vDSO in the normal sense.
>>
>> s/cpu/CPU/
>> s/MMU/memory-management unit (MMU)/
>> s/setup/set up/
>>
>>> Instead, it maps at boot time a few raw functions into a fixed location in
>>> memory.
>>> Userspace apps then call directly into that.
>>
>> s/apps/applications/
>>
>>> There is no provision for backwards compatibility beyond sniffing raw opcodes,
>>> but as this is an embedded CPU, it can get away with things -- some of the
>>> object formats it runs aren't even ELF based (they're bFLT/FLAT).
>>>
>>> For documentation on this format, it's better you refer to the public docs:
>>> .br
>>> http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
>>> .SS ia64 (Itanium) functions
>>> .\" See linux/arch/ia64/kernel/gate.lds.S
>>> .\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.txt
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol	version
>>> _
>>> __kernel_sigtramp	LINUX_2.5
>>> __kernel_syscall_via_break	LINUX_2.5
>>> __kernel_syscall_via_epc	LINUX_2.5
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>>
>>> The Itanium port actually likes to get tricky.
>>> In addition to the vDSO above, it also has "light-weight system calls" aka
>>
>> s/aka/also known as/
>>
>>> "fast syscalls" aka "fsys".
>>
>> s/aka/or/
>>
>>> You can invoke these via the __kernel_syscall_via_epc vDSO helper.
>>> The system calls listed here have the same semantics as if you called them
>>> directly via
>>> .BR syscall (3),
>>> so refer to the relevant
>>> documentation for each.
>>> The table below lists the functions available via this mechanism.
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l.
>>> function
>>> _
>>> clock_gettime
>>> getcpu
>>> getpid
>>> getppid
>>> gettimeofday
>>> set_tid_address
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS ppc/32 functions
>>> .\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S
>>> The functions marked with a
>>> .I *
>>> below are only available when the kernel is
>>> a powerpc64 (64bit) kernel.
>>
>> s/64bit/64-bit/
>>
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol	version
>>> _
>>> __kernel_clock_getres	LINUX_2.6.15
>>> __kernel_clock_gettime	LINUX_2.6.15
>>> __kernel_datapage_offset	LINUX_2.6.15
>>> __kernel_get_syscall_map	LINUX_2.6.15
>>> __kernel_get_tbfreq	LINUX_2.6.15
>>> __kernel_getcpu \fI*\fR	LINUX_2.6.15
>>> __kernel_gettimeofday	LINUX_2.6.15
>>> __kernel_sigtramp_rt32	LINUX_2.6.15
>>> __kernel_sigtramp32	LINUX_2.6.15
>>> __kernel_sync_dicache	LINUX_2.6.15
>>> __kernel_sync_dicache_p5	LINUX_2.6.15
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS ppc/64 functions
>>> .\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol	version
>>> _
>>> __kernel_clock_getres	LINUX_2.6.15
>>> __kernel_clock_gettime	LINUX_2.6.15
>>> __kernel_datapage_offset	LINUX_2.6.15
>>> __kernel_get_syscall_map	LINUX_2.6.15
>>> __kernel_get_tbfreq	LINUX_2.6.15
>>> __kernel_getcpu	LINUX_2.6.15
>>> __kernel_gettimeofday	LINUX_2.6.15
>>> __kernel_sigtramp_rt64	LINUX_2.6.15
>>> __kernel_sync_dicache	LINUX_2.6.15
>>> __kernel_sync_dicache_p5	LINUX_2.6.15
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS s390 functions
>>> .\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol	version
>>> _
>>> __kernel_clock_getres	LINUX_2.6.29
>>> __kernel_clock_gettime	LINUX_2.6.29
>>> __kernel_gettimeofday	LINUX_2.6.29
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS s390x functions
>>> .\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol	version
>>> _
>>> __kernel_clock_getres	LINUX_2.6.29
>>> __kernel_clock_gettime	LINUX_2.6.29
>>> __kernel_gettimeofday	LINUX_2.6.29
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS sh (SuperH) functions
>>> .\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol	version
>>> _
>>> __kernel_rt_sigreturn	LINUX_2.6
>>> __kernel_sigreturn	LINUX_2.6
>>> __kernel_vsyscall	LINUX_2.6
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS i386 functions
>>> .\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol	version
>>> _
>>> __kernel_sigreturn	LINUX_2.5
>>> __kernel_rt_sigreturn	LINUX_2.5
>>> __kernel_vsyscall	LINUX_2.5
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS x86_64 functions
>>> .\" See linux/arch/x86/vdso/vdso.lds.S
>>> Each of these symbols are also available without the "__vdso_" prefix, but
>>
>> Either:
>> s/Each of these symbols are/All of these symbols are/
>> or
>> s/Each of these symbols are/Each of these symbols is/
>>
>>> you should ignore those and stick to the names below.
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol	version
>>> _
>>> __vdso_clock_gettime	LINUX_2.6
>>> __vdso_getcpu	LINUX_2.6
>>> __vdso_gettimeofday	LINUX_2.6
>>> __vdso_time	LINUX_2.6
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS x86/x32 functions
>>> .\" See linux/arch/x86/vdso/vdso32.lds.S
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol	version
>>> _
>>> __vdso_clock_gettime	LINUX_2.6
>>> __vdso_getcpu	LINUX_2.6
>>> __vdso_gettimeofday	LINUX_2.6
>>> __vdso_time	LINUX_2.6
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SH HISTORY
>>
>> Better to have this as
>>
>> .SS History
>>
>>> The vDSO was originally just a single function -- the vsyscall.
>>> In older kernels, you might see that in a process's memory map rather than vdso.
>>> Overtime, people realized that this was a great way to pass more functionality
>>
>> s/Overtime/Over time/
>>
>>> to userspace, so it was reconceived as a vDSO in the current format.
>>> .SH SEE ALSO
>>> .BR syscalls (2),
>>> .BR getauxval (3),
>>> .BR proc (5)
>>>
>>> The docs/examples/sources in the Linux sources:
>>> .nf
>>> Documentation/ABI/stable/vdso
>>> linux/Documentation/ia64/fsys.txt
>>> Documentation/vDSO/* (includes examples of using the vDSO)
>>> find arch/ -iname '*vdso*' -o -iname '*gate*'
>>> .fi
>>>
>>
>> In the next iteration, could you include a second (separate) patch to
>> syscalls.2 and getauxval.3 that adds
>> .BR vdso (7)
>> under SEE ALSO.
>>
>> Thanks,
>>
>> Michael
>
>
>

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/