On Thu, Feb 6, 2025 at 10:28 AM Thomas Weißschuh
<thomas.weissschuh@xxxxxxxxxxxxx> wrote:
>
> On Thu, Feb 06, 2025 at 09:38:59AM -0500, enh wrote:
> > On Thu, Feb 6, 2025 at 8:20 AM Thomas Weißschuh
> > <thomas.weissschuh@xxxxxxxxxxxxx> wrote:
> > >
> > > On Fri, Jan 17, 2025 at 02:35:18PM -0500, enh wrote:
> > > > On Fri, Jan 17, 2025 at 1:20 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> > > > >
> <snip>
> > >
> > > > > There are technical difficulties to seal vdso/vvar from the glibc
> > > > > side. The dynamic linker lacks vdso/vvar mapping size information, and
> > > > > architectural variations for vdso/vvar also mean sealing from the
> > > > > kernel side is a simpler solution. Adhemerval has more details in case
> > > > > clarification is needed from the glibc side.
> > > >
> > > > as a maintainer of a different linux libc, i've long wanted a "tell me
> > > > everything there is to know about this vma" syscall rather than having
> > > > to parse /proc/maps...
> > > >
> > > > ...but in this special case, is the vdso/vvar size ever anything other
> > > > than "one page" in practice?
> > >
> > > x86 has two additional vvar pages for virtual clocks.
> > > (Since v6.13 even split into their own mapping)
> > > Loongarch has per-cpu vvar data which is larger than one page.
> > > The vdso mapping is however many pages the code ends up being compiled as,
> > > for example on my current x86_64 distro kernel it's two pages.
> > > In the near future, probably v6.14, vvars will be split over multiple
> > > pages in general [0].
> >
> > /me checks the nearest arm64 phone ... yeah, vdso is still only one
> > page there but vvars is already more than one.
>
> Probably due to CONFIG_TIME_NS, see below.
>
> > is there a TL;DR (or RTFM link) for why this is so big? a quick look
> > at the x86 suggests there should only be 640 bytes of various things
> > plus a handful of bytes for the rng, and while arm64 looks very
> > different, that looks like it's explicitly asking for a page (with the
> > vdso_data_store stuff)? (i've never had any reason to look at vvars
> > before, only vdso.)
>
> I don't think there is any real manual.
>
> The vvar data is *shared* between the kernel and userspace.
> This is done by mapping the *same* physical memory into the kernel
> ("vdso_data_store") and (read-only) into all userspace processes.
> As PTEs always cover a full page and the kernel can not expose random
> other internal kernel data into userspace, the vvars need to be in their
> own dedicated page.
> (The same is true for the vDSO code, uprobe trampoline, etc... mappings)
>
> The vDSO functions also need to be aware of time namespaces. This is
> implemented by allocating one page per namespace and mapping this
> in place of the regular vvar page. But the vDSO still needs to access
> the regular vvar page for some information, so both are mapped.

ah, i see. yeah, that makes sense. (amusingly, i almost quipped "it's
not like there are _that_ many clocks to go in there" in my previous
mail, forgetting that there are effectively an unbounded number of
clocks thanks to this feature!)
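
(for anyone else who wants to see that in action: a minimal sketch,
not from this thread, assuming CONFIG_TIME_NS=y and enough privilege
for unshare(2); the one-day monotonic offset is just an arbitrary
example value.)

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080
#endif

int main(void)
{
    struct timespec ts;

    /* Create a new time namespace. The caller stays in its old
     * namespace; children created afterwards start in the new one. */
    if (unshare(CLONE_NEWTIME) == -1) {
        perror("unshare(CLONE_NEWTIME)");
        return 1;
    }

    /* The offsets stay writable until the first process enters the
     * namespace. Shift CLOCK_MONOTONIC forward by one day. */
    FILE *f = fopen("/proc/self/timens_offsets", "w");
    if (f == NULL || fprintf(f, "monotonic 86400 0\n") < 0 || fclose(f)) {
        perror("timens_offsets");
        return 1;
    }

    if (fork() == 0) {
        /* The child's clock_gettime() goes through the vDSO, which
         * reads the per-namespace time page mapped in place of the
         * regular one, so it sees the offset clock. */
        clock_gettime(CLOCK_MONOTONIC, &ts);
        printf("child:  %lld\n", (long long)ts.tv_sec);
        _exit(0);
    }
    wait(NULL);

    clock_gettime(CLOCK_MONOTONIC, &ts);
    printf("parent: %lld\n", (long long)ts.tv_sec);
    return 0;
}

the child's CLOCK_MONOTONIC comes out roughly 86400 seconds ahead of
the parent's, and it's exactly the per-namespace page swap described
above that lets the vDSO fast path get this right without a syscall.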

> Then on top come the rng state and some architecture-specific data.
> These are currently part of the time page. So they also have to dance
> around the time namespace mapping shenanigans. In addition they have to
> coexist with the actual time data, which is currently done by manually
> calculating byte offsets for them in the time page and hardcoding those.
>
> The linked series cleans this up by moving things into dedicated pages,
> to make the code easier to understand and to make it possible to add
> new data to the time page without running out of space or introducing
> conflicts which need to be detected manually.
> While this needs to allocate more pages, these are shared between the
> whole system, so effectively it's cheap. It also requires more virtual
> memory space in each process, but that shouldn't matter.
>
> As for arm64 looking very different from x86: Hopefully not for long :-)

(even as someone who doesn't work on the kernel, things like this are
always helpful --- just having one thing to understand/your first grep
being relevant is much nicer than "oh, wait ... which architecture was
that?".)

> > > Figuring out the start and size from /proc/maps, or the new
> > > PROCMAP_QUERY ioctl, is not trivial, due to architectural variations.
> >
> > (obviously it's unsatisfying as a general interface, but in practice
> > the VMAs i see asked about directly -- rather than just rounded up in
> > a diagnostic dump -- are either stacks ["what are the bounds of this
> > stack, and does it have guard pages already?"] or code ["what file
> > was the code at this pc mapped in from?"]. so while the vdso would
> > come up, we'd never notice if vvars didn't work. if your sp/pc point
> > there, we were already just going to bail anyway :-) )
>
> Fair enough.
>
> This information was also a response to Jeff's parent mail,
> as it would be relevant when sealing the mappings from ld.so.
>
> <snip>
>
> > > [0] https://lore.kernel.org/lkml/20250204-vdso-store-rng-v3-0-13a4669dfc8c@xxxxxxxxxxxxx/
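
(and, for the archives, tying the two halves of this thread together:
a rough sketch of what "find the vdso's bounds, then seal it" could
look like from a libc. this isn't anything shipping anywhere; it
assumes uapi headers and a kernel new enough for PROCMAP_QUERY (v6.11)
and mseal(2) (v6.10), and falls back to a hard-coded syscall number
since libcs may not have an mseal wrapper yet.)

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/fs.h>        /* struct procmap_query, PROCMAP_QUERY */
#include <stdio.h>
#include <string.h>
#include <sys/auxv.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462       /* asm-generic (and x86_64) value */
#endif

int main(void)
{
    /* The aux vector hands every process the vdso base address... */
    unsigned long vdso = getauxval(AT_SYSINFO_EHDR);
    if (vdso == 0)
        return 1;

    /* ...but not its size, so ask for the VMA covering that address. */
    int fd = open("/proc/self/maps", O_RDONLY);
    if (fd == -1)
        return 1;

    struct procmap_query q;
    memset(&q, 0, sizeof(q));
    q.size = sizeof(q);
    q.query_addr = vdso;     /* default flags: VMA covering query_addr */
    if (ioctl(fd, PROCMAP_QUERY, &q) == -1) {
        perror("PROCMAP_QUERY");
        return 1;
    }
    close(fd);

    printf("[vdso] %#llx-%#llx\n",
           (unsigned long long)q.vma_start,
           (unsigned long long)q.vma_end);

    /* Seal the whole mapping: no mprotect/munmap/mremap from here on. */
    if (syscall(__NR_mseal, (unsigned long)q.vma_start,
                (unsigned long)(q.vma_end - q.vma_start), 0UL) == -1) {
        perror("mseal");
        return 1;
    }
    return 0;
}

(that still only covers the vdso VMA itself, of course; per the
architectural variations above, rounding up every vvar mapping the
same way is messier, which is the argument for sealing these mappings
from the kernel side instead.)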