Re: [PATCH v3 3/9] fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 04, 2024 at 05:24:48PM -0700, Andrii Nakryiko wrote:
> /proc/<pid>/maps file is extremely useful in practice for various tasks
> involving figuring out process memory layout, what files are backing any
> given memory range, etc. One important class of applications that
> absolutely rely on this are profilers/stack symbolizers (perf tool being one
> of them). Patterns of use differ, but they generally would fall into two
> categories.
> 
> In on-demand pattern, a profiler/symbolizer would normally capture stack
> trace containing absolute memory addresses of some functions, and would
> then use /proc/<pid>/maps file to find corresponding backing ELF files
> (normally, only executable VMAs are of interest), file offsets within
> them, and then continue from there to get yet more information (ELF
> symbols, DWARF information) to get human-readable symbolic information.
> This pattern is used by Meta's fleet-wide profiler, as one example.
> 
> In preprocessing pattern, application doesn't know the set of addresses
> of interest, so it has to fetch all relevant VMAs (again, probably only
> executable ones), store or cache them, then proceed with profiling and
> stack trace capture. Once done, it would do symbolization based on
> stored VMA information. This can happen at much later point in time.
> This patterns is used by perf tool, as an example.
> 
> In either case, there are both performance and correctness requirement
> involved. This address to VMA information translation has to be done as
> efficiently as possible, but also not miss any VMA (especially in the
> case of loading/unloading shared libraries). In practice, correctness
> can't be guaranteed (due to process dying before VMA data can be
> captured, or shared library being unloaded, etc), but any effort to
> maximize the chance of finding the VMA is appreciated.
> 
> Unfortunately, for all the /proc/<pid>/maps file universality and
> usefulness, it doesn't fit the above use cases 100%.
> 
> First, it's main purpose is to emit all VMAs sequentially, but in
> practice captured addresses would fall only into a smaller subset of all
> process' VMAs, mainly containing executable text. Yet, library would
> need to parse most or all of the contents to find needed VMAs, as there
> is no way to skip VMAs that are of no use. Efficient library can do the
> linear pass and it is still relatively efficient, but it's definitely an
> overhead that can be avoided, if there was a way to do more targeted
> querying of the relevant VMA information.
> 
> Second, it's a text based interface, which makes its programmatic use from
> applications and libraries more cumbersome and inefficient due to the
> need to handle text parsing to get necessary pieces of information. The
> overhead is actually payed both by kernel, formatting originally binary
> VMA data into text, and then by user space application, parsing it back
> into binary data for further use.

I was trying to solve all these issues in a more generic way:
https://lwn.net/Articles/683371/

We definitely interested in this new interface to use it in CRIU.

<snip>

> +
> +	if (karg.vma_name_size) {
> +		size_t name_buf_sz = min_t(size_t, PATH_MAX, karg.vma_name_size);
> +		const struct path *path;
> +		const char *name_fmt;
> +		size_t name_sz = 0;
> +
> +		get_vma_name(vma, &path, &name, &name_fmt);
> +
> +		if (path || name_fmt || name) {
> +			name_buf = kmalloc(name_buf_sz, GFP_KERNEL);
> +			if (!name_buf) {
> +				err = -ENOMEM;
> +				goto out;
> +			}
> +		}
> +		if (path) {
> +			name = d_path(path, name_buf, name_buf_sz);
> +			if (IS_ERR(name)) {
> +				err = PTR_ERR(name);
> +				goto out;

It always fails if a file path name is longer than PATH_MAX.

Can we add a flag to indicate whether file names are needed to be
resolved? In criu, we use special names like "vvar", "vdso", but we dump
files via /proc/pid/map_files.

> +			}
> +			name_sz = name_buf + name_buf_sz - name;
> +		} else if (name || name_fmt) {
> +			name_sz = 1 + snprintf(name_buf, name_buf_sz, name_fmt ?: "%s", name);
> +			name = name_buf;
> +		}
> +		if (name_sz > name_buf_sz) {
> +			err = -ENAMETOOLONG;
> +			goto out;
> +		}
> +		karg.vma_name_size = name_sz;
> +	}

Thanks,
Andrei




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux