On Mon, 27 Nov 2017 09:19:39 +0200 Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx> wrote:

> From: Andrei Vagin <avagin@xxxxxxxxxxxxx>
>
> It is a hybrid of process_vm_readv() and vmsplice().
>
> vmsplice can map memory from the current address space into a pipe.
> process_vm_readv can read the memory of another process.
>
> The new system call can map the memory of another process into a pipe.
>
> ssize_t process_vmsplice(pid_t pid, int fd, const struct iovec *iov,
>                          unsigned long nr_segs, unsigned int flags)
>
> All arguments are identical to those of vmsplice, except pid, which
> specifies the target process.
>
> Currently, if we want to dump a process's memory to a file or to a socket,
> we can use process_vm_readv() + write(), but this is slow, because data
> are copied into a temporary user-space buffer.
>
> A second way is to use vmsplice() + splice(). It is more efficient,
> because data are not copied into a temporary buffer, but there is another
> problem. vmsplice works with the current address space, so it can be
> used only if we inject our code into the target process.
>
> The second way suffers from a few other issues:
> * the process has to be stopped to run the parasite code
> * the number of pipes is limited, so it may be impossible to dump all
>   memory in one iteration, and we have to stop the process and inject our
>   code a few times.
> * pages in pipes are unreclaimable, so it isn't good to hold a lot of
>   memory in pipes.
>
> The introduced syscall allows the second way to be used without injecting
> any code into the target process.
>
> My experiments show that process_vmsplice() + splice() is two times
> faster than process_vm_readv() + write().
>
> It is particularly useful in the pre-dump stage. In this stage we enable a
> memory tracker, and then we dump the process's memory while the
> process continues to work. On the first iteration we dump all
> memory, and then we dump only the memory modified since the previous
> iteration. After a few pre-dump operations, the process is stopped and
> dumped for the final time. The pre-dump operations significantly decrease
> process downtime when a process is migrated to another host.

What is the overall improvement in a typical dumping operation?

Does that improvement justify the addition of a new syscall, and all
that this entails?  If so, why?

Are there any other applications of this syscall?
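
For reference, the existing vmsplice() + splice() path that the quoted
changelog calls the "second way" can be sketched roughly as below. This is
only an illustration, not code from the patch: dump_self_via_pipe() and
demo_roundtrip() are hypothetical names, and the proposed process_vmsplice()
is deliberately not called. The sketch can only dump the caller's own
address space, which is the limitation the changelog describes.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes at buf (in the *current* address
 * space) to out_fd through a pipe, without a user-space bounce buffer.
 * Returns 0 on success, -1 on error.
 */
static int dump_self_via_pipe(void *buf, size_t len, int out_fd)
{
	int p[2];
	int ret = -1;

	if (pipe(p) < 0)
		return -1;

	while (len > 0) {
		struct iovec iov = { .iov_base = buf, .iov_len = len };
		/* Reference user pages from the pipe; may accept fewer bytes. */
		ssize_t in = vmsplice(p[1], &iov, 1, 0);

		if (in <= 0)
			goto out;
		buf = (char *)buf + in;
		len -= (size_t)in;

		/* Drain what was just queued before queuing more. */
		while (in > 0) {
			ssize_t out = splice(p[0], NULL, out_fd, NULL,
					     (size_t)in, SPLICE_F_MOVE);

			if (out <= 0)
				goto out;
			in -= out;
		}
	}
	ret = 0;
out:
	close(p[0]);
	close(p[1]);
	return ret;
}

/* Hypothetical self-check: dump a buffer to a temp file and read it back. */
static int demo_roundtrip(void)
{
	char path[] = "/tmp/vmsplice-demo-XXXXXX";
	size_t len = 300000, i;
	char *src = malloc(len), *back = malloc(len);
	int fd = mkstemp(path);
	int ok = -1;

	if (!src || !back || fd < 0)
		goto out;
	for (i = 0; i < len; i++)
		src[i] = (char)(i * 31 + 7);
	if (dump_self_via_pipe(src, len, fd) == 0 &&
	    pread(fd, back, len, 0) == (ssize_t)len &&
	    memcmp(src, back, len) == 0)
		ok = 0;
out:
	if (fd >= 0) {
		close(fd);
		unlink(path);
	}
	free(src);
	free(back);
	return ok;
}
```

Each vmsplice() call is drained right away, so only about one pipe's worth
of pages is pinned at a time. Doing the same for another process today
requires running this code inside that process, hence the parasite
injection mentioned above.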