Draft of the move_phys_pages syscall proposed in RFC:
https://lore.kernel.org/all/20230907075453.350554-1-gregory.price@xxxxxxxxxxxx/

Signed-off-by: Gregory Price <gregory.price@xxxxxxxxxxxx>
---
 man2/move_phys_pages.2 | 180 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 180 insertions(+)
 create mode 100644 man2/move_phys_pages.2

diff --git a/man2/move_phys_pages.2 b/man2/move_phys_pages.2
new file mode 100644
index 000000000..4f4b68915
--- /dev/null
+++ b/man2/move_phys_pages.2
@@ -0,0 +1,180 @@
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft-2-para
+.\"
+.\" This manpage is Copyright (C) 2006 Silicon Graphics, Inc.
+.\" Christoph Lameter
+.\" This manpage is Copyright (C) 2023 MemVerge, Inc.
+.\" Gregory Price
+.\"
+.\"
+.TH move_phys_pages 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+move_phys_pages \- move individual physically-addressed pages to another node
+.SH LIBRARY
+NUMA (Non-Uniform Memory Access) policy library
+.RI ( libnuma ", " \-lnuma )
+.SH SYNOPSIS
+.nf
+.B #include <numaif.h>
+.PP
+.BI "long move_phys_pages(unsigned long " count ", \
+uint64_t *" pages [. count ],
+.BI " const int " nodes [. count "], int " status [. count "], \
+int " flags );
+.fi
+.SH DESCRIPTION
+.BR move_phys_pages ()
+moves the specified
+.I physical pages
+to the memory nodes specified by
+.IR nodes .
+The result of the move is reflected in
+.IR status .
+The
+.I flags
+indicate constraints on the pages to be moved.
+.PP
+This interface requires
+.BR CAP_SYS_ADMIN .
+.PP
+.I count
+is the number of pages to move.
+It defines the size of the three arrays
+.IR pages ,
+.IR nodes ,
+and
+.IR status .
+.PP
+.I pages
+is an array of physical addresses of the pages that should be moved.
+These addresses should be aligned to page boundaries.
+.PP
+.I nodes
+is an array of integers that specify the desired location for each page.
+Each element in the array is a node number.
+.I nodes
+can also be NULL, in which case
+.BR move_phys_pages ()
+does not move any pages but instead will return the node
+where each page currently resides, in the
+.I status
+array.
+Obtaining the status of each page may be necessary to determine
+pages that need to be moved.
+.PP
+.I status
+is an array of integers that return the status of each page.
+The array contains valid values only if
+.BR move_phys_pages ()
+did not return an error.
+Preinitializing the array to a value
+that cannot represent a real NUMA node or a valid status error
+can help to identify pages that have been migrated if a partial
+failure occurs.
+.PP
+.I flags
+specify what types of pages to move.
+.B MPOL_MF_MOVE
+means that only pages that are in exclusive use by a process
+are to be moved.
+.B MPOL_MF_MOVE_ALL
+means that pages shared between multiple processes can also be moved.
+.SS Page states in the status array
+The following values can be returned in each element of the
+.I status
+array.
+.TP
+.B 0..MAX_NUMNODES
+Identifies the node on which the page resides.
+.TP
+.B \-EACCES
+The target node for the page is not in the intersection of the allowed
+nodes of all tasks mapping the address; that is, at least one task
+mapping the address does not allow memory on the target node.
+.TP
+.B \-EBUSY
+The page is currently busy and cannot be moved.
+Try again later.
+This occurs if a page is undergoing I/O or another kernel subsystem
+is holding a reference to the page.
+.TP
+.B \-EFAULT
+This is a zero page, the memory area is not mapped by the process,
+or the memory is not migratable.
+.TP
+.B \-EIO
+Unable to write back a page.
+The page has to be written back
+in order to move it since the page is dirty and the filesystem
+does not provide a migration function that would allow the move
+of dirty pages.
+.TP
+.B \-EINVAL
+A dirty page cannot be moved.
+The filesystem does not
+provide a migration function and has no ability to write back pages.
+
+.TP
+.B \-ENOENT
+The physical page is not online or the page is not present in any VMA.
+.TP
+.B \-ENOMEM
+Unable to allocate memory on target node.
+.SH RETURN VALUE
+On success
+.BR move_phys_pages ()
+returns zero.
+.\" FIXME . Is the following quite true: does the wrapper in numactl
+.\" do the right thing?
+On error, it returns \-1, and sets
+.I errno
+to indicate the error.
+If a positive value is returned, it is the number of
+nonmigrated pages.
+.SH ERRORS
+.TP
+.B Positive value
+The number of nonmigrated pages if they were the result of nonfatal
+reasons.
+.TP
+.B EFAULT
+Parameter array could not be accessed.
+.TP
+.B EINVAL
+The flag value was not 0 (Linux 6.6), or an attempt was made to
+migrate pages of a kernel thread.
+.TP
+.B ENODEV
+One of the target nodes is not online.
+.TP
+.B EPERM
+The caller has insufficient privileges
+.RB ( CAP_SYS_ADMIN ).
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux X.Y.Z
+.SH NOTES
+For information on library support, see
+.BR numa (7).
+.PP
+Use of this function may result in pages whose location
+(node) violates the memory policy established for the
+specified addresses (see
+.BR mbind (2))
+and/or the specified process (see
+.BR set_mempolicy (2)).
+That is, memory policy does not constrain the destination
+nodes used by
+.BR move_phys_pages ().
+.PP
+The
+.I <numaif.h>
+header is not included with glibc, but requires installing
+.I libnuma\-devel
+or a similar package.
+.SH SEE ALSO
+.BR mbind (2),
+.BR numa (3),
+.BR numa_maps (5),
+.BR cpuset (7),
+.BR numa (7),
+.BR migratepages (8),
+.BR numastat (8)
-- 
2.34.1
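
For illustration only, here is a minimal C sketch of the caller-side preparation that the DESCRIPTION section of the proposed page recommends: aligning each physical address to a page boundary and preinitializing the status array with a value that is neither a node number nor a negative errno. The names STATUS_UNTOUCHED, prepare_request() and count_untouched() are hypothetical and are not part of the patch. The sketch deliberately does not call move_phys_pages() itself, since the syscall is only a proposal and no wrapper is assumed to exist.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: neither a node number (>= 0) nor a -errno value. */
#define STATUS_UNTOUCHED INT_MIN

/*
 * Round each physical address down to its page boundary and mark every
 * status slot as untouched. page_size must be a power of two.
 */
static void prepare_request(uint64_t *pages, int *status, size_t count,
                            uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    for (size_t i = 0; i < count; i++) {
        pages[i] &= ~(page_size - 1);
        status[i] = STATUS_UNTOUCHED;
    }
}

/* Count status slots that still hold the sentinel, e.g. after a partial failure. */
static size_t count_untouched(const int *status, size_t count)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++)
        if (status[i] == STATUS_UNTOUCHED)
            n++;
    return n;
}
```

A caller would run prepare_request() before issuing the request and count_untouched() afterwards. Any slot that still holds the sentinel was never written, which separates pages that were not processed from pages that received a node number or an error.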