On Thu, 3 Nov 2022 at 15:54, Muhammad Usama Anjum <usama.anjum@xxxxxxxxxxxxx> wrote: > This IOCTL, PAGEMAP_SCAN can be used to get and/or clear the info about > page table entries. The following operations are supported in this ioctl: > - Get the information if the pages are soft-dirty, file mapped, present > or swapped. > - Clear the soft-dirty PTE bit of the pages. > - Get and clear the soft-dirty PTE bit of the pages. > > Only the soft-dirty bit can be read and cleared atomically. struct > pagemap_sd_args is used as the argument of the IOCTL. In this struct: > - The range is specified through start and len. > - The output buffer and size is specified as vec and vec_len. > - The optional maximum requested pages are specified in the max_pages. > - The flags can be specified in the flags field. The PAGEMAP_SD_CLEAR > and PAGEMAP_SD_NO_REUSED_REGIONS are supported. > - The masks are specified in rmask, amask, emask and return_mask. [...] > --- a/include/uapi/linux/fs.h > +++ b/include/uapi/linux/fs.h > @@ -305,4 +305,57 @@ typedef int __bitwise __kernel_rwf_t; > #define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\ > RWF_APPEND) > > +/* PAGEMAP IOCTL */ > +#define PAGEMAP_SCAN _IOWR('f', 16, struct pagemap_scan_arg) > + > +/* Bits are set in the bitmap of the page_region and masks in pagemap_sd_args */ > +#define PAGE_IS_SD (1 << 0) Can we name it PAGE_IS_SOFTDIRTY? "SD" can mean so many things. > +#define PAGE_IS_FILE (1 << 1) > +#define PAGE_IS_PRESENT (1 << 2) > +#define PAGE_IS_SWAPED (1 << 3) PAGE_IS_SWAPPED? > + > +/* > + * struct page_region - Page region with bitmap flags > + * @start: Start of the region > + * @len: Length of the region > + * bitmap: Bits sets for the region > + */ > +struct page_region { > + __u64 start; > + __u64 len; > + __u32 bitmap; > + __u32 __reserved; "u64 flags"? If an extension is needed it would already require a new ioctl or something in the `arg` struct. > + > +/* > + * struct pagemap_scan_arg - Soft-dirty IOCTL argument Since this is no longer a soft-dirty-specific call, it might be better to describe it as "VM scan ioctl" or similar. BTW, the implementation is currently guarded by CONFIG_MEM_SOFT_DIRTY, but CRIU doesn't need that but needs the other bits handling. > + * @start: Starting address of the region > + * @len: Length of the region (All the pages in this length are included) > + * @vec: Address of page_region struct array for output > + * @vec_len: Length of the page_region struct array > + * @max_pages: Optional max return pages (It must be less than vec_len if specified) I think we discussed that this is not counting the same things as vec_len, so there should not be a reference between the two. The limit is whatever fits under both conditions (IOW: n_vecs <= vec_len && (!max_pages || n_pages <= max_pages). > + * @flags: Special flags for the IOCTL Just "Flags for the IOCTL". > + * @rmask: Required mask - All of these bits have to be set in the PTE > + * @amask: Any mask - Any of these bits are set in the PTE > + * @emask: Exclude mask - None of these bits are set in the PTE It might be easier for developers if those were named e.g. "required_mask", "anyof_mask", "excluded_mask". > + * @return_mask: Bits that have to be reported to the user in page_region "Bits that are to be reported in page_region"? > + */ > +struct pagemap_scan_arg { > + __u64 start; > + __u64 len; > + __u64 vec; > + __u64 vec_len; > + __u32 max_pages; > + __u32 flags; > + __u32 rmask; > + __u32 amask; > + __u32 emask; > + __u32 return_mask; > +}; > + > +/* Special flags */ > +#define PAGEMAP_SD_CLEAR (1 << 0) SD -> SOFTDIRTY > +/* Check the individual pages if they are soft-dirty to find dirty pages faster. */ > +#define PAGEMAP_NO_REUSED_REGIONS (1 << 1) Please include the description from commitmsg of what this flag does (i.e. how the behaviour differs because of the flag). I'd drop the part about it being faster, as if so - why have the flag at all instead of just always using the faster way? (I only reviewed the API now. The implementation I think could be simpler, but let's leave that to after the API is agreed on.) Best Regards Michał Mirosław