On Fri, Dec 11, 2020 at 09:27:46PM +0100, Jann Horn wrote:
> +CC Christoph Hellwig for opinions on compat
>
> On Thu, Nov 26, 2020 at 12:22 AM Minchan Kim <minchan@xxxxxxxxxx> wrote:
> > On Mon, Nov 23, 2020 at 09:39:42PM -0800, Suren Baghdasaryan wrote:
> > > process_madvise requires a vector of address ranges to be provided for
> > > its operations. When an advice should be applied to the entire process,
> > > the caller process has to obtain the list of VMAs of the target process
> > > by reading the /proc/pid/maps or some other way. The cost of this
> > > operation grows linearly with increasing number of VMAs in the target
> > > process. Even constructing the input vector can be non-trivial when
> > > target process has several thousands of VMAs and the syscall is being
> > > issued during high memory pressure period when new allocations for such
> > > a vector would only worsen the situation.
> > > In the case when advice is being applied to the entire memory space of
> > > the target process, this creates an extra overhead.
> > > Add PMADV_FLAG_RANGE flag for process_madvise enabling the caller to
> > > advise a memory range of the target process. For now, to keep it simple,
> > > only the entire process memory range is supported, vec and vlen inputs
> > > in this mode are ignored and can be NULL and 0.
> > > Instead of returning the number of bytes that advice was successfully
> > > applied to, the syscall in this mode returns 0 on success. This is due
> > > to the fact that the number of bytes would not be useful for the caller
> > > that does not know the amount of memory the call is supposed to affect.
> > > Besides, the ssize_t return type can be too small to hold the number of
> > > bytes affected when the operation is applied to a large memory range.
> >
> > Can we just use one element in iovec to indicate entire address rather
> > than using up the reserved flags?
> >
> > struct iovec {
> >         .iov_base = NULL,
> >         .iov_len = (~(size_t)0),
> > };
>
> In addition to Suren's objections, I think it's also worth considering
> how this looks in terms of compat API. If a compat process does
> process_madvise() on another compat process, it would be specifying
> the maximum 32-bit number, rather than the maximum 64-bit number, so
> you'd need special code to catch that case, which would be ugly.
>
> And when a compat process uses this API on a non-compat process, it
> semantically gets really weird: The actual address range covered would
> be larger than the address range specified.
>
> And if we want different access checks for the two flavors in the
> future, gating that different behavior on special values in the iovec
> would feel too magical to me.
>
> And the length value SIZE_MAX doesn't really make sense anyway because
> the length of the whole address space would be SIZE_MAX+1, which you
> can't express.
>
> So I'm in favor of a new flag, and strongly against using SIZE_MAX as
> a magic number here.

Can't we simply pass NULL as iovec as special id, then?
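
For reference, the two arithmetic points quoted above can be checked with a small userspace sketch. This is not kernel code: the typedefs and helper names below are hypothetical, and the fixed-width types only stand in for what size_t would be in a compat (32-bit) task and in a native 64-bit one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for size_t as seen by a compat and a native task. */
typedef uint32_t demo_compat_size_t;
typedef uint64_t demo_native_size_t;

/* The all-ones length, i.e. ~(size_t)0, as each kind of caller would write it. */
static demo_compat_size_t demo_compat_sentinel(void)
{
	return ~(demo_compat_size_t)0;
}

static demo_native_size_t demo_native_sentinel(void)
{
	return ~(demo_native_size_t)0;
}

/*
 * Returns 1 if a check written against the native sentinel would also
 * recognize the value passed by a compat caller, 0 otherwise.
 */
static int demo_matches_native_sentinel(demo_compat_size_t len)
{
	return (demo_native_size_t)len == demo_native_sentinel();
}

/*
 * The length of the whole address space is "maximum value + 1".
 * Unsigned arithmetic wraps, so the result is not representable.
 */
static demo_native_size_t demo_whole_space_len(void)
{
	return demo_native_sentinel() + 1;
}
```

Under these assumptions, demo_matches_native_sentinel(demo_compat_sentinel()) is 0, so the compat value would need its own special case, and demo_whole_space_len() wraps to 0, so the true length cannot be carried in iov_len at all.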