Thanks Nadav for the inputs!

On 3/10/2022 12:20 AM, Nadav Amit wrote:
> ---
>  mm/madvise.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 38d0f51..d3b49b3 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1426,15 +1426,21 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
>
>  	while (iov_iter_count(&iter)) {
>  		iovec = iov_iter_iovec(&iter);
> +		/*
> +		 * Even when [start, end) passed to do_madvise covers
> +		 * some unmapped addresses, it continues processing with
> +		 * returning ENOMEM at the end. Thus consider the range
> +		 * as processed when do_madvise() returns ENOMEM.
> +		 * This makes process_madvise() never returns ENOMEM.
> +		 */
>
> I fully understand and relate to the basic motivation of this
> patch.
>
> The ENOMEM that this patch checks for, IIUC, is the ENOMEM that is
> returned on unmapped holes. Such ENOMEM does not appear, according to
> the man page, to be a valid reason to return ENOMEM to userspace.
> Presumably process_madvise() is expected to skip unmapped holes
> and not to fail because of them.

True, that ENOMEM indicates that the range passed contains unmapped
holes. Pasting the documentation of do_madvise():

 *  -ENOMEM - addresses in the specified range are not currently
 *		mapped, or are outside the AS of the process.

Internally, process_madvise() calls do_madvise() in a loop, passing
each range it received in the 'struct iovec' array. I too agree that
process_madvise() is expected to skip the unmapped holes rather than
fail because of them.

> Having said that, I do not think that the check that the patch does
> is clean or clearly documented.

If it is about the documentation, how about adding:
"Since process_madvise() is expected to skip unmapped holes, never
return the ENOMEM received from do_madvise() to the user."
If the code can be made cleaner, please suggest how.
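
To make the intended semantics concrete, here is a small userspace model
of the loop with the proposed check. It is only a sketch under stated
assumptions: struct range, stub_do_madvise() and model_process_madvise()
are hypothetical names, not kernel code, and the stub just returns a
canned result per range instead of walking VMAs.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for one struct iovec entry, plus the result
 * the stubbed do_madvise() should report for that range.
 */
struct range {
	size_t len;
	int madvise_ret;	/* 0, -ENOMEM (hole in range) or another -errno */
};

/* Hypothetical stub for do_madvise(): returns the canned result. */
static int stub_do_madvise(const struct range *r)
{
	return r->madvise_ret;
}

/*
 * Model of the process_madvise() loop with the proposed check:
 * -ENOMEM from do_madvise() means the range had unmapped holes but
 * was otherwise processed, so it is counted and the loop continues.
 * Any other error stops the loop. As in the kernel loop, the number
 * of bytes processed is returned if non-zero, otherwise the last error.
 */
static long model_process_madvise(const struct range *vec, size_t vlen)
{
	size_t done = 0;
	long ret = 0;

	for (size_t i = 0; i < vlen; i++) {
		ret = stub_do_madvise(&vec[i]);
		if (ret < 0 && ret != -ENOMEM)
			break;
		done += vec[i].len;
	}

	return done ? (long)done : ret;
}
```

With this check, a vector whose middle range contains a hole returns the
full byte count instead of stopping early, while other errors such as
-EINVAL still end the loop and, if nothing was processed yet, are
returned as-is.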

> In addition, this patch (and some work on process_madvise()) raise
> in my mind a couple of questions:
>
> 1. There are other errors that process_madvise might encounter
>    and can be propagated back to userspace, but are not
>    documented. For instance if can_madv_lru_vma() fails on
>    MADV_COLD, userspace will get EINVAL. EINVAL is not documented
>    as a valid error-code for such case in either madvise() and
>    process_madvise() man pages.

I agree about the man page documentation too, and felt the same while
going through it. For the case you mention, the madvise[1] man page
only describes EINVAL for MADV_DONTNEED and MADV_REMOVE. It should also
cover MADV_PAGEOUT, MADV_COLD and MADV_FREE.

The other undocumented error codes I came across in process_madvise()
are:
EINVAL - returned when process_madvise_behavior_valid() fails.
EINTR  - from mm_access().
EACCES - from mm_access().

Thanks,
Charan