On Fri, Mar 18, 2022 at 07:35:41PM +0530, Charan Teja Kalla wrote: > Thank you for valuable inputs. > > On 3/18/2022 2:08 AM, Nadav Amit wrote: > >>>>>> IMO, it's worth to note in man page. > >>>>>> > >>>>> Or the current patch for just ENOMEM is sufficient here and we just have > >>>>> to update the man page? > >>>> I think the "On success, process_madvise() returns the number of bytes > >>>> advised" behaviour sounds useful. But madvise() doesn't do that. > >>>> > >>>> RETURN VALUE > >>>> On success, madvise() returns zero. On error, it returns -1 and errno > >>>> is set to indicate the error. > >>>> > >>>> So why is it desirable in the case of process_madvise()? > >>> Since process_madvise deal with multiple ranges and could fail at one of > >>> them in the middle or pocessing, people could decide where the call > >>> failed and then make a strategy whether they will abort at the point or > >>> continue to hint next addresses. Here, problem of the strategy is API > >>> doesn't return any error vaule if it has processed any bytes so they > >>> would have limitation to decide a policy. That's the limitation for > >>> every vector IO syscalls, unfortunately. > >>> > >>>> > >>>> > >>>> And why was process_madvise() designed this way? Or was it > >>>> always simply an error in the manpage? > >> Taking a closer look, indeed manpage seems to be wrong. > >> https://elixir.bootlin.com/linux/v5.17-rc8/source/mm/madvise.c#L1154 > >> indicates that in the presence of unmapped holes madvise will skip > >> them but will return ENOMEM and that's what process_madvise is > >> ultimately returning in this case. So, the manpage claim of "This > >> return value may be less than the total number of requested bytes, if > >> an error occurred after some iovec elements were already processed." > >> does not reflect the reality in our case because the return value will > >> be -ENOMEM. After the desired behavior is finalized I'll modify the > >> manpage accordingly. > > Since process_madvise() might be used in sort of non-cooperative mode, > > I think that the caller cannot guarantee that it knows exactly the > > memory layout of the process whose memory it madvise’s. I know that > > MADV_DONTNEED for instance is not supported (at least today) by > > process_madvise(), but if it were, the caller may want which exact > > memory was madvise'd even if the target process ran some other > > memory layout changing syscalls (e.g., munmap()). > > > > IOW, skipping holes and just returning the total number of madvise’d > > bytes might not be enough. > > Then does the advised bytes range by default including holes is a > correct design? > Say the [start, len) range passed in the iovec by the user contains the > layout like, vma1 -- hole-- vma2 -- hole -- vma3. > > Under ideal case, where all vma's are eligible for advise, the total > bytes processed returning should be vma3->end - vma1->start. This is > success case. > > Now, say that vma1 is succeeded but vma2(say VM_LOCKED) is failed at > advise. In such case processed bytes will be > vma2->start-vma1->start(still consider hole as bytes processed), so that > user may restart/skip at vma2, then continue. This return type will be > partially processed bytes. > > If the system doesn't found any VMA in the passed range by user, it > returns ENOMEM as not a single advisable vma is found in the range. As I mentioned in other reply, let's do not make any exception(i.e., skipping hole) for vectored memory syscall but exact processed bytes on the exact ranges.