Re: [PATCH 2/2] mm: madvise: return exact bytes advised with process_madvise under error

On Thu 24-03-22 21:15:57, Charan Teja Kalla wrote:
> Thanks Michal for the inputs.
> 
> On 3/24/2022 6:44 PM, Michal Hocko wrote:
> > On Wed 23-03-22 20:54:10, Charan Teja Kalla wrote:
> >> From: Charan Teja Reddy <quic_charante@xxxxxxxxxxx>
> >>
> >> The commit 5bd009c7c9a9 ("mm: madvise: return correct bytes advised with
> >> process_madvise") fixed process_madvise() to return the number of bytes
> >> successfully advised before an error was hit while processing the iovec
> >> elements. But when the user passes unmapped ranges in the iovec, the
> >> syscall ignores these holes, continues processing and returns ENOMEM at
> >> the end, which matches the madvise() semantics. This is a problem for
> >> vector processing, where the user may want to know exactly how many
> >> bytes of an iovec element were processed in order to make better
> >> decisions in user space. In the ENOMEM case, all bytes of an iovec
> >> element were processed, yet an error is still returned, which leaves the
> >> user unsure whether the advice failed or succeeded.
> > Do you have any specific example where the initial semantic is really
> > problematic or is this mostly a theoretical problem you have found when
> > reading the code?
> > 
> > 
> >> As an example, consider the following ranges passed by the user in
> >> struct iovec: iovec1 (ranges: vma1), iovec2 (ranges: vma2 -- vma3 --
> >> hole) and iovec3 (ranges: vma4). The current implementation fully
> >> advises iovec1 and iovec2 but reports only the size of the iovec1 range
> >> as the number of processed bytes. The user may then retry iovec2, which
> >> was already advised, and get ENOMEM again. The user may then skip
> >> iovec2 and resume from iovec3. Because the returned byte count is
> >> inaccurate, iovec2 is processed twice.
> > I think you should be much more specific about why this is actually a
> > problem. This would surely be less optimal, but is it a correctness issue?
> > 
> 
> Yes, this problem was found by reading the code, but IMO we can easily
> expect an invalid vma/hole in the passed range because we are operating
> on another process's VMAs. More than fixing a sub-optimality, this can be
> seen as helping the user make better policy decisions with this syscall.
> Without accurate information, the user's decisions can only be
> sub-optimal (e.g. issuing the syscall again on an already processed
> range).
> 
> Having said that, at present I don't have any reports or unit tests
> showing that the existing semantics are really problematic.

OK, thanks for the clarification. I would rather not change the
existing semantics. For one, doing so is always a regression risk, so the
reasoning should be really strong.
[...]
> > but so it sounds the problem you are trying to fix IMHO. I think it
> > would be better to live with imprecise reporting of return values
> > rather than aiming for perfection, which would be fragile and add a
> > future maintenance burden.
> >
> Hmm. Should at least these imprecise return values be documented in the
> man page or in madvise.c?

The man page says:
"
On success, process_madvise() returns the number of bytes
advised.  This return value may be less than the total number of
requested bytes, if an error occurred after some iovec elements
were already processed.  The caller should check the return value
to determine whether a partial advice occurred.
"

which is pretty broad and AFAIU it matches the current behavior. It
doesn't explain what exactly the return value is. It just mentions that
the caller should check for partial advice without any further guidance
- e.g. where a new call should start. I think that such guidance would
be bad in general. On a partial success the caller would need to
re-evaluate the ranges anyway.

So I guess we are good on the man page side for now.
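As a rough illustration of the man page's advice ("The caller should check the return value to determine whether a partial advice occurred"), the sketch below calls the raw syscall and classifies the outcome as full, partial or error. `advise_ranges` and `enum advise_result` are hypothetical names. It uses `syscall()` on the assumption that the C library may not provide a wrapper, and it falls back to ENOSYS if the headers do not define `SYS_process_madvise`. The call may also fail with EPERM when the caller lacks the required privileges (the man page mentions CAP_SYS_NICE).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef MADV_COLD
#define MADV_COLD 20 /* generic Linux UAPI value, for older headers */
#endif

/* Outcome of one process_madvise() call, as seen by the caller. */
enum advise_result { ADVISE_FULL, ADVISE_PARTIAL, ADVISE_ERROR };

/*
 * Hypothetical helper: apply `advice` to iov[0..n) in the process
 * referred to by pidfd. *advised receives the syscall's return value
 * on success and 0 on error (errno is left set by the failed call).
 */
static enum advise_result advise_ranges(int pidfd, const struct iovec *iov,
                                        size_t n, int advice, size_t *advised)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += iov[i].iov_len;

    *advised = 0;
#ifdef SYS_process_madvise
    long ret = syscall(SYS_process_madvise, pidfd, iov, n, advice, 0u);
#else
    (void)pidfd;
    (void)advice;
    long ret = -1;
    errno = ENOSYS; /* headers do not know the syscall number */
#endif
    if (ret < 0)
        return ADVISE_ERROR;

    *advised = (size_t)ret;
    /* Fewer bytes than requested: an error hit after some elements. */
    return *advised < total ? ADVISE_PARTIAL : ADVISE_FULL;
}
```

On ADVISE_PARTIAL, the sketch deliberately does not compute a restart offset. In line with the reply above, the caller would re-evaluate its ranges (for example, re-read the target's mappings) before deciding what to advise next.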
-- 
Michal Hocko
SUSE Labs