Re: [PATCH v2] madvise: MADV_SOFT_OFFLINE requests can return -EBUSY

Hi Luis, Tyonnchie,

On Fri, Nov 29, 2024 at 06:43:39PM -0300, Luis Claudio R. Goncalves wrote:
> On Thu, Nov 28, 2024 at 12:35:48PM +0100, Alejandro Colomar wrote:
> > Hi Tyonnchie,
> > 
> > On Tue, Nov 26, 2024 at 11:12:03AM -0500, tyberry@xxxxxxxxxx wrote:
> > > If the page could not be offlined madvise will return -EBUSY. This might occur if the page is currently in use or locked.
> > 
> > Could you show this in a small example program (if possible)?
> > Like 30 lines or so.  If not, it's okay.
> 
> Hi Alejandro!
> 
> Given the ongoing holidays, let me take the liberty of giving some context
> in order to keep the conversation going.
> 
> We received reports of failed LTP madvise11[1] tests. The errors looked
> like this:
> 
>     madvise11.c:409: TINFO: Spawning 4 threads, with a total of 640 memory pages
>     madvise11.c:132: TFAIL: madvise failed: EBUSY (16)
>     madvise11.c:163: TINFO: Thread  [0]  returned 16, failed.
>     madvise11.c:191: TFAIL: thread  [0]  - exited with errors
>     madvise11.c:163: TINFO: Thread  [2]  returned 0, succeeded.
>     madvise11.c:163: TINFO: Thread  [3]  returned 0, succeeded.
>     madvise11.c:163: TINFO: Thread  [1]  returned 0, succeeded.
>     madvise11.c:361: TINFO: Restore 629 Soft-offlined pages
>     madvise11.c:290: TWARN: write(3,0x7ffce114b8a0,8) failed: EBUSY (16)
> 
> Clearly the problem had to do with -EBUSY being returned by a madvise()
> operation. The bug was initially reported on kernels with PREEMPT_RT
> enabled, but we soon observed that the problem also happened with the stock
> kernel, though it required more repetitions to trigger the issue.
> 
> After debugging and investigating, we observed that the -EBUSY return was
> a valid case in the kernel code and was not being handled by the test. A
> fix was sent to the LTP project by Li Wang[2], specifically for the
> madvise11 test.
> 
> In this process, we noticed that the man pages did not mention -EBUSY as a
> possible result of a failed offlining operation, as described by Tyonnchie.
> 
> I hope this helps!

Thanks!  I've applied the patch, with some tweaks:
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=3205359a3a7079d9d40a50388e851874729a827a>

I added an Acked-by on your behalf, Luis.


Have a lovely night!
Alex

> 
> Best regards,
> Luis
> 
> [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise11.c
> [2] https://lists.linux.it/pipermail/ltp/2024-May/038310.html
> 
> 
> > Have a lovely day!
> > Alex
> > 
> > > 
> > > Signed-off-by: Tyonnchie Berry <tyberry@xxxxxxxxxx>
> > > 
> > > ---
> > > 
> > > diff --git a/man/man2/madvise.2 b/man/man2/madvise.2
> > > index 4f2210ee2..c10dcd599 100644
> > > --- a/man/man2/madvise.2
> > > +++ b/man/man2/madvise.2
> > > @@ -702,6 +702,13 @@ The map exists, but the area maps something that isn't a file.
> > >  .BR MADV_COLLAPSE )
> > >  Could not charge hugepage to cgroup: cgroup limit exceeded.
> > >  .TP
> > > +.B EBUSY
> > > +(for
> > > +.B MADV_SOFT_OFFLINE )
> > > +If any pages within the add+length range could not be offlined,
> > > +madvise will return -EBUSY.
> > > +This might occur if the page is currently in use or locked.
> > > +.TP
> > >  .B EFAULT
> > >  .I advice
> > >  is
> > > 
> > 
> > -- 
> > <https://www.alejandro-colomar.es/>
> 
> 
> ---end quoted text---



-- 
<https://www.alejandro-colomar.es/>
