Re: [PATCH] mm: fix race between MADV_FREE reclaim and blkdev direct IO read

On Fri, Dec 17, 2021 at 3:51 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> On Fri, Dec 10, 2021 at 6:22 PM Mauricio Faria de Oliveira
> <mfo@xxxxxxxxxxxxx> wrote:
...
> > MADV_FREE'd buffers:
> > ===================
> >
> > So, back to the "if MADV_FREE pages are used as buffers" note.
> > The case is arguable, and subject to multiple interpretations.
> >
> > The madvise(2) manual page on the MADV_FREE advice value says:
> > - 'After a successful MADV_FREE ... data will be lost when
> >    the kernel frees the pages.'
> > - 'the free operation will be canceled if the caller writes
> >    into the page' / 'subsequent writes ... will succeed and
> >    then [the] kernel cannot free those dirtied pages'
> > - 'If there is no subsequent write, the kernel can free the
> >    pages at any time.'
> >
> > Thoughts, questions, considerations...
> > - Since the kernel didn't actually free the page (page_ref_freeze()
> >   failed), should the data not have been lost? (on userspace read.)
> > - Should writes performed by the direct IO read be able to cancel
> >   the free operation?
> >   - Should the direct IO read be considered as 'the caller' too,
> >     as it's been requested by 'the caller'?
> >   - Should the bio technique to dirty pages on return to userspace
> >     (bio_check_pages_dirty() is called/used by __blkdev_direct_IO())
> >     be considered in another/special way here?
> > - Should an upcoming write from a previously requested direct IO
> >   read be considered as a subsequent write, so the kernel should
> >   not free the pages? (as it's known at the time of page reclaim.)
> >
> > Technically, the last point seems a reasonable consideration and
> > balance: the madvise(2) manual page apparently (and fairly) assumes
> > that 'writes' are memory accesses from the userspace process (it
> > does not explicitly consider writes from the kernel or its corner
> > cases; again, fairly). In addition, the kernel fix for the corner
> > case of the largely 'non-atomic write' encompassed by a direct IO
> > read operation is relatively simple, and it helps.
...
> IIUC, you are expecting to get the old data after MADV_FREE? TBH, you
> should not expect so at all after MADV_FREE since those pages may get
> freed at any time.

Hey, thanks for checking this.

Correct; the discussion behind this is covered in the text above. It's indeed
arguable, but the fix makes the behavior more consistent for the case of a
direct IO read (rather than potentially returning zero pages, depending on
whether reclaim happens to race with the read.)
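
For reference, here is a minimal userspace sketch of the madvise(2) wording
quoted above ('subsequent writes ... will succeed'). It covers only the plain
case of a CPU write from the caller. It does not reproduce the race, which
needs a direct IO read into the buffer while reclaim is running. The helper
name madv_free_write_demo() is made up for illustration and is not part of
the patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for MAP_ANONYMOUS and MADV_FREE */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not from the patch).
 * Returns 0 if bytes written after MADV_FREE are read back intact,
 * 1 if they are not, and -1 if the buffer could not be set up
 * (e.g. mmap() failed or the kernel does not support MADV_FREE).
 */
int madv_free_write_demo(size_t npages)
{
	long psz = sysconf(_SC_PAGESIZE);
	unsigned char *buf;
	size_t page, len, i;
	int ret = 0;

	if (psz <= 0 || npages == 0)
		return -1;
	page = (size_t)psz;
	len = npages * page;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Fault the pages in and dirty them. */
	memset(buf, 'A', len);

	/* Tell the kernel it may free these pages at any time. */
	if (madvise(buf, len, MADV_FREE) != 0) {
		munmap(buf, len);
		return -1;
	}

	/*
	 * From here on, bytes we do not rewrite may read back either as
	 * 'A' (page not freed yet) or as 0 (page freed and refaulted).
	 * A write from the caller dirties the page again, so the byte
	 * written below must survive.
	 */
	for (i = 0; i < npages; i++)
		buf[i * page] = 'B';

	for (i = 0; i < npages; i++)
		if (buf[i * page] != 'B')
			ret = 1;

	munmap(buf, len);
	return ret;
}
```

Calling madv_free_write_demo(4) from a small main() should return 0 on a
kernel with MADV_FREE support. The open question in this thread is whether the
write done on the caller's behalf by a direct IO read deserves the same
treatment as the CPU write in the loop above.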

cheers,

-- 
Mauricio Faria de Oliveira



