On Fri, Dec 17, 2021 at 3:51 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> On Fri, Dec 10, 2021 at 6:22 PM Mauricio Faria de Oliveira
> <mfo@xxxxxxxxxxxxx> wrote:
...
> > MADV_FREE'd buffers:
> > ===================
> >
> > So, back to the "if MADV_FREE pages are used as buffers" note.
> > The case is arguable, and subject to multiple interpretations.
> >
> > The madvise(2) manual page on the MADV_FREE advice value says:
> > - 'After a successful MADV_FREE ... data will be lost when
> >   the kernel frees the pages.'
> > - 'the free operation will be canceled if the caller writes
> >   into the page' / 'subsequent writes ... will succeed and
> >   then [the] kernel cannot free those dirtied pages'
> > - 'If there is no subsequent write, the kernel can free the
> >   pages at any time.'
> >
> > Thoughts, questions, considerations...
> > - Since the kernel didn't actually free the page (page_ref_freeze()
> >   failed), should the data not have been lost? (on userspace read.)
> > - Should writes performed by the direct IO read be able to cancel
> >   the free operation?
> > - Should the direct IO read be considered as 'the caller' too,
> >   as it's been requested by 'the caller'?
> > - Should the bio technique to dirty pages on return to userspace
> >   (bio_check_pages_dirty() is called/used by __blkdev_direct_IO())
> >   be considered in another/special way here?
> > - Should an upcoming write from a previously requested direct IO
> >   read be considered as a subsequent write, so the kernel should
> >   not free the pages? (as it's known at the time of page reclaim.)
> >
> > Technically, the last point would seem a reasonable consideration
> > and balance, as the madvise(2) manual page apparently (and fairly)
> > seem to assume that 'writes' are memory access from the userspace
> > process (not explicitly considering writes from the kernel or its
> > corner cases; again, fairly).. plus the kernel fix implementation
> > for the corner case of the largely 'non-atomic write' encompassed
> > by a direct IO read operation, is relatively simple; and it helps.
...
> IIUC, you are expecting to get the old data after MADV_FREE? TBH, you
> should not expect so at all after MADV_FREE since those pages may get
> freed at any time.

Hey, thanks for checking this.

Correct; the discussion behind this is covered in the text above.
It's indeed arguable, but the fix makes the behavior more consistent
for the case of a direct IO read (rather than potentially returning
zero-pages a bit randomly.)

cheers,

-- 
Mauricio Faria de Oliveira
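
For reference, the madvise(2) wording quoted above can be sketched from
userspace roughly as follows. This is only an illustration of the
documented "write cancels the free" behavior, not the reproducer or the
kernel fix discussed in this thread, and it does not exercise the direct
IO read case at all. The helper name madv_free_write_demo() and the byte
patterns are made up for the example; it assumes Linux with MADV_FREE
support.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_FREE
#define MADV_FREE 8	/* assumption: Linux uapi value, for older libc headers */
#endif

/*
 * Hypothetical helper (not from the thread): fill one anonymous page,
 * mark it MADV_FREE, then write to it from userspace.
 * Returns 0 if what we read back is consistent with madvise(2), else -1.
 */
static int madv_free_write_demo(void)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	unsigned char *buf;
	size_t len;
	int ret = 0;

	assert(pagesz > 0);
	len = (size_t)pagesz;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0xAA, len);		/* old data */

	if (madvise(buf, len, MADV_FREE) != 0) {
		munmap(buf, len);
		return -1;
	}

	/* From here on the kernel may free the page at any time. */

	buf[0] = 0x55;	/* userspace write: cancels the free for this page */

	/* The subsequent write must have succeeded. */
	if (buf[0] != 0x55)
		ret = -1;

	/*
	 * The rest of the page is either the old data (page not freed)
	 * or zeroes (page freed before the write); nothing else.
	 */
	for (size_t i = 1; i < len; i++)
		if (buf[i] != 0xAA && buf[i] != 0x00)
			ret = -1;

	munmap(buf, len);
	return ret;
}
```

A caller would just check that madv_free_write_demo() returns 0. Which
of the two outcomes the unwritten bytes show (old data or zeroes) depends
on whether reclaim got to the page first, so the sketch accepts both.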