Re: [PATCH v2] mm: fix race between MADV_FREE reclaim and blkdev direct IO read

On Tue, Jan 11, 2022 at 12:21:59PM -0800, Minchan Kim wrote:
> On Tue, Jan 11, 2022 at 12:20:13PM -0800, Minchan Kim wrote:
> < snip >
> > > > slow path with __gup_longterm_unlocked and set_dirty_pages
> > > > for them).
> > > > 
> > > > This approach would solve other cases where map userspace
> > > > pages into kernel space and then write. Since the write
> > > > didn't go through with the process's page table, we will
> > > > lose the dirty bit in the page table of the process and
> > > > it turns out same problem. That's why I'd like to approach
> > > > this.
> > > > 
> > > > If it doesn't work, the other option to fix this specific
> > > > case is can't we make pages dirty in advance in DIO read-case?
> > > > 
> > > > When I look at DIO code, it's already doing in async case.
> > > > Could't we do the same thing for the other cases?
> > > > I guess the worst case we will see would be more page
> > > > writeback since the page becomes dirty unnecessary.
> > > 
> > > Marking pages dirty after pinning them is a pre-existing area of
> > > problems. See the long-running LWN articles about get_user_pages() [1].
> > 
> > Oh, Do you mean marking page dirty in DIO path is already problems?
> 
>                   ^ marking page dirty too late in DIO path
> 
> Typo fix.

I looked through the articles but couldn't connect the dots between
those issues and this MADV_FREE issue. However, the man page gives a
clue as to why it's fine.

```
       O_DIRECT I/Os should never be run concurrently with the fork(2)
       system call, if the memory buffer is a private mapping (i.e., any
       mapping created with the mmap(2) MAP_PRIVATE flag; this includes
       memory allocated on the heap and statically allocated buffers).
       Any such I/Os, whether submitted via an asynchronous I/O interface
       or from another thread in the process, should be completed before
       fork(2) is called. Failure to do so can result in data corruption
       and undefined behavior in parent and child processes.
```

I think that makes page_dup_rmap() in copy_present_pte() safe, since no
direct I/O should be in flight on a private mapping while fork runs.


