On Fri, Sep 23, 2022 at 12:10:12PM +1000, Dave Chinner wrote:
> > Jason mentioned a scenario here:
> >
> > https://lore.kernel.org/all/YyuoE8BgImRXVkkO@xxxxxxxxxx/
> >
> > Multi-thread process where thread1 does open(O_DIRECT)+mmap()+read() and
> > thread2 does memunmap()+close() while the read() is inflight.
>
> And, ah, what production application does this and expects to be
> able to process the result of the read() operation without getting a
> SEGV?

The read() will do GUP and get a pinned page; next, the memunmap()/close()
will release the inode the VMA was holding open. The read() FD is NOT a
DAX FD. We are now UAFing the DAX storage. There is no SEGV.

This is not about sane applications; it is about kernel security against
hostile userspace.

> i.e. The underlying problem here is that memunmap() frees the VMA
> while there are still active task-based references to the pages in
> that VMA. IOWs, the VMA should not be torn down until the O_DIRECT
> read has released all the references to the pages mapped into the
> task address space.

This is Jan's suggestion; I think we are still far from being able to
do that for O_DIRECT paths.

Even if you fix the close() this way, doesn't truncate still have the
same problem?

At the end of the day the rule is that a DAX page must not be reused
until its refcount is 0. At some point the FS has to wait for that.

Jason
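The "must not be reused until its refcount is 0" rule can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: `struct model_page`, `model_pin()`, `model_unpin()`, `model_can_reuse()` and `model_wait_and_release()` are hypothetical names. The pin/unpin pair only loosely mirrors what pin_user_pages()/unpin_user_page() do for an in-flight O_DIRECT read. The wait function stands in for whatever truncate/close path the FS would use to block until every pin is gone. A pthread condition variable is used here purely for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of a DAX page; NOT the kernel's struct page. */
struct model_page {
    pthread_mutex_t lock;
    pthread_cond_t idle;   /* broadcast when refcount drops to 0 */
    int refcount;          /* outstanding pins (e.g. an in-flight O_DIRECT read) */
    bool mapped;           /* block still belongs to the file */
};

static void model_page_init(struct model_page *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->idle, NULL);
    p->refcount = 0;
    p->mapped = true;
}

/* Loosely analogous to the GUP pin taken by read(): the page is now in use. */
static void model_pin(struct model_page *p)
{
    pthread_mutex_lock(&p->lock);
    p->refcount++;
    pthread_mutex_unlock(&p->lock);
}

/* Loosely analogous to unpin_user_page() when the I/O completes. */
static void model_unpin(struct model_page *p)
{
    pthread_mutex_lock(&p->lock);
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        pthread_cond_broadcast(&p->idle);  /* wake anyone waiting to reuse it */
    pthread_mutex_unlock(&p->lock);
}

/* Non-blocking check: may the FS hand this block out again right now? */
static bool model_can_reuse(struct model_page *p)
{
    pthread_mutex_lock(&p->lock);
    bool ok = (p->refcount == 0);
    pthread_mutex_unlock(&p->lock);
    return ok;
}

/*
 * Hypothetical truncate/close path: instead of freeing the block while a
 * pin is outstanding (the UAF case), sleep until refcount reaches 0.
 */
static void model_wait_and_release(struct model_page *p)
{
    pthread_mutex_lock(&p->lock);
    while (p->refcount != 0)
        pthread_cond_wait(&p->idle, &p->lock);
    p->mapped = false;  /* only now is it safe to reuse the storage */
    pthread_mutex_unlock(&p->lock);
}
```

In the scenario above, thread1's read() would call model_pin() before the I/O and model_unpin() after it. Thread2's close()/truncate would call model_wait_and_release(), which blocks rather than freeing storage that the read() can still write into. The while loop around pthread_cond_wait() guards against spurious wakeups.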