On Fri, Aug 30, 2013 at 06:50:18PM +0400, Maxim Patlasov wrote: > Hi Miklos, > > 08/30/2013 02:12 PM, Miklos Szeredi пишет: > >On Fri, Aug 09, 2013 at 07:02:12PM +0400, Maxim Patlasov wrote: > >>08/06/2013 08:25 PM, Miklos Szeredi пишет: > >>>Hmm. Direct IO on an mmaped file will do get_user_pages() which will > >>>do the necessary page fault magic and ->page_mkwrite() will be called. > >>>At least AFAICS. > >>Yes, I agree. > >> > >>>The page cannot become dirty through a memory mapping without first > >>>switching the pte from read-only to read-write first. Page accounting > >>>logic relies on this too. The other way the page can become dirty is > >>>through write(2) on the fs. But we do get notified about that too. > >>Yes, that's correct, but I don't understand why you disregard two > >>other cases of marking page dirty (both related to direct AIO read > >>from a file to a memory region mmap-ed to a fuse file): > >> > >>1. dio_bio_submit() --> > >> bio_set_pages_dirty() --> > >> set_page_dirty_lock() > >> > >>2. dio_bio_complete() --> > >> bio_check_pages_dirty() --> > >> bio_dirty_fn() --> > >> bio_set_pages_dirty() --> > >> set_page_dirty_lock() > >> > >>As soon as a page became dirty through a memory mapping (exactly as > >>you explained), nothing would prevent it to be written-back. And > >>fuse will call end_page_writeback almost immediately after copying > >>the real page to a temporary one. Then dio_bio_submit may re-dirty > >>page speculatively w/o notifying fuse. And again, since then nothing > >>would prevent it to be written-back once more. Hence we can end up > >>in more then one temporary page in fuse write-back. And similar > >>concern for dio_bio_complete() re-dirty. > >> > >>This make me think that we do need fuse_page_is_writeback() in > >>fuse_writepages_fill(). But it shouldn't be harmful because it will > >>no-op practically always due to waiting for fuse writeback in > >>->page_mkwrite() and in course of handling write(2). > >The problem is: if we need it in ->writepages, we need it in ->writepage too. > >And that's where we can't have it because it would deadlock in reclaim. > > I thought we're protected from the deadlock by the following chunk > (in the very beginning of fuse_writepage): > > >+ if (fuse_page_is_writeback(inode, page->index)) { > >+ if (wbc->sync_mode != WB_SYNC_ALL) { > >+ redirty_page_for_writepage(wbc, page); > >+ return 0; > >+ } > >+ fuse_wait_on_page_writeback(inode, page->index); > >+ } > > Because reclaimer will never call us with WB_SYNC_ALL. Did I miss > something? Yeah, we could have that in ->writepage() too. And apparently that would work, reclaim would just leave us alone. Then there's sync(2) which does do WB_SYNC_ALL. Yet for an unprivileged fuse mount we don't want ->writepages() to block because that's a quite clear DoS issue. So we are left with this: > >There's a way to work around this: > > > > - if the request is still in queue, just update it with the contents of > > the new page > > > > - if the request already in userspace, create a new reqest, but only let > > userspace have it once the previous request for the same page > > completes, so the ordering is not messed up > > > >But that's a lot of hairy code. > > Is it exactly how NFS solves similar problem? NFS will apparently just block if there's a request outstanding and we are in WB_SYNC_ALL mode. Which is somewhat simpler. Thanks, Miklos -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>