On Mon, Aug 19, 2013 at 8:33 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Mon, Aug 19, 2013 at 08:20:12PM -0700, Andy Lutomirski wrote:
>> On Mon, Aug 19, 2013 at 7:28 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> > On Fri, Aug 16, 2013 at 04:22:09PM -0700, Andy Lutomirski wrote:
>> >> This is like file_update_time, except that it acts on a struct inode *
>> >> instead of a struct file *.
>> >>
>> >> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
>> >> ---
>> >>  fs/inode.c         | 72 ++++++++++++++++++++++++++++++++++++++++++------------
>> >>  include/linux/fs.h |  1 +
>> >>  2 files changed, 58 insertions(+), 15 deletions(-)
>> >>
>> >> [...]
>>
>> >> +
>> >> +int inode_update_time_writable(struct inode *inode)
>> >> +{
>> >> +	struct timespec now;
>> >> +	int sync_it = prepare_update_cmtime(inode, &now);
>> >> +	int ret;
>> >> +
>> >> +	if (!sync_it)
>> >> +		return 0;
>> >> +
>> >> +	/* sb_start_pagefault and update_time can both sleep. */
>> >> +	sb_start_pagefault(inode->i_sb);
>> >> +	ret = update_time(inode, &now, sync_it);
>> >> +	sb_end_pagefault(inode->i_sb);
>> >
>> > This gets called from the writeback path - you can't use
>> > sb_start_pagefault/sb_end_pagefault in that path.
>>
>> The race I'm worried about is:
>>
>>  - mmap
>>  - write to the mapping
>>  - remount ro
>>  - flush_cmtime -> inode_update_time_writable
>
> sb_start_pagefault() is for filesystem freeze protection, not
> remount-ro protection. If you freeze the filesystem, then we stop
> writes and pagefaults by making sb_start_pagefault/sb_start_write
> block, and then run writeback to clean all the pages. If writeback
> then blocks on sb_start_pagefault(), we've got a deadlock.
>
>> This may be impossible, in which case I'm okay, but it's nice to have
>> a sanity check. I'll see if I can figure out how to do that.
>
> The process of remount-ro should flush the dirty pages - the inode
> and page has been marked dirty by page_mkwrite(), after all.

Hmm.
We can land in here from writeback, in which case the time should be
updated unconditionally. We can also land in here from
msync(MS_ASYNC) or munmap. munmap at least shouldn't block.

The nasty case is if a page is dirtied, then the frozen level is set
to SB_FREEZE_PAGEFAULT, and then userspace calls munmap or msync
*before* writepages gets called. In this case, blocking until the fs
is unfrozen is probably impolite, and returning without updating the
time is questionable.

Removing the check entirely may add a new race, though: what if
.flush_cmtime has called mapping_test_clear_cmtime but hasn't gotten
to updating the time yet when freezing finishes? This could be
prevented by changing generic_flush_cmtime to do
__sb_start_write(sb, SB_FREEZE_FS, false) and doing nothing if the fs
is already frozen.

--Andy
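
To make the last suggestion concrete, here is a rough userspace model
in C. It is not kernel code and has not been tried against the real
tree. The struct and function names (sb_model, mapping_model,
sb_try_start_write, sb_end_write_model, flush_cmtime_model) are
stand-ins invented for illustration; only the idea of a non-blocking
start-write at the SB_FREEZE_FS level comes from the text above. It
assumes the non-blocking check fails when the current frozen level is
at or above the requested level, and that "doing nothing" means
leaving the pending cmtime flag set so a later flush can pick it up.

```c
#include <assert.h>
#include <stdbool.h>

/* Freeze levels in increasing order; values here are illustrative. */
enum freeze_level {
	FREEZE_NONE = 0,
	FREEZE_WRITE,
	FREEZE_PAGEFAULT,
	FREEZE_FS,
	FREEZE_COMPLETE,
};

/* Hypothetical stand-in for the superblock's freeze state. */
struct sb_model {
	enum freeze_level frozen;	/* current frozen level */
	int fs_writers;			/* holders of FS-level write access */
};

/* Hypothetical stand-in for an address_space with a pending cmtime flag. */
struct mapping_model {
	struct sb_model *sb;
	bool cmtime_pending;		/* models the flag mapping_test_clear_cmtime tests */
	int time_updates;		/* counts simulated timestamp updates */
};

/* Non-blocking start-write: refuse instead of sleeping if frozen >= level. */
bool sb_try_start_write(struct sb_model *sb, enum freeze_level level)
{
	if (sb->frozen >= level)
		return false;
	sb->fs_writers++;
	return true;
}

void sb_end_write_model(struct sb_model *sb)
{
	assert(sb->fs_writers > 0);
	sb->fs_writers--;
}

/*
 * Model of the proposed flush: take FS-level write access without
 * waiting, and only then test-and-clear the flag and update the time.
 * Returns true if a timestamp update happened.
 */
bool flush_cmtime_model(struct mapping_model *m)
{
	bool updated = false;

	if (!sb_try_start_write(m->sb, FREEZE_FS))
		return false;	/* already frozen: do nothing, flag stays set */

	if (m->cmtime_pending) {
		m->cmtime_pending = false;
		m->time_updates++;
		updated = true;
	}

	sb_end_write_model(m->sb);
	return updated;
}
```

In this model, a flush issued while the frozen level is only at the
pagefault stage still updates the time without blocking, since the
test-and-clear and the update both happen while write access is held.
Once the level reaches the FS stage, the flush returns early and the
flag is left untouched.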