On Wed, Sep 4, 2013 at 12:20 PM, Jan Kara <jack@xxxxxxx> wrote:
> On Wed 04-09-13 10:54:50, Andy Lutomirski wrote:
>> >> @@ -1970,6 +1988,39 @@ int write_one_page(struct page *page, int wait)
>> >>  }
>> >>  EXPORT_SYMBOL(write_one_page);
>> >>
>> >> +void mapping_flush_cmtime(struct address_space *mapping)
>> >> +{
>> >> +       if (mapping_test_clear_cmtime(mapping) &&
>> >> +           mapping->a_ops->update_cmtime_deferred)
>> >> +               mapping->a_ops->update_cmtime_deferred(mapping);
>> >> +}
>> >> +EXPORT_SYMBOL(mapping_flush_cmtime);
>> >   Hum, is there a reason for update_cmtime_deferred() operation? I can
>> > hardly imagine anyone will want to do anything else than what
>> > inode_update_time_writable() does so why bother? You mention tmpfs & co.
>> > don't fit into your scheme well with which I agree so let's just keep
>> > file_update_time() in their page_mkwrite() operation. But I don't see a
>> > real need for avoiding the deferred cmtime logic...
>>
>> I think there might be odd corner cases.  For example, mmap a tmpfs
>> file, write it, and unmap it.  Then, an hour later, maybe the system
>   If you unmap it then that will handle the update. But if you won't unmap,
> you'd get spurious updates of timestamps which would be strange.
>
>> will be under memory pressure and page out the file.  This could
>> trigger a surprising time update.  (I'm not sure this can actually
>> happen on tmpfs, but maybe it would on some other filesystem.)
>>
>> Does this actually matter?  A flag to turn the feature on or off would
>> do the trick, but I don't think there's precedent for sticking a flag
>> in a_ops.
>   Flag in a_ops is ugly. But you can have a flag in 'struct
> filesystem_type' which would be reasonable.

OK, will do.

>
>> >> +void mapping_flush_cmtime_nowb(struct address_space *mapping)
>> >> +{
>> >> +       /*
>> >> +        * We get called from munmap and msync.  Both calls can race
>> >> +        * with fs freezing.
>> >> +        * If the fs is frozen after
>> >> +        * mapping_test_clear_cmtime but before the time update, then
>> >> +        * sync_filesystem will miss the cmtime update (because we
>> >> +        * just cleared it) and we won't be able to write (because the
>> >> +        * fs is frozen).  On the other hand, we can't just return if
>> >> +        * we're in the SB_FREEZE_PAGEFAULT state because our caller
>> >> +        * expects the timestamp to be synchronously updated.  So we
>> >> +        * get write access without blocking, at the SB_FREEZE_FS
>> >> +        * level.  If the fs is already fully frozen, then we already
>> >> +        * know we have nothing to do.
>> >> +        */
>> >> +
>> >> +       if (!mapping_test_cmtime(mapping))
>> >> +               return;  /* Optimization: nothing to do. */
>> >> +
>> >> +       if (__sb_start_write(mapping->host->i_sb, SB_FREEZE_FS, false)) {
>> >> +               mapping_flush_cmtime(mapping);
>> >> +               __sb_end_write(mapping->host->i_sb, SB_FREEZE_FS);
>> >> +       }
>> >> +}
>> >   This is wrong because SB_FREEZE_FS level is targeted for filesystem
>> > internal use. Also it is racy. mapping_flush_cmtime() ends up calling
>> > mark_inode_dirty() and filesystems such as ext4 or xfs will start a
>> > transaction to store inode in the journal. This gets freeze protection at
>> > SB_FREEZE_FS level again. If freeze_super() sets s_writers.frozen to
>> > SB_FREEZE_FS before this second protection, things will deadlock.
>>
>> Whoops -- I assumed that it was safe to recursively take freeze
>> protection at the same level.
>>
>> I'm worried about the following race:
>>
>> Thread 1 (in munmap):
>>   Check AS_CMTIME set
>>   sb_start_pagefault
>>
>> Thread 2 (freezing the fs):
>>   frozen = SB_FREEZE_PAGEFAULT;
>>   sync_filesystem()
>>
>> Thread 1 is now stuck.  It doesn't need to be, because sync_filesystem
>> will flush out the cmtime write.  But there doesn't seem to be a clean
>> mechanism to wait for the freeze to finish.
>   OK, I see.
> Frankly, I'd rather live with msync() and munmap() blocking
> while filesystem is frozen than trying to outsmart the freezing logic...
> If someone comes up with a use case where it causes trouble, we can always
> improve the logic with some clever tricks.

I'll at least check that it's a shared writable mapping before doing
the flush to avoid blocking on other types of munmap.

--Andy