On Fri 13-01-12 11:09:32, Dave Chinner wrote:
> On Thu, Jan 12, 2012 at 12:30:31PM +0100, Jan Kara wrote:
> > On Thu 12-01-12 13:48:41, Dave Chinner wrote:
> > > On Thu, Jan 12, 2012 at 02:20:49AM +0100, Jan Kara wrote:
> > > >
> > > > Hello,
> > > >
> > > > filesystem freezing is currently racy and thus we can end up with dirty data
> > > > on frozen filesystem (see changelog of the first patch for detailed race
> > > > description and proposed fix). This patch series aims at fixing this.
> > >
> > > It only fixes the dirty data race (i.e. SB_FREEZE_WRITE). The same
> > > race conditions exist for SB_FREEZE_TRANS on XFS, and so need the
> > > same fix. That race has had one previous attempt at fixing it in
> > > XFS but that's not possible:
> > >
> > > b2ce397 Revert "xfs: fix filesystsem freeze race in xfs_trans_alloc"
> > > 7a249cf xfs: fix filesystsem freeze race in xfs_trans_alloc
> > >
> > > It was looking at that problem earlier today that led to the
> > > solution Eric proposed. Essentially the method in these patches
> > > needs to replace the xfs specific m_active_trans counter and delay
> > > during ->fs_freeze to prevent that race condition....
> > OK, I see. I just checked ext4 to make sure and ext4 seems to get this
> > right. Looking into Christoph's original patch it shouldn't be hard to fix
> > it. Instead of:
> >	atomic_inc(&mp->m_active_trans);
> >
> >	if (wait_for_freeze)
> >		xfs_wait_for_freeze(mp, SB_FREEZE_TRANS);
> >
> > we just need to do a bit more elaborate
> >
> > retry:
> >	if (wait_for_freeze)
> >		xfs_wait_for_freeze(mp, SB_FREEZE_TRANS);
> >	atomic_inc(&mp->m_active_trans);
> >	if (wait_for_freeze && mp->m_super->s_frozen >= SB_FREEZE_TRANS) {
> >		atomic_dec(&mp->m_active_trans);
> >		goto retry;
> >	}
> >
> > Or does XFS support nested transactions (i.e. a thread already holding a
> > running transaction can call into xfs_trans_alloc() again)?
> > That would make things more complicated...
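
As an aside, the ordering in the quoted retry loop (increment the counter
first, then recheck the freeze state) can be modelled in user space. What
follows is only a minimal sketch under stated assumptions: C11 atomics stand
in for the kernel's atomic_t, the helpers return false where the kernel code
would sleep and retry, and every name (fake_mount, trans_try_start,
freeze_trans_try, ...) is hypothetical rather than an existing XFS or VFS
interface.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins, not the kernel's definitions. */
enum fake_freeze_level { FAKE_UNFROZEN = 0, FAKE_FREEZE_WRITE = 1, FAKE_FREEZE_TRANS = 2 };

struct fake_mount {
	atomic_int frozen;       /* models mp->m_super->s_frozen */
	atomic_int active_trans; /* models mp->m_active_trans */
};

static void fake_mount_init(struct fake_mount *mp)
{
	atomic_init(&mp->frozen, FAKE_UNFROZEN);
	atomic_init(&mp->active_trans, 0);
}

/*
 * Try to start a transaction. Returns false in the cases where the
 * quoted kernel code would wait for the thaw and "goto retry".
 */
static bool trans_try_start(struct fake_mount *mp)
{
	if (atomic_load(&mp->frozen) >= FAKE_FREEZE_TRANS)
		return false;
	/* Make ourselves visible to the freezer first ... */
	atomic_fetch_add(&mp->active_trans, 1);
	/* ... then recheck, since a freeze may have started in between. */
	if (atomic_load(&mp->frozen) >= FAKE_FREEZE_TRANS) {
		atomic_fetch_sub(&mp->active_trans, 1);
		return false;
	}
	return true;
}

static void trans_end(struct fake_mount *mp)
{
	int old = atomic_fetch_sub(&mp->active_trans, 1);
	assert(old > 0); /* catch unbalanced calls */
}

/*
 * Freezer side: raise the freeze level, then report whether all
 * transactions have drained. A real freezer would wait, not poll.
 */
static bool freeze_trans_try(struct fake_mount *mp)
{
	atomic_store(&mp->frozen, FAKE_FREEZE_TRANS);
	return atomic_load(&mp->active_trans) == 0;
}
```

The point of the ordering is that the transaction side writes the counter
and then reads the freeze state, while the freezer writes the freeze state
and then reads the counter, so at least one of them notices the other. With
the original order (wait, then increment without a recheck) both sides can
miss each other.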
>
> You're still missing the point - that this isn't an XFS specific
> problem or that the write problem is an ext4 specific problem. The
> problem is that these are freeze state transition problems -
> something that can affect every filesystem because the freeze code
> is generic. Quite frankly, I'm not interested in having a generic
> solution for SB_FREEZE_WRITE and a custom, per filesystem solution
> for SB_FREEZE_TRANS when the solution is exactly the same.

I understand that both state transitions are currently racy. It's just
that ext3, ext4, reiserfs, gfs2, and btrfs do not really care about the
SB_FREEZE_TRANS transition because they all grew their own synchronization
mechanisms for that. XFS is the only filesystem I know of which really
relies on this transition. That's why I originally decided to fix up the
SB_FREEZE_TRANS transition only in XFS and not in VFS.

But on second thought I tend to agree with you that VFS should provide a
way to do a race-free transition to both states so that filesystems that
want to use it can use it. So I'll add a second counter for that.

> > Using sb_start_write() instead of m_active_trans won't be that easy because
> > it can create A-A deadlocks (e.g. we do sb_start_write in
> > block_page_mkwrite() and then xfs_get_blocks() decides to start a
> > transaction and calls sb_start_write() again which might block if
> > filesystem freezing started in the meantime).
>
> So, like Eric said in his first email, it's not a "write start/end"
> interface that is needed, the interface has to work with different
> freeze levels (e.g "sb_freeze_ref(sb, level)/sb_freeze_drop(sb,
> level)"). Sure, internally it would have to map to two counters and
> different level checks, but it solves the same problem for all
> levels of freeze for all filesystems.
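
To make the level-based idea a bit more concrete, here is a rough user-space
sketch of how a ref/drop interface could map to one counter per freeze
level. It is an illustration only, not a proposed patch: it uses C11 atomics
instead of kernel primitives, it returns false where real code would sleep,
and all names (fake_sb, freeze_try_ref, freeze_drop, freeze_try_advance, the
LVL_* constants) are made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical levels, loosely modelled on SB_FREEZE_WRITE/SB_FREEZE_TRANS. */
enum { LVL_UNFROZEN = 0, LVL_WRITE = 1, LVL_TRANS = 2, LVL_COUNT = 3 };

struct fake_sb {
	atomic_int frozen;          /* current freeze level */
	atomic_int refs[LVL_COUNT]; /* one counter per level, slot 0 unused */
};

static void fake_sb_init(struct fake_sb *sb)
{
	atomic_init(&sb->frozen, LVL_UNFROZEN);
	for (int i = 0; i < LVL_COUNT; i++)
		atomic_init(&sb->refs[i], 0);
}

/*
 * Take a reference at the given level. Fails (where real code would
 * block) only if the freeze has already reached that level.
 */
static bool freeze_try_ref(struct fake_sb *sb, int level)
{
	assert(level == LVL_WRITE || level == LVL_TRANS);
	atomic_fetch_add(&sb->refs[level], 1);
	if (atomic_load(&sb->frozen) >= level) {
		atomic_fetch_sub(&sb->refs[level], 1);
		return false;
	}
	return true;
}

static void freeze_drop(struct fake_sb *sb, int level)
{
	int old = atomic_fetch_sub(&sb->refs[level], 1);
	assert(old > 0); /* catch unbalanced drops */
}

/*
 * Freezer side: move to the given level and report whether the
 * references held at that level have drained.
 */
static bool freeze_try_advance(struct fake_sb *sb, int level)
{
	atomic_store(&sb->frozen, level);
	return atomic_load(&sb->refs[level]) == 0;
}
```

In this model a page fault holding a LVL_WRITE reference can still take a
LVL_TRANS reference after the freezer has moved to LVL_WRITE, because the
freezer cannot reach LVL_TRANS until the write counter has drained. That is
the property that avoids the A-A deadlock described in the quoted text.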
>
> Let's fix this freeze problem once and for all in the generic code,
> and not have to keep coming back to it to add more functionality for
> different situations the most recent fix didn't handle for random
> filesystem X....

Yeah. I think ext3/4 could be converted to the generic mechanism (although
it won't be completely trivial since it also uses its internal mechanism
for things other than filesystem freezing).

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR