On Thu 29-07-10 15:44:31, Ric Wheeler wrote:
> On 07/28/2010 09:44 PM, Ted Ts'o wrote:
> >On Wed, Jul 28, 2010 at 11:28:59AM +0200, Christoph Hellwig wrote:
> >>If we move all filesystems to non-draining barriers with pre- and post-
> >>flushes that might actually be a relatively easy first step. We don't
> >>have the complications to deal with multiple types of barriers to
> >>start with, and it'll fix the issue for devices without volatile write
> >>caches completely.
> >>
> >>I just need some help from the filesystem folks to determine if they
> >>are safe with them.
> >>
> >>I know for sure that ext3 and xfs are from looking through them. And
> >>I know reiserfs is if we make sure it doesn't hit the code path that
> >>relies on it that is currently enabled by the barrier option.
> >>
> >>I'll just need more feedback from ext4, gfs2, btrfs and nilfs folks.
> >>That already ends our small list of barrier supporting filesystems, and
> >>possibly ocfs2, too - although the barrier implementation there seems
> >>incomplete as it doesn't seem to flush caches in fsync.
> >Define "are safe" --- what interface we planning on using for the
> >non-draining barrier? At least for ext3, when we write the commit
> >record using set_buffer_ordered(bh), it assumes that this will do a
> >flush of all previous writes and that the commit will hit the disk
> >before any subsequent writes are sent to the disk. So turning the
> >write of a buffer head marked with set_buffered_ordered() into a FUA
> >write would _not_ be safe for ext3.
>
> I confess that I am a bit fuzzy on FUA, but think that it means that
> any FUA tagged IO will go down to persistent store before returning.
>
> If so, then all order dependent IO would need to be issued in order
> and tagged with FUA. It would not suffice to tag just the commit
> record as FUA, or do I misunderstand what FUA does?
  Ric, I think you misunderstood it a bit.
I think the proposal for ext3 was to write ordered data + metadata to the
journal, except for the transaction commit block, then issue
SYNCHRONIZE_CACHE, and then write the transaction commit block either with
the FUA bit set, or without it and followed by another SYNCHRONIZE_CACHE.
The difference from the current behavior would be that we save the queue
draining we do these days...

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
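
To make the proposed ordering concrete, here is a minimal userspace sketch of the sequence described above. It only records which operations would be issued and in what order; it is not kernel code. All names in it (enum op, struct op_log, emit(), commit_transaction()) are made up for illustration and do not correspond to any existing jbd or block layer interface. It also assumes the reading that the second SYNCHRONIZE_CACHE is needed only when the commit block is written without FUA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operation kinds; these names are not kernel identifiers. */
enum op { OP_WRITE, OP_WRITE_FUA, OP_FLUSH };

#define MAX_OPS 16

/* Records the order in which operations would be sent to the device. */
struct op_log {
	enum op ops[MAX_OPS];
	size_t n;
};

static void emit(struct op_log *log, enum op o)
{
	assert(log->n < MAX_OPS);
	log->ops[log->n++] = o;
}

/*
 * Model of one transaction commit without queue draining:
 *   1. plain writes for everything except the commit block
 *   2. cache flush (SYNCHRONIZE_CACHE)
 *   3. commit block, either as a FUA write, or as a plain write
 *      followed by a second cache flush
 */
static void commit_transaction(struct op_log *log, size_t nblocks, bool use_fua)
{
	for (size_t i = 0; i < nblocks; i++)
		emit(log, OP_WRITE);

	emit(log, OP_FLUSH);

	if (use_fua) {
		emit(log, OP_WRITE_FUA);
	} else {
		emit(log, OP_WRITE);
		emit(log, OP_FLUSH);
	}
}
```

In this model, commit_transaction(&log, 2, true) records WRITE, WRITE, FLUSH, WRITE_FUA, while the non-FUA variant records WRITE, WRITE, FLUSH, WRITE, FLUSH. In both variants only the commit block depends on the preceding flush, which is why tagging just the commit record with FUA can suffice once the earlier writes have been flushed.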