On Thu, Nov 22, 2012 at 08:47:13AM +1100, NeilBrown wrote:
> On Wed, 21 Nov 2012 22:33:33 +0100 Jan Kara <jack@xxxxxxx> wrote:
>
> > On Wed 21-11-12 13:13:19, Darrick J. Wong wrote:
> > > On Wed, Nov 21, 2012 at 03:15:43AM +0100, Jan Kara wrote:
> > > > On Tue 20-11-12 18:00:56, Darrick J. Wong wrote:
> > > > > ext3 doesn't properly isolate pages from changes during writeback.  Since the
> > > > > recommended fix is to use ext4, for now we'll just print a warning if the user
> > > > > tries to mount in write mode.
> > > > >
> > > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > > ---
> > > > >  fs/ext3/super.c |    8 ++++++++
> > > > >  1 file changed, 8 insertions(+)
> > > > >
> > > > >
> > > > > diff --git a/fs/ext3/super.c b/fs/ext3/super.c
> > > > > index 5366393..5b3725d 100644
> > > > > --- a/fs/ext3/super.c
> > > > > +++ b/fs/ext3/super.c
> > > > > @@ -1325,6 +1325,14 @@ static int ext3_setup_super(struct super_block *sb, struct ext3_super_block *es,
> > > > >                          "forcing read-only mode");
> > > > >                  res = MS_RDONLY;
> > > > >          }
> > > > > +        if (!read_only &&
> > > > > +            queue_requires_stable_pages(bdev_get_queue(sb->s_bdev))) {
> > > > > +                ext3_msg(sb, KERN_ERR,
> > > > > +                        "error: ext3 cannot safely write data to a disk "
> > > > > +                        "requiring stable pages writes; forcing read-only "
> > > > > +                        "mode.  Upgrading to ext4 is recommended.");
> > > > > +                res = MS_RDONLY;
> > > > > +        }
> > > > >          if (read_only)
> > > > >                  return res;
> > > > >          if (!(sbi->s_mount_state & EXT3_VALID_FS))
> > > >   Why this? ext3 should be fixed by your change to
> > > > filemap_page_mkwrite()... Or does testing show otherwise?
> > >
> > > Yes, it's still broken even with this new set of changes.  Now that I think
> > > about it a little more, I recall that writeback mode was actually fine, so this
> > > is a little harsh.
> > >
> > > Hm... looking at the ordered code a little more, it looks like
> > > ext3_ordered_write_end is calling journal_dirty_data_fn, which (I guess?) tries
> > > to write mapped buffers back through the journal?  Taking it out seems to fix
> > > ordered mode, though I have a suspicion that it might very well break ordered
> > > mode too.
> >   Oh, right. kjournald writing buffers directly (without setting
> > PageWriteback) will break things. So please, change warning to:

Maybe we should just fix this anyway?  I still have the patch that adds
PG_stable (and changes the wait_for_page_stable() test to use this flag instead
of PG_writeback) kicking around in my tree.  I wrote a patch to jbd that
changes journal_do_submit_data to set PG_stable, call
clear_page_dirty_for_io(), and unset the stable bit in the end_io processing.
It seems to get rid of the checksum-after-write errors, though I'm not
convinced it's correct.

But, I'll send both patches along.

> >
> >         /*
> >          * In data=ordered mode, kjournald writes buffers without setting
> >          * PageWriteback bit thus generic code does not properly wait for
> >          * writeback of those buffers to finish.
> >          */
> >         if (!read_only &&
> >             test_opt(sb, DATA_FLAGS) == EXT3_MOUNT_ORDERED_DATA &&

test_opt(sb, DATA_FLAGS) != EXT3_MOUNT_WRITEBACK_DATA

since I bet data=journal mode is also borken wrt PageWriteback.

> >             queue_requires_stable_pages(bdev_get_queue(sb->s_bdev))) {
> >                 ext3_msg(sb, KERN_ERR,
> >                         "error: data=ordered mode does not support stable "
> >                         "page writes required by the disk; forcing read-only "
> >                         "mode.  Upgrading to ext4 is recommended.");
> >                 res = MS_RDONLY;
> >         }
> >
> > then you need a similar check in ext3_remount() so that filesystem cannot
> > be remounted read-write.
> >
> >                                                                 Honza
>
> Given this restriction, there is no way that I can change md/raid5 to set the
> "stable pages" flag and stop copying pages into the stripe-cache.  ext3 on
> raid5 will be much too common to allow this breakage.
>
> I would really like to be able to say "I prefer stable pages, but they aren't
> a requirement as long as I know which is which" ....

I'd rather just fix ext3. :)  (or remove it, since ext4 can handle ext3)

--D

> NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html