Dmitry Monakhov <dmonakhov@xxxxxxxxxx> writes:

> After freeze_fs was revoked (from Jan Kara) page write-back completion
> is deferred before unwritten conversion, so explicit flush_unwritten_io()
> was removed here: c724585b62411
> But we still may face deferred conversion for aio-dio case
> # Trivial testcase
> for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done &
> fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
>     --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite
> NOTE: Sane testcase should be integrated to xfstests, but it requires
> changes in common/* code, so let's use this test at the moment.
>
> In order to fix this race we have to guard journal transaction with explicit
> sb_{start,end}_intwrite() as we do with ext4_evict_inode here:8e8ad8a5

To be fair, I'm not very happy with this fix, because it continues the
bad practice of ad hoc fixes for generic journal vs. freeze
synchronization. The ideal fix would be to move
sb_start_intwrite()/sb_end_intwrite() into
ext4_journal_start()/ext4_journal_stop(), but this is not possible due
to limitations introduced by nojournal mode (described here: 8e8ad8a5).

So let's fix nojournal instead. In order to do that we somehow have to
store a ref count and a pointer to the sb inside the nojournal handle.
There are two possible ways to do that:

1) Embed a second journal-related field in task_struct and guard it
   with a compile-time config macro:

	 void *journal_info;
	+#ifdef CONFIG_EXTRA_JOURNAL_INFO
	+void *journal_info2;
	+#endif

2) Encode the ref count and the sb into a single long. This can be done
   by aligning the ext4_sb_info pointer to 4096, so we can embed the
   ref count in the lower bits, as follows:

	#define EXT4_NOJOURNAL_SHIFT		12
	#define EXT4_NOJOURNAL_MAX_REF_COUNT	(1UL << (EXT4_NOJOURNAL_SHIFT - 1))
	#define EXT4_NOJOURNAL_MASK		((1UL << EXT4_NOJOURNAL_SHIFT) - 1)
	/* Upper bits hold the sb pointer, bits 1..11 the ref count, bit 0 the nojournal flag */
	#define NOJOURNAL_SB(handle)	((unsigned long)(handle) & ~EXT4_NOJOURNAL_MASK)
	#define NOJOURNAL_REF(handle)	(((unsigned long)(handle) & EXT4_NOJOURNAL_MASK) >> 1)

	static int ext4_handle_valid(handle_t *handle)
	{
		return !((unsigned long)handle & 0x1);
	}

	static handle_t *get_nojournal_handle(struct super_block *sb)
	{
		handle_t *handle = current->journal_info;
		struct super_block *old_sb = (struct super_block *)NOJOURNAL_SB(handle);
		unsigned long ref_cnt = NOJOURNAL_REF(handle);

		BUG_ON(old_sb && old_sb != sb);
		ref_cnt++;
		BUG_ON(ref_cnt >= EXT4_NOJOURNAL_MAX_REF_COUNT);
		/* Re-encode sb pointer, new ref count and the nojournal flag */
		handle = (handle_t *)((unsigned long)sb | (ref_cnt << 1) | 0x1);
		current->journal_info = handle;
		return handle;
	}

What do you think about this? Is there any better way to fix this?

>
> Signed-off-by: Dmitry Monakhov <dmonakhov@xxxxxxxxxx>
> ---
>  fs/ext4/extents.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 3a6197a..4cba944 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -5040,6 +5040,12 @@ int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
>  	max_blocks = ((EXT4_BLOCK_ALIGN(len + offset, blkbits) >> blkbits) -
>  		      map.m_lblk);
>  	/*
> +	 * Protect us against freezing - AIO-DIO case. Caller didn't have to
> +	 * have any protection against it
> +	 */
> +	sb_start_intwrite(inode->i_sb);
> +
> +	/*
>  	 * This is somewhat ugly but the idea is clear: When transaction is
>  	 * reserved, everything goes into it. Otherwise we rather start several
>  	 * smaller transactions for conversion of each extent separately.
> @@ -5083,6 +5089,7 @@ int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
>  	}
>  	if (!credits)
>  		ret2 = ext4_journal_stop(handle);
> +	sb_end_intwrite(inode->i_sb);
>  	return ret > 0 ? ret2 : ret;
>  }
>
> --
> 1.7.1
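
For anyone who wants to play with the encoding from option 2 outside the kernel, here is a minimal userspace sketch. It is only an illustration under stated assumptions, not the proposed ext4 code: struct fake_sb stands in for the real superblock, and the NOJOURNAL_* names and the nojournal_get()/nojournal_put() helpers are hypothetical. It assumes the pointer is 4096-aligned, so the low 12 bits are free: bit 0 marks a nojournal handle and bits 1..11 carry the ref count.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names, modelled on the macros proposed in this mail. */
#define NOJOURNAL_SHIFT   12
#define NOJOURNAL_MASK    (((uintptr_t)1 << NOJOURNAL_SHIFT) - 1)
#define NOJOURNAL_FLAG    ((uintptr_t)1)                 /* bit 0 */
#define NOJOURNAL_MAX_REF ((uintptr_t)1 << (NOJOURNAL_SHIFT - 1))

struct fake_sb { int id; };  /* stand-in for the real superblock */

/* Upper bits: the 4096-aligned pointer. */
static void *nojournal_sb(uintptr_t h)
{
	return (void *)(h & ~NOJOURNAL_MASK);
}

/* Bits 1..11: the ref count. */
static uintptr_t nojournal_ref(uintptr_t h)
{
	return (h & NOJOURNAL_MASK) >> 1;
}

static int handle_is_nojournal(uintptr_t h)
{
	return (h & NOJOURNAL_FLAG) != 0;
}

/* Take one reference; cur == 0 means "no handle yet". */
static uintptr_t nojournal_get(uintptr_t cur, struct fake_sb *sb)
{
	uintptr_t ref = nojournal_ref(cur);

	assert(((uintptr_t)sb & NOJOURNAL_MASK) == 0);   /* alignment assumption */
	assert(cur == 0 || nojournal_sb(cur) == (void *)sb);
	ref++;
	assert(ref < NOJOURNAL_MAX_REF);                 /* must fit in 11 bits */
	return (uintptr_t)sb | (ref << 1) | NOJOURNAL_FLAG;
}

/* Drop one reference; returns 0 once the last one is gone. */
static uintptr_t nojournal_put(uintptr_t cur)
{
	uintptr_t ref = nojournal_ref(cur);

	assert(handle_is_nojournal(cur) && ref > 0);
	ref--;
	if (ref == 0)
		return 0;
	return (uintptr_t)nojournal_sb(cur) | (ref << 1) | NOJOURNAL_FLAG;
}
```

Starting from 0, two nojournal_get() calls on the same aligned pointer give a ref count of 2, and two nojournal_put() calls bring the value back to 0. The cost of this scheme is the alignment requirement on the allocation and the hard limit on nesting depth.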