Re: [PATCH 2/7] ext4: completed_io locking cleanup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 10 Sep 2012 11:23:12 +0200, Jan Kara <jack@xxxxxxx> wrote:
> On Sun 09-09-12 21:27:09, Dmitry Monakhov wrote:
> > Current unwritten extent conversion state-machine is very fuzzy.
> > - By unknown reason it want perform conversion under i_mutex. What for?
> >   All this games with mutex_trylock result in following deadlock
> >    truncate:                          kworker:
> >     ext4_setattr                       ext4_end_io_work
> >     mutex_lock(i_mutex)
> >     inode_dio_wait(inode)  ->BLOCK
> >                              DEADLOCK<- mutex_trylock()
> >                                         inode_dio_done()
> >   #TEST_CASE1_BEGIN
> >   MNT=/mnt_scrach
> >   unlink $MNT/file
> >   fallocate -l $((1024*1024*1024)) $MNT/file
> >   aio-stress -I 100000 -O -s 100m -n -t 1 -c 10 -o 2 -o 3 $MNT/file
> >   sleep 2
> >   truncate -s 0 $MNT/file
> >   #TEST_CASE1_END
> > 
> > This patch makes state machine simple and clean:
> > 
> > (1) ext4_flush_completed_IO is responsible for handling all pending
> >     end_io from ei->i_completed_io_list(per inode list)
> >     NOTE1: i_completed_io_lock is acquired only once
> >     NOTE2: i_mutex is not required because it does not protect
> >            any data guarded by i_mutex
>   Removing of i_mutex might be OK but you really need to add more
> justification (both into the changelog and end_io code) than just "it does
> not protect any data guarded by i_mutex". So far i_mutex is supposed to
> protect all modifications of extent tree, now that won't be true anymore.
> We have i_data_sem protecting extent tree as well but that is acquired for
> single extent operation only so it cannot be used for guarding "global
> properties of the extent tree" - e.g. if end_io code would race with
> truncate it could happen that end_io would create some extents beyond EOF.
> Now maybe that cannot happen because of other synchronization mechanisms
> (e.g. inode_dio_wait(), or waiting for PageWriteback while invalidating
> page cache - although extra care needs to be taken when extents straddle
> i_size and there can be IO running on the part of the extent before
> i_size). So I don't immediaetly see a problem but please add more
> justification to convince me (and future readers of the code) here...
Ok you right. Actually I've already tried to figure out correct BUG_ON
condition to spot extent beyond EOF, but this condition is not trivial
and need justification. My current assumption:
  Since EXT4_GET_BLOCKS_UNINIT_EXT is used only then file size is not 
  changed it seems to be correct to prohibit
  ext4_convert_unwritten_extents() to handle blocks beyond
  EXT4_I(inode)->i_disksize (inode->i_size is not suitable here
  because truncate may already reduce it, and now wait for existing dio)

As a naive word in my defence i should say that blocks beyond EOF issue easily
triggered on current kernel via xfstests, but not happen w/ my patch-queue. 
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux