Re: [PATCH v4 02/34] ext4: check the extent status again before inserting delalloc block

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx> writes:

> Zhang Yi <yi.zhang@xxxxxxxxxxxxxxx> writes:
>
>> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
>>
>> Now we lookup extent status entry without holding the i_data_sem before
>> inserting delalloc block, it works fine in buffered write path and
>> because it holds i_rwsem and folio lock, and the mmap path holds folio
>> lock, so the found extent locklessly couldn't be modified concurrently.
>> But it could be raced by fallocate since it allocate block whitout
>> holding i_rwsem and folio lock.
>>
>> ext4_page_mkwrite()             ext4_fallocate()
>>  block_page_mkwrite()
>>   ext4_da_map_blocks()
>>    //find hole in extent status tree
>>                                  ext4_alloc_file_blocks()
>>                                   ext4_map_blocks()
>>                                    //allocate block and unwritten extent
>>    ext4_insert_delayed_block()
>>     ext4_da_reserve_space()
>>      //reserve one more block
>>     ext4_es_insert_delayed_block()
>>      //drop unwritten extent and add delayed extent by mistake
>>
>> Then, the delalloc extent is wrong until writeback, the one more
>> reserved block can't be release any more and trigger below warning:
>>
>>  EXT4-fs (pmem2): Inode 13 (00000000bbbd4d23): i_reserved_data_blocks(1) not cleared!
>>
>> Hold i_data_sem in write mode directly can fix the problem, but it's
>> expansive, we should keep the lockless check and check the extent again
>> once we need to add an new delalloc block.
>
> Hi Zhang, 
>
> It's a nice finding. I was wondering if this was caught in any of the
> xfstests?
>
> I have reworded some of the commit message, feel free to use it if you
> think this version is better. The use of which path uses which locks was
> a bit confusing in the original commit message.
>
> <reworded from your original commit msg>
>
> ext4_da_map_blocks(), first looks up the extent status tree for any
> extent entry with i_data_sem held in read mode. It then unlocks
> i_data_sem, if it can't find an entry and take this lock in write
> mode for inserting a new da entry.

Sorry about this above paragraph. I messed this paragraph.
Here is the correct version of this.

ext4_da_map_blocks looks up for any extent entry in the extent status
tree (w/o i_data_sem) and then the looks up for any ondisk extent
mapping (with i_data_sem in read mode).

If it finds a hole in the extent status tree or if it couldn't find any
entry at all, it then takes the i_data_sem in write mode to add a da entry
into the extent status tree. This can actually race with page mkwrite
& fallocate path. 

Note that this is ok between...  <and the rest can remain same>

>
> This is ok between -
> 1. ext4 buffered-write path v/s ext4_page_mkwrite(), because of the
> folio lock
> 2. ext4 buffered write path v/s ext4 fallocate because of the inode
> lock.
>


> But this can race between ext4_page_mkwrite() & ext4 fallocate path - 
>
>  ext4_page_mkwrite()             ext4_fallocate()
>   block_page_mkwrite()
>    ext4_da_map_blocks()
>     //find hole in extent status tree
>                                   ext4_alloc_file_blocks()
>                                    ext4_map_blocks()
>                                     //allocate block and unwritten extent
>     ext4_insert_delayed_block()
>      ext4_da_reserve_space()
>       //reserve one more block
>      ext4_es_insert_delayed_block()
>       //drop unwritten extent and add delayed extent by mistake
>
> Then, the delalloc extent is wrong until writeback and the extra
> reserved block can't be released any more and it triggers below warning:
>
>   EXT4-fs (pmem2): Inode 13 (00000000bbbd4d23): i_reserved_data_blocks(1) not cleared!
>
> This patch fixes the problem by looking up extent status tree again
> while the i_data_sem is held in write mode. If it still can't find
> any entry, then we insert a new da entry into the extent status tree.
>
>>
>> Cc: stable@xxxxxxxxxxxxxxx
>> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
>> ---
>>  fs/ext4/inode.c | 19 +++++++++++++++++++
>>  1 file changed, 19 insertions(+)
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 6a41172c06e1..118b0497a954 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -1737,6 +1737,7 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
>>  		if (ext4_es_is_hole(&es))
>>  			goto add_delayed;
>>  
>> +found:
>>  		/*
>>  		 * Delayed extent could be allocated by fallocate.
>>  		 * So we need to check it.
>> @@ -1781,6 +1782,24 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
>>  
>>  add_delayed:
>>  	down_write(&EXT4_I(inode)->i_data_sem);
>> +	/*
>> +	 * Lookup extents tree again under i_data_sem, make sure this
>> +	 * inserting delalloc range haven't been delayed or allocated
>> +	 * whitout holding i_rwsem and folio lock.
>> +	 */
>
> page fault path (ext4_page_mkwrite does not take i_rwsem) and fallocate
> path (no folio lock) can race. Make sure we lookup the extent status
> tree here again while i_data_sem is held in write mode, before inserting
> a new da entry in the extent status tree.
>
>


-ritesh




[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux