Re: [URGENT PATCH] ext4: fix potential deadlock in ext4_evict_inode()

Tao Ma <tm@xxxxxx> · Fri, 26 Aug 2011 17:27:39 +0800

On 08/26/2011 05:24 PM, Dave Chinner wrote:
> On Fri, Aug 26, 2011 at 05:03:14PM +0800, Tao Ma wrote:
>> On 08/26/2011 04:44 PM, Dave Chinner wrote:
>>> On Fri, Aug 26, 2011 at 05:35:07PM +1000, Dave Chinner wrote:
>>>> On Thu, Aug 25, 2011 at 11:33:44PM -0400, Theodore Ts'o wrote:
>>>>>
>>>>> Note: this will probably need to be sent to Linus as an emergency
>>>>> bugfix ASAP, since it was introduced in 3.1-rc1, so it represents a
>>>>> regression.
>>>>
>>>> It doesn't appear to be a bug. All of the new ext4 lockdep reports
>>>> in 3.1 I've seen (except for the mmap_sem/i_mutex one) are false
>>>> positives....
>>>
>>> While the lockdep report is false positive, I agree that your
>>> change is the right fix to make - the IO completions are already
>>> queued on the workqueue, so they don't need to be flushed to get
>>> them to complete. All that needs to be done is call
>>> ext4_ioend_wait() for them to complete, and that gets rid of the
>>> i_mutex altogether. (*)
>> ext4_ioend_wait can't work here for a nasty bug. Please see the commit
>> log of 2581fdc8.
> 
> Unless I'm missing something, the described race with
> ext4_truncate() flushing completions without the i_mutex lock held
> cannot occur if you've already waited for all pending completions to
> drain by calling ext4_ioend_wait()....
No, I don't mean the ext4_truncate() race, but another race, pasted below.

Flush inode's i_completed_io_list before calling ext4_ioend_wait() to
prevent the following deadlock scenario: A page fault happens while
some process is writing inode A. During the page fault,
shrink_icache_memory is called, which in turn evicts another inode
B. Inode B has some pending io_end work so it calls ext4_ioend_wait()
that waits for inode B's i_ioend_count to become zero. However, inode
B's ioend work was queued behind some of inode A's ioend work on the
same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
thread on that cpu is processing inode A's ioend work, it tries to
grab inode A's i_mutex lock. Since the i_mutex lock of inode A was
taken before the page fault happened and is still held, we enter a
deadlock.

Thanks
Tao